Aug 20, 2010

File uploading with multi part encoding using Twisted

If you program in Python and you are building web clients, you need to know about Twisted. It is perhaps the most flexible and powerful Python framework for building web clients and servers, and it has proven to work wonderfully under heavy loads. So powerful, in fact, that even the most basic tasks tend to be a bit of a pain. This is what I realized when I needed to upload large files to a web site using Twisted. I googled, and I googled, and I found no real answer to my problems.

You can of course find an easy solution: build a multipart request by building the request body yourself out of the files you are trying to upload. Sure, that works. But what happens when you are dealing with a 50 MB file upload? What about 500 MB? I’m sure you are not planning to encode the whole file as a string, right?

This is where Twisted’s body producers come in handy. By implementing your own body producer, you have total control over how the request is built. In fact, Twisted will call you every time it needs data for the request, so you can be sure you won’t be building the whole body in one string. Instead, you will be sending chunks of bytes to what is known as the consumer. What is the consumer? Whatever is asking for a request body.
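To make the idea of feeding a consumer in pieces concrete, here is a rough sketch that uses nothing but the standard library. It is not the twistedx producer: iter_file_chunks, CollectingConsumer and the 64 KB chunk size are names and values made up for illustration. The only thing borrowed from Twisted is the convention that a consumer has a write() method.

```python
import os
import tempfile


def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the file's bytes in pieces of at most chunk_size bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


class CollectingConsumer(object):
    """Hypothetical stand-in for a consumer: it only records chunk sizes."""

    def __init__(self):
        self.sizes = []

    def write(self, data):
        self.sizes.append(len(data))


# Create a throwaway 150000-byte file to play the role of the upload.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 150000)
    demo_path = tmp.name

# Hand the file to the consumer one chunk at a time.
consumer = CollectingConsumer()
for chunk in iter_file_chunks(demo_path):
    consumer.write(chunk)
os.remove(demo_path)

print(consumer.sizes)
```

No matter how large the file gets, only one chunk is held in memory at a time. A real body producer would additionally have to cooperate with the reactor rather than loop synchronously like this.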

Ok, enough said. I could go on and on with the explanation, but let’s learn by coding (yay!). We want to upload a file to a server using Twisted. We also want to know how the file upload is doing (by getting progress callbacks), and when it finishes. Just for the fun of it, we will also send other POST data with the request.

First things first, let us build the server code that will handle the upload. For simplicity (well, that’s sort of B.S., since Python is simple enough) we’ll build a PHP script that will take a file upload, and print out the uploaded information, together with any posted data. Here’s the code for that:

<?php
$files = array();
foreach ($_FILES as $field => $file) {
	if (empty($file['tmp_name']) || !is_uploaded_file($file['tmp_name'])) {
		continue;
	}
	$files[$field] = $file;
}

print_r($_POST);
print_r($files);
?>

Easy, right? We are just printing out whatever we got, discarding any invalid file upload information.

Let’s look at our client to upload a file, and POST some data to this PHP script. This Python script is using the body producer I built, together with a handy receiver for getting the response. More about that later. Here is the client script:

from twisted.internet import defer
from twisted.internet import reactor
from twisted.web import client
from twisted.web import http_headers

from pyfire.twistedx import producer
from pyfire.twistedx import receiver

def finished(total):
	# Fired through the producer's deferred once the whole body has been sent
	print("Upload DONE: %d" % total)

def progress(current, total):
	# Called by the producer repeatedly while the body is being sent
	print("Upload PROGRESS: %d out of %d" % (current, total))

def error(error):
	print("Upload ERROR: %s" % error)

def responseDone(data):
	print("Response:")
	print("-" * 80)
	print(data)
	reactor.stop()

def responseError(data):
	print("ERROR with the response. So far I've got:")
	print("-" * 80)
	print(data)
	reactor.stop()

url = "http://kramer/upload.php"
files = {
	"upload": "/home/mariano/myfile.tar.gz"
}
data = {
	"field1": "value1"
}

producerDeferred = defer.Deferred()
producerDeferred.addCallback(finished)
producerDeferred.addErrback(error)

receiverDeferred = defer.Deferred()
receiverDeferred.addCallback(responseDone)
receiverDeferred.addErrback(responseError)

myProducer = producer.MultiPartProducer(files, data, progress, producerDeferred)
myReceiver = receiver.StringReceiver(receiverDeferred)

headers = http_headers.Headers()
headers.addRawHeader("Content-Type", "multipart/form-data; boundary=%s" % myProducer.boundary)

agent = client.Agent(reactor)
request = agent.request("POST", url, headers, myProducer)
request.addCallback(lambda response: response.deliverBody(myReceiver))

reactor.run()

Let’s look at what we are doing here. We start by importing the Twisted modules we need for this script, and the producer and receiver modules from the pyfire package I wrote. Don’t worry, you don’t have to use all of pyfire to use these two modules. Just make sure to download the twistedx module that is part of pyfire. Next, we define some callback functions: finished() to inform us that the upload is done, progress() to keep us in the loop while the request is being sent to the server, error() in case sh*t happens, and finally responseDone() and responseError() to handle the response from the server and stop Twisted’s reactor (its event loop).

In the actual script code we see that we start by defining the destination URL, and dictionaries with the files to be sent out, and data to post, both of them indexed by field name. We proceed then to start coding the “Twisted” way: creating two deferreds that will be used by the producer, and the receiver. If you don’t know about deferreds you probably haven’t read the Twisted manual much, so I recommend you go over their section on deferreds. Basically it’s a very easy (and flexible, and chainable, and…) way to get callbacks.

So we build two deferreds: the first one (producerDeferred) is for the producer, where we attach a success callback (the finished() function), and an error callback (the error() function). The second one (receiverDeferred) is used by the receiver, and has a success callback (the responseDone() function), and an error callback (the responseError() function). Both of the response callbacks print out whatever data we got as a response, and finish by stopping the reactor.

We then build the producer, passing on the files to upload, the data to post, the callback progress() function that will be called throughout the upload, and the deferred for the producer. Similarly, we build the receiver, passing only its deferred.
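Since the progress callback receives the current and total byte counts, it is easy to make it a bit friendlier. Here is a small sketch of a variation that also prints a percentage. The helper name format_progress is made up for this example, and the sample numbers are arbitrary.

```python
def format_progress(current, total):
    """Build a progress line with a percentage, guarding against total == 0."""
    percent = 100.0 * current / total if total else 0.0
    return "Upload PROGRESS: %d out of %d (%.1f%%)" % (current, total, percent)


def progress(current, total):
    # Same signature as the progress() callback used in the client script.
    print(format_progress(current, total))


progress(4560, 9120)
progress(9120, 9120)
```

Because the signature matches, this progress() could be passed to the producer in place of the original one.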

Having both the producer and the receiver, we can now proceed to create the actual request, but not without first creating any additional headers we may need (the agent will automatically set the content length from the length reported by the body producer). Since we are uploading a file, we specify the content type of the request to be multipart/form-data, and as its boundary we set whatever the producer chose as the boundary between the parts of the request body.
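How can a producer report its length without reading the files? Here is a rough, standard-library-only sketch of one way to do it: measure the multipart framing, and ask the file system for the size of each file. This is not the actual twistedx code. The helper names, the random boundary from uuid, and the application/octet-stream content type are all assumptions made for illustration.

```python
import os
import tempfile
import uuid


def make_boundary():
    """Pick a random boundary that is very unlikely to appear in the data."""
    return uuid.uuid4().hex


def field_part(boundary, name, value):
    """Return the complete part for a plain POST field."""
    return ('--%s\r\n'
            'Content-Disposition: form-data; name="%s"\r\n'
            '\r\n'
            '%s\r\n' % (boundary, name, value)).encode("utf-8")


def file_part_header(boundary, name, path):
    """Return only the headers that precede the bytes of a file."""
    return ('--%s\r\n'
            'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
            'Content-Type: application/octet-stream\r\n'
            '\r\n' % (boundary, name, os.path.basename(path))).encode("utf-8")


def body_length(boundary, fields, files):
    """Compute the size of the whole body without reading any file."""
    total = 0
    for name, value in fields.items():
        total += len(field_part(boundary, name, value))
    for name, path in files.items():
        total += len(file_part_header(boundary, name, path))
        total += os.path.getsize(path)  # file bytes, measured but not read
        total += len(b"\r\n")           # line break that closes the part
    total += len(("--%s--\r\n" % boundary).encode("ascii"))  # closing boundary
    return total


# Throwaway file standing in for the real upload.
with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as tmp:
    tmp.write(b"\0" * 8760)
    demo_path = tmp.name

boundary = make_boundary()
fields = {"field1": "value1"}
files = {"upload": demo_path}
total = body_length(boundary, fields, files)
file_size = os.path.getsize(demo_path)
os.remove(demo_path)

print("Content-Type: multipart/form-data; boundary=%s" % boundary)
print("Body length: %d, of which %d is the file" % (total, file_size))
```

The total is always a bit larger than the file itself, because the boundaries and part headers are sent as part of the body too.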

The final step is the actual running of the request, doing it the typical Twisted way: first getting an agent for the reactor, creating the request (a POST request to the given URL, with the given headers and body producer), and adding a callback for when we get a response. The callback in this case is a simple lambda function, that delivers the body from the response to the receiver. Finally, we run the reactor.

Notice that when you run the reactor the run() call will block until you stop the reactor. This is why from our response callbacks (responseDone() and responseError()) we stop the reactor whenever we get some sort of response.

If you run this script against your PHP server script, you should get output like the following:

Upload PROGRESS: 153 out of 9120
Upload PROGRESS: 320 out of 9120
Upload PROGRESS: 9080 out of 9120
Upload PROGRESS: 9082 out of 9120
Upload PROGRESS: 9120 out of 9120
Upload DONE: 9120
Response:
--------------------------------------------------------------------------------
Array
(
    [field1] => value1
)
Array
(
    [upload] => Array
        (
            [name] => myfile.tar.gz
            [type] => application/x-tar
            [tmp_name] => /tmp/phplINynd
            [error] => 0
            [size] => 8760
        )
)

Isn’t this fun?




4 Comments to "File uploading with multi part encoding using Twisted"

  1. Sep 27, 2010 at 10:03 am

    Leo Lima wrote:

    hi man, i am your new follower!! you are good programmer.. i found a perfect code in ‘http://www.php.net/manual/pt_BR/function.get-meta-tags.php#56701′.
    so, i need some help now..
    i would like to do a code that colects the meta datas from a url and returns in a variable. example:
    $title = value inside tags
    $description = value from
    $tags = value from

    just it.. may you help me?
    king regards

    LeoLima

  2. Sep 27, 2010 at 1:50 pm

    mariano wrote:

    I’m not sure I follow what you are saying. If you use the function that you linked to, you’ll notice it returns an array with title, and metaTags. The description is actually a meta tag by itself, same with tags. So you can do:

    $result = getUrlData('/');
    if (!empty($result)) {
    extract($result);
    echo 'TITLE: ' . $title . '<br />';
    echo 'META TAGS: '; print_r($metaTags);
    }

  3. Feb 22, 2012 at 11:38 pm

    Odie wrote:

    Is it possible to use twistedx with twisted.web.client instead of Agent?

  4. Jun 13, 2012 at 12:28 pm

    Rob wrote:

    Nice tutorial. I had fun playing with this even though it was not quite what I was looking for.

    I need to process raw post data

 