Python httplib, urllib, and character encoding, oh my!

Howdy folks,

Schoen here, representing the hard-core software branch of the company, here to rant about a recent trial of mine.  I feel this deserves a post because, frankly, I read a few articles/posts which almost-but-not-quite got me all the way there, which I will cite below.  As a disclaimer (and this may be showing too much of my hand), this is pretty much the first Python I’ve written.  If I’ve committed any faux pas, please correct me in my ways.  If you need to send binary data over HTTP with Python, here’s how (after the jump):

The Why:

As some of you know, we make lots of game tools here at Defective, including a pretty bitchin level editor called Platformer.  Now Platformer is awesome, and totally cloud-based (like… actually cloud, not transferring saves with a USB cable) and we want to make sure that all of our tools are totally awesome and cloud-based.  Our other main product, AssetCloud, acts as the server for this whole experience, tracking and storing all of the game assets for Platformer and our other projects.  So part of the Platformer suite is a tool we affectionately call PIMP, or the Platformer-to-Maya-Pipeline (yeah, whatever, close enough).  This allows users to take a level created with Platformer within Unity and bring it, fully constructed, into Maya to manually manipulate the level geometry and give the level a nice artistic touch.  Our main-riggin-man Danger is responsible for PIMP functionality, but he’s a bit of a software noob, so I had to go figure out how to do HTTP transactions from within Maya, via Python.  First of all, this stuff is pretty cool, uploading/downloading from a server directly into Maya.  Sweet shit.

A few days later, I sat down to actually implement the web calls, and prepare a demo script for Danger illustrating how to log into AC, pull some data, upload a Maya file, and download the file back.  Which brings us to:

The How:

A cursory search proved very positive: I found httplib right off the bat.  HTTP library built right into Python, wham, bam, thank you ma’am, right?  Not quite.  httplib uses a rather clunky scheme of setting up a connection object and making calls to it for requests/responses.  Apparently (on closer reading of the documentation),

It is normally not used directly — the module urllib uses it to handle URLs that use HTTP and HTTPS

https://docs.python.org/3/library/http.client.html

After a little more research, I found the urllib2 library, which seems to be the de facto standard.  I was very quickly able to put together the POST requests for communication with the server, and it looked like it would be a short night.  However, once I tried sending a file, things got a bit more complicated.
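For reference, a plain form POST of this kind looks roughly like the sketch below.  It is written for Python 3, where urllib2's functionality lives in urllib.request.  The URL, the '/login' path, and the field names are made-up placeholders, not the real AssetCloud API, and nothing is sent over the network until the request is handed to urlopen.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(server, username, password):
    """Build (but do not send) a form-encoded POST request.

    The '/login' path and the field names are hypothetical placeholders.
    """
    # urlencode produces text; the request body has to be bytes.
    body = urlencode({"username": username, "password": password}).encode("ascii")
    # Supplying data= is what turns the request into a POST.
    return Request(server + "/login", data=body)


request = build_login_request("http://example.com", "danger", "hunter2")
print(request.get_method())  # POST
print(request.data)
```

Passing the resulting object to urllib.request.urlopen would perform the actual call.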

The WTF???:

The first issue came up when using urandom as part of the POST boundary string.  If you’re not familiar with POST headers, do a Google search on “post multipart/form-data.”  Serendipitously enough, the first Google result at time of writing is the Python example.  I ended up just using a non-random boundary string, but still, when I tried to send binary (non-text) files, I kept getting
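If you do want a random boundary, one way to keep it safe is to hex-encode the random bytes first, so that only plain ASCII characters end up in the header.  A minimal sketch in Python 3 (make_boundary and its prefix are made-up names for illustration, not something from our codebase):

```python
import binascii
import os


def make_boundary(num_bytes=16):
    """Return a random multipart boundary made only of ASCII characters."""
    # os.urandom gives raw bytes; hexlify maps them to the characters 0-9a-f.
    random_hex = binascii.hexlify(os.urandom(num_bytes)).decode("ascii")
    return "----boundary" + random_hex


boundary = make_boundary()
print(boundary)
```

With the default of 16 random bytes the result is 44 characters long, comfortably under the 70-character limit for multipart boundaries.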

UnicodeDecodeError: 'ascii' codec can't decode byte blah blah in position blah blah...

First I thought, “OK, well I clearly just need to convert the variable to a byte[] or something”, but found that types are a bit more subtle in Python.  After reading the Python manual with regard to encoding, I tried using the unicode() constructor on the string components of the header, but with no luck.  Then I came across binascii.  “Aha!” I thought.  “This is it! I’ve found it! My troubles are over! I can get drunk and play starcra–”  Nope. binascii also didn’t work.  Honestly in hindsight I’m not sure where my confusion came from (the manual entry even mentions the solution), but I think the problem was that I was trying to find information specifically about files and urllib2, when today I come upon such helpful pages like these.  In any case, I did a little more searching, and thought I had finally found the answer with urllib2_file, an extension which seemed to have been created just for me!

But, alas, the last commit is more than a year old, and the extension isn’t tested past Python 2.5. Plus, after a bit of research, I found that installing extensions to Maya’s built-in Python environment is a bit of a pain, and something we couldn’t expect every user to do.  At this point it was around 1:30AM (I know because of my Chrome history 🙂 ) and I turned to IRC for help.  After a few of the requisite “RTFM” and “you’re doing it wrong” responses, I finally found a helpful chap who took the time to read my code and identify a few weak spots.  One thing he pointed out in particular, which I think is very cool, is the Python for…else statement, of which I was as yet unaware.
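In case for…else is new to you too: the else block runs only when the loop finishes without hitting a break.  A small sketch (find_header and the sample headers are invented for illustration, not taken from the code he reviewed):

```python
def find_header(headers, name):
    """Return the value of the first header called `name`, or None if absent."""
    for key, value in headers:
        if key.lower() == name.lower():
            result = value
            break  # found it, so the else block is skipped
    else:
        # Runs only if the loop ended without a break (including an empty list).
        result = None
    return result


headers = [("Content-Type", "multipart/form-data"), ("Content-Length", "1024")]
print(find_header(headers, "content-length"))  # 1024
print(find_header(headers, "Accept"))          # None
```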

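Finally, after enough trial and error, I came upon the solution, and successfully transferred a gif image to and from the server.  Actually, the IRC guys were pretty shocked, and one even replied “but… that shouldn’t work!”  Anyway, whether or not it should, the solution is as follows.  The really important line is convertedData = postData.encode('UTF-8').  This is what magically turns the data into something which can be concatenated with the file's byte string.  As far as I can tell, postData had ended up as a unicode object, and in Python 2 mixing unicode with a byte string makes Python try to decode the bytes as ASCII, which is where the UnicodeDecodeError came from; encoding first means everything is plain bytes.  Hooray!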

import json
import urllib2

# filename, server, and data (an iterable of (key, value) pairs) are defined elsewhere.
boundary = "------------ThIs_Is_tHe_bouNdaRY_$"

# One part per ordinary form field.
formdataTemplate = "\r\n--" + boundary + "\r\nContent-Disposition: form-data; name=\"%s\"\r\n\r\n%s"
postData = ''
for key, value in data:
    postData += formdataTemplate % (key, value)

# Header for the file part; the raw file bytes follow it.
formdataTemplate = "\r\n--" + boundary + "\r\nContent-Disposition: form-data; name=\"%s\"; filename=\"%s\"\r\nContent-Type: application/octet-stream\r\n\r\n"
postData += formdataTemplate % ('meshes', filename)

# Encode the text portion to a byte string BEFORE appending the binary file contents.
convertedData = postData.encode('UTF-8')
f = open(filename, "rb")
convertedData += f.read()
f.close()
convertedData += '\r\n--' + boundary + '--\r\n'

request = urllib2.Request(server + '/controllers/create.php', data=convertedData)
request.add_header('Content-Type', "multipart/form-data; boundary=" + boundary)
meshData = json.loads(urllib2.urlopen(request).read())
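For anyone on Python 3, where text and bytes no longer mix implicitly, the same idea can be sketched as below: encode every text piece to bytes first, then join it with the raw file contents.  This is an illustration rather than the code we shipped; encode_multipart, the sample field values, and the sample boundary are made up, and only the 'meshes' field name comes from the script above.

```python
def encode_multipart(fields, file_field, filename, file_bytes, boundary):
    """Build a multipart/form-data body as bytes.

    fields is an iterable of (name, value) text pairs; file_bytes is raw bytes.
    Names and filenames are assumed not to contain quotes or newlines.
    """
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields:
        parts.append(delimiter)
        parts.append(('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"))
        parts.append(b"")  # blank line separating part headers from the value
        parts.append(str(value).encode("utf-8"))
    parts.append(delimiter)
    parts.append(
        ('Content-Disposition: form-data; name="%s"; filename="%s"'
         % (file_field, filename)).encode("utf-8")
    )
    parts.append(b"Content-Type: application/octet-stream")
    parts.append(b"")
    parts.append(file_bytes)  # already bytes, so it is appended untouched
    parts.append(delimiter + b"--")  # closing delimiter
    parts.append(b"")  # makes the body end with a final CRLF
    body = b"\r\n".join(parts)
    content_type = "multipart/form-data; boundary=" + boundary
    return body, content_type


body, content_type = encode_multipart(
    [("project", "demo")], "meshes", "level.mb", b"\x89\x00\xff binary", "XyZ123"
)
print(content_type)
print(len(body))
```

The returned body would go in as data= on a urllib.request.Request, with content_type added as the Content-Type header.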

The… *sigh*:

While I thought my work was over, it turns out that there’s still a bit of a hitch.  Regular POSTs seem to be intermittently coming in with a few random characters toward the beginning and a 0 on the end.  Viewing the page response in a browser doesn’t show the extra data, so I’m convinced that there’s more encoding trouble in the works.  Off I go…

~ by Schoen on June 10, 2011.

3 Responses to “Python httplib, urllib, and character encoding, oh my!”

  1. I’m running into similar issues as you – to get a prototype working I was able to make a system call to curl in order to send the contents of a file via POST. Probably not ideal, but at least with this method you don’t have to mess around with headers, character encodings, etc. You’ll probably have to download curl if you’re not working in a Linux environment.


    import subprocess

    cmd_list = []
    cmd_list.append("curl")
    cmd_list.append("--form")
    cmd_list.append("file=@" + file_name)
    cmd_list.append("--url")
    cmd_list.append(url)
    subprocess.check_call(cmd_list)

    http://curl.haxx.se/docs/manpage.html
    http://docs.python.org/library/subprocess.html

  2. That’s a good call, and something I considered. Unfortunately, we’re targeting primarily Windows and OS X, so it wasn’t a preferable solution.

    It turns out that the system did work, and that the “extraneous data” was just the u' prefix that Python 2 adds when it prints the repr of a unicode string, letting us know that it’s unicode data.

    I’m not the main dev on this project, but AFAIK it’s working perfectly 🙂

  3. This script can cause issues when uploading binary files. I wrote a blog post about how I was able to get multipart binary file uploads working in python:

    http://blog.thesparktree.com/post/114053773684/the-unfortunately-long-story-dealing-with
