{"id":14,"date":"2011-06-10T19:20:09","date_gmt":"2011-06-10T23:20:09","guid":{"rendered":"http:\/\/www.defectivestudios.com\/devblog\/?p=14"},"modified":"2023-09-27T10:27:19","modified_gmt":"2023-09-27T14:27:19","slug":"python-httplib-urllib-and-character-encoding-oh-my","status":"publish","type":"post","link":"https:\/\/www.defectivestudios.com\/devblog\/python-httplib-urllib-and-character-encoding-oh-my\/","title":{"rendered":"Python httplib, urllib, and character encoding, oh my!"},"content":{"rendered":"\n<p>Howdy folks,<\/p>\n\n\n\n<p>Schoen here, representing the hard-core software branch of the company, here to rant about a recent trial of mine. &nbsp;I feel this deserves a post because, frankly, I read a few articles\/posts which almost-but-not-quite got me all the way there, which I will cite below. &nbsp;As a disclaimer (and this may be showing too much of my hand), this is pretty much the first Python I&#8217;ve written. &nbsp;If I&#8217;ve committed any faux pas, please correct my ways. &nbsp;If you need to send binary data over HTTP with Python, here&#8217;s how (after the jump):<\/p>\n\n\n\n<!--more-->\n\n\n\n<h1 class=\"wp-block-heading\">The Why:<\/h1>\n\n\n\n<p>As some of you know, we make lots of game tools here at Defective, including a pretty bitchin level editor called Platformer. &nbsp;Now Platformer is awesome, and totally cloud-based (like&#8230; actually cloud, not transferring saves with a USB cable) and we want to make sure that all of our tools are totally awesome and cloud-based. &nbsp;Our other main product, AssetCloud, acts as the server for this whole experience, tracking and storing all of the game assets for Platformer, and our other projects. &nbsp;So part of the Platformer suite is a tool we affectionately called PIMP, or the Platformer-to-Maya-Pipeline (yeah, whatever, close enough). 
&nbsp;This allows users to take a level created with Platformer within Unity and bring it, fully constructed, into Maya to manually manipulate the level geometry, to give the level a nice artistic touch. &nbsp;Our main-riggin-man Danger is responsible for PIMP functionality, but he&#8217;s a bit of a software noob, so I had to go figure out how to do HTTP transactions from within Maya, via Python. &nbsp;First of all, this stuff is pretty cool, uploading\/downloading from a server directly into Maya. &nbsp;Sweet shit.<\/p>\n\n\n\n<p>A few days later, I sat down to actually implement the web calls, and prepare a demo script for Danger illustrating how to log into AC, pull some data, upload a Maya file, and download the file back. &nbsp;Which brings us to:<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">The How:<\/h1>\n\n\n\n<p>So a cursory search proved very positive, I found <a href=\"http:\/\/docs.python.org\/library\/httplib.html\">httplib<\/a> right off the bat. &nbsp;HTTP library built right into Python, wham, bam, thank you ma&#8217;am, right? &nbsp;Not quite. &nbsp;httplib uses a rather clunky scheme of setting up a connection object, and making calls to it for requests\/responses. &nbsp;Apparently (on closer reading of the documentation),<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>It is normally not used directly \u2014 the module&nbsp;<a title=\"Open an arbitrary network resource by URL (requires sockets).\" href=\"http:\/\/docs.python.org\/library\/urllib.html#module-urllib\"><tt>urllib<\/tt><\/a> uses it to handle URLs that use HTTP and HTTPS<\/p>\n<cite>https:\/\/docs.python.org\/3\/library\/http.client.html<\/cite><\/blockquote>\n\n\n\n<p>After a little more research, I found the <a href=\"http:\/\/docs.python.org\/library\/urllib2.html\">urllib2<\/a> library, which seems to be the de-facto standard. &nbsp;I was very quickly able to put together the POST requests for communication with the server, and it looked like it would be a short night. 
&nbsp;However, once I tried sending a file, things got a bit more complicated.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">The WTF???:<\/h1>\n\n\n\n<p>The first issue came up when using urandom as part of the POST boundary string. &nbsp;If you&#8217;re not familiar with POST headers, do a Google search on &#8220;post multipart\/form-data.&#8221; &nbsp;Serendipitously enough, the first Google result at time of writing is the <a href=\"http:\/\/code.activestate.com\/recipes\/146306-http-client-to-post-using-multipartform-data\/\">Python example<\/a>. &nbsp;I ended up just using a non-random boundary string, but still, when I tried to send binary (non-text) files, I kept getting<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>UnicodeDecodeError: 'ascii' codec can't decode byte blah blah in position blah blah...<\/code><\/pre>\n\n\n\n<p>First I thought, &#8220;OK, well I clearly just need to convert the variable to a byte[] or something&#8221;, but found that types are a bit more subtle in Python. &nbsp;After reading the <a href=\"http:\/\/docs.python.org\/howto\/unicode.html\">Python manual<\/a> with regard to encoding, I tried using the unicode() constructor on the string components of the header, but with no luck. &nbsp;Then I came across <a href=\"http:\/\/docs.python.org\/library\/binascii.html\">binascii<\/a>. &nbsp;&#8220;Aha!&#8221; I thought. &nbsp;&#8220;This is it! I&#8217;ve found it! My troubles are over! I can get drunk and play starcra&#8211;&#8221; &nbsp;Nope. 
binascii also didn&#8217;t work.&nbsp;&nbsp;Honestly, in hindsight I&#8217;m not sure where my confusion came from (the manual entry even mentions the solution), but I think the problem was that I was trying to find information specifically about files and urllib2, whereas today I come upon helpful <a href=\"http:\/\/effbot.org\/zone\/unicode-objects.htm\">pages<\/a> <a href=\"http:\/\/www.evanjones.ca\/python-utf8.html\">like<\/a> <a href=\"http:\/\/stackoverflow.com\/questions\/447107\/whats-the-difference-between-encode-decode-python-2-x\">these<\/a>. &nbsp;In any case, I did a little more searching, and thought I had finally found the answer with <a href=\"https:\/\/github.com\/seisen\/urllib2_file\">urllib2_file<\/a>, an extension which seemed to have been created just for me!<\/p>\n\n\n\n<p>But, alas, the last commit is more than a year old, and the extension isn&#8217;t tested past Python 2.5. Plus, after a bit of research, I found that installing extensions to Maya&#8217;s built-in Python environment is a bit of a pain, and something we couldn&#8217;t expect every user to do. &nbsp;At this point it was around 1:30AM (I know because of my Chrome history \ud83d\ude42 ) and I turned to IRC for help. &nbsp;After a few of the requisite &#8220;RTFM&#8221; and &#8220;you&#8217;re doing it wrong&#8221; responses, I finally found a helpful chap who took the time to read my code, and identify a few weak spots. &nbsp;One thing he pointed out in particular, which I think is <strong>very<\/strong> cool, is the Python <a href=\"http:\/\/docs.python.org\/reference\/compound_stmts.html#for\">for&#8230;else<\/a> statement, of which I was yet unaware.<\/p>\n\n\n\n<p>Finally, after enough trial and error, I came upon the solution, and successfully transferred a GIF image to and from the server. 
&nbsp;Actually, the IRC guys were pretty shocked, and one even replied &#8220;but&#8230; that shouldn&#8217;t work!&#8221; &nbsp;Anyway, whether or not it <em>should<\/em> work, the solution is as follows. &nbsp;The really important line is <span style=\"font-family: Consolas, Monaco, 'Courier New', Courier, monospace;\">convertedData = postData.encode(&#039;UTF-8&#039;)<\/span>. &nbsp;This is what magically turns the header text (which had presumably ended up as a unicode object) into a plain byte string, so the raw file bytes can be appended without Python 2 trying to decode them as ASCII. &nbsp;Hooray!<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport json\nimport urllib2\n\n# filename, server and data are assumed to be defined earlier; data is an iterable of (key, value) pairs\nfile = open(filename, &quot;rb&quot;, 0)\n\nboundary = &quot;------------ThIs_Is_tHe_bouNdaRY_$&quot;\n\n# One part per plain form field\nformdataTemplate = &quot;\\r\\n--&quot; + boundary + &quot;\\r\\nContent-Disposition: form-data; name=\\&quot;%s\\&quot;;\\r\\n\\r\\n%s&quot;\npostData = &#039;&#039;\nfor key, value in data:\n    postData += formdataTemplate % (key, value)\n\n# Header of the file part\nformdataTemplate = &quot;\\r\\n--&quot; + boundary + &quot;\\r\\nContent-Disposition: form-data; name=\\&quot;%s\\&quot;; filename=\\&quot;%s\\&quot;\\r\\nContent-Type: file\\r\\n\\r\\n&quot;\npostData += formdataTemplate % (&#039;meshes&#039;, filename)\n\n# Encode the text portion to a byte string before appending the raw file bytes\nconvertedData = postData.encode(&#039;UTF-8&#039;)\nconvertedData += file.read()\nfile.close()\nconvertedData += &#039;\\r\\n--&#039; + boundary + &#039;--\\r\\n&#039;\n\nrequest = urllib2.Request(server + &#039;\/controllers\/create.php&#039;, data=convertedData)\nrequest.add_header(&#039;Content-Type&#039;, &quot;multipart\/form-data; boundary=&quot; + boundary)\nmeshData = json.loads(urllib2.urlopen(request).read())\n<\/pre><\/div>\n\n\n<h1 class=\"wp-block-heading\">The&#8230; *sigh*:<\/h1>\n\n\n\n<p>While I thought my work was over, it turns out that there&#8217;s still a bit of a hitch. &nbsp;Regular POSTs seem to be intermittently coming in with a few random characters toward the beginning and a 0 on the end. 
&nbsp;Viewing the page response in a browser doesn&#8217;t show the extra data, so I&#8217;m convinced that there&#8217;s more encoding trouble in the works. &nbsp;Off I go&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Howdy folks, Schoen here, representing the hard-core software branch of the company, here to rant about a recent trial of mine. &nbsp;I feel this deserves a post because, frankly, I read a few articles\/posts which almost-but-not-quite got me all the way there, which I will cite below. &nbsp;As a disclaimer, and this may be showing [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,8,5,1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/posts\/14"}],"collection":[{"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/comments?post=14"}],"version-history":[{"count":6,"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/posts\/14\/revisions"}],"predecessor-version":[{"id":582,"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/posts\/14\/revisions\/582"}],"wp:attachment":[{"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/media?parent=14"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/categories?post=14"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.defectivestudios.com\/devblog\/wp-json\/wp\/v2\/tags?post=14"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}",
"templated":true}]}}