postrank-labs/goliath

Parsing IO stream for file upload

kennym opened this issue · 2 comments

I am having trouble parsing the IO stream when asynchronously uploading a file, while using the code from async_upload.rb in the examples directory.

  def on_body(env, data)
    env.logger.info 'received data: ' + data
    (env['async-body'] ||= '') << data
  end
>> ::Rack::Utils::Multipart.parse_multipart(env["async-body"])
nil

My async-body looks as follows:

env["async-body"]

"------WebKitFormBoundarybZBSwB8ypRDnLb9V\r\nContent-Disposition: form-data; name=\"body\"; filename=\"hablar-con-tus-hijos-sobre-problemas-financieros-1.jpg\"\r\nContent-Type: image/jpeg\r\n\r\n\xFF\xD8\xFF\xE0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xFF\xFE\x00;CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 80\n\xFF\xDB\x00C\x00\x06\x04\x05\x06\x05\x04\x06\x06\x05\x06\a\a\x06\b\n\x10\n\n\t\t\n\x14\x0E\x0F\f\x10\x17\x14\x18\x18\x17\x14\x16\x16\x1A\x1D%\x1F\x1A\e#\x1C\x16\x16 , #&')*)\x19\x1F-0-(0%()(\xFF\xDB\x00C\x01\a\a\a\n\b\n\x13\n\n\x13(\x1A\x16\x1A((((((((((((((((((((((((((((((((((((((((((((((((((\xFF\xC0\x00\x11\b\x01\xB7\x02v\x03\x01\"\x00\x02\x11\x01\x03\x11\x01\xFF\xC[ snip ]
\xD2\x8A)\x88i\"\x93#\xD2\x8A(\x10\xD2E\x03\x1E\x94Q@\bqI\x91E\x14\xC0BE!4Q@\x84\xC8\xA6\x93E\x14\x00\xC2j'aE\x14\x01\x1374QE\x00\x7F\xFF\xD9\r\n------WebKitFormBoundarybZBSwB8ypRDnLb9V--\r\n"

Are there any strategies on this one?

I've used both Postman for Chrome and curl as HTTP clients.

Here's the full response env: http://pastebin.com/a7m7rjYd

So, I have been able to parse out the images by using this:

env["async-body"].split("\r\n")[-2]

Still, I am not sure how well that will hold up in production, so I am still open to suggestions.
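One reason the split may be fragile: binary payloads such as JPEG data can themselves contain "\r\n", in which case split("\r\n")[-2] returns only the last fragment of the file. A minimal sketch of a stricter extraction follows, assuming the body holds exactly one part and starts with the boundary line, as in the dump above. The helper name extract_single_part is hypothetical and not part of Goliath or Rack.

```ruby
# Hypothetical helper: pulls the payload out of a single-part multipart body.
# Assumes the body begins with the "--boundary" line and contains one part.
def extract_single_part(raw)
  body = raw.dup.force_encoding(Encoding::BINARY)

  # The first line is "--" + boundary; reuse it verbatim as the delimiter.
  first_line_end = body.index("\r\n")
  return nil unless first_line_end
  delimiter = body[0, first_line_end]

  # The part headers end at the first blank line.
  headers_end = body.index("\r\n\r\n", first_line_end)
  return nil unless headers_end
  payload_start = headers_end + 4

  # The payload ends right before "\r\n" + delimiter (the closing boundary).
  payload_end = body.index("\r\n" + delimiter, payload_start)
  return nil unless payload_end

  body[payload_start...payload_end]
end
```

Called as extract_single_part(env["async-body"]), this would return the raw file bytes, or nil if the body does not have the assumed shape. It reads the boundary from the body itself rather than from the Content-Type header, which is a simplification.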

Well, that's basically right, except you can probably do something smarter by scanning for the boundary only in the last received chunk and then invoking the split. Also, if you're streaming a large upload, you may want to consider streaming it to a temp file on disk, or something similar.
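A rough sketch of both suggestions together, using only Ruby's standard library: each chunk is written to a Tempfile, and only the newest chunk (plus a few carried-over bytes, in case the delimiter straddles two chunks) is scanned for the closing boundary. The UploadSpool class, the 'upload-spool' env key, and the way the boundary is obtained are all assumptions for illustration, not part of Goliath.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Goliath): spools chunks to a temp file and
# scans only the newest bytes for the closing multipart delimiter.
class UploadSpool
  attr_reader :file

  def initialize(boundary)
    # A multipart body ends with "--" + boundary + "--".
    @closing = "--#{boundary}--".b
    @tail = ''.b
    @done = false
    @file = Tempfile.new('upload')
    @file.binmode
  end

  def <<(chunk)
    chunk = chunk.b
    @file.write(chunk)

    # Scan the carried-over tail plus the new chunk, never the whole upload.
    window = @tail + chunk
    @done ||= window.include?(@closing)

    # Carry over just enough bytes to catch a delimiter split across chunks.
    keep = @closing.bytesize - 1
    @tail = window.bytesize > keep ? window.byteslice(-keep, keep) : window
    self
  end

  def complete?
    @done
  end
end

# In a Goliath API this might be wired up roughly as follows, assuming
# `boundary` was taken from the request's Content-Type header:
#
#   def on_body(env, data)
#     (env['upload-spool'] ||= UploadSpool.new(boundary)) << data
#   end
```

Once complete? returns true, the temp file holds the full multipart body and can be parsed from disk, so memory use stays bounded by the chunk size rather than the upload size.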