h4cc/multipart

Streaming parsing?

Opened this issue · 4 comments

It would be really cool if this library supported streaming parsing of a message. One thing I thought of was accepting an fopen resource or Guzzle stream resource as input, and returning an iterator that returns Guzzle stream resources that are just LimitStreams over the original data. Then the usage could become something like this:

$parts = new PartIterator(fopen('http://example.com/huge_multipart_body', 'r'));

foreach ($parts as $part) {
    var_dump($part['headers']);
    while (!$part['content']->eof()) {
        echo $part['content']->read(1024);
    }
}

In order to get an array of parts, you'd need to provide a sort of helper function with the caveat that it loads the entire message into memory (something like this could be on the iterator):

public function toArray()
{
    $parts = [];
    foreach ($this as $part) {
        $parts[] = ['headers' => $part['headers'], 'content' => (string) $part['content']];
    }

    return $parts;
}
h4cc commented

That's an interesting idea. As far as I understand it, the aim is an iterator over the parts of a multipart message, where each part carries a fully parsed array of headers plus another iterator (or similar) for reading the body on demand.

This could be implemented both for multipart content already in memory and for content read from a file handle; both should produce the same result in the end.
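A rough sketch of what such a part iterator might look like, written as a plain PHP generator rather than anything this library currently provides. The function name `iterateParts`, the `headers`/`content` array shape, and the choice to buffer each body in a `php://temp` stream (which spills to disk once it grows past a couple of megabytes) are all assumptions made for illustration. It reads line by line with `fgets`, so a body containing one very long line without a newline would still be held in memory; a real implementation would want chunked boundary scanning.

```php
<?php
// Hypothetical helper, not part of h4cc/multipart.
// Yields ['headers' => array, 'content' => resource] one part at a time.
function iterateParts($stream, string $boundary): Generator
{
    $delimiter = '--' . $boundary;
    $headers = [];
    $body = null;        // null until the first delimiter line is seen
    $inHeaders = false;
    $pendingEol = '';    // held back: the line break before a delimiter is not part of the body

    while (($line = fgets($stream)) !== false) {
        $trimmed = rtrim($line, "\r\n");
        $isClose = ($trimmed === $delimiter . '--');

        if ($trimmed === $delimiter || $isClose) {
            if ($body !== null) {
                rewind($body);
                yield ['headers' => $headers, 'content' => $body];
            }
            if ($isClose) {
                return; // closing delimiter: ignore any epilogue
            }
            $headers = [];
            $body = fopen('php://temp', 'w+');
            $inHeaders = true;
            $pendingEol = '';
            continue;
        }

        if ($body === null) {
            continue; // preamble before the first delimiter
        }

        if ($inHeaders) {
            if ($trimmed === '') {
                $inHeaders = false; // blank line ends the header block
                continue;
            }
            [$name, $value] = array_pad(explode(':', $trimmed, 2), 2, '');
            $headers[strtolower(trim($name))] = trim($value);
            continue;
        }

        // Write the previous line break now that we know another body line follows.
        fwrite($body, $pendingEol . $trimmed);
        $pendingEol = substr($line, strlen($trimmed));
    }
}

// Usage with an in-memory stream standing in for a file or HTTP handle.
$in = fopen('php://memory', 'w+');
fwrite($in, "--xyz\r\nContent-Type: text/plain\r\n\r\nhello\r\nworld\r\n--xyz--\r\n");
rewind($in);

foreach (iterateParts($in, 'xyz') as $part) {
    var_dump($part['headers']);
    echo stream_get_contents($part['content']), PHP_EOL;
}
```

Only one part's body is open at a time as long as the caller does not hold on to earlier `content` handles, which is the property that matters for large messages. Header continuation lines and a missing closing delimiter are deliberately not handled here.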

Thanks for that idea and the reference to your good looking stream library. I will have a look at this and try to come up with something :)

h4cc commented

Does this issue have anything to do with PSR-7? From what I saw of the interfaces, a FileBag or similar is missing, which means the body has to be parsed via a StreamInterface.

Nothing to do with PSR-7.

On Sep 19, 2014, at 3:08 PM, Julius Beckmann notifications@github.com wrote:

Does this issue have anything to do with psr-7? From what i saw from the interfaces, a FileBag or so is missing. This means the body has to be parsed via a StreamInterface.



@h4cc I have no opinion on the iterator part, but we frequently run into memory issues with multipart responses. It would be very helpful if this library parsed using streams in order to better support large responses.