MagicStack/httptools

`feed_data` is not safe for http header fragmentation

youknowone opened this issue · 11 comments

if any broken data chunk is fed to feed_data, it will be parsed in incomplete form.

@youknowone do you have a test case at hand to reproduce the issue?

A request:

b'''GET /ping/ HTTP/1.1\r\nHost: github.com\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: ko-KR,ko;q=0.8,en-US;q=0.6,en;q=0.4\r\n\r\n'''

I called feed_data each byte by byte and the headers are broken.

Related issue in sanic which uses httptools: sanic-org/sanic#755

They are implmenting their own header fragment buffer.

1st1 commented

I'll take a look as soon as I finish working on the next uvloop release.

1st1 commented

@yohanboniface feel free to work on this if you have time

@youknowone to make sure I understand the issue: it arises when a request is chunked in a middle of a header field?
So for example, to continue on your data, we'd have this as first chunk:

GET /ping/ HTTP/1.1\r\nHost: github.com\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nUpgrade-Insecure-Requests: 1\r\nTransfer-Encoding: chunked\r\nUser-Agent: Mozilla/5.0

And this as second chunk:

(Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: ko-KR,ko;q=0.8,en-US;q=0.6,en;q=0.4\r\n\r\n

And then you'd have an incomplete value for User-Agent and an invalid header name with the rest of the value?

Is that correct or am I missing something?

edit bah, no, indeed a chunked request is only about the body.
So it's not about the request itself being split in the middle of a header field but that the code implementing httptools chunking it manually before calling feed_data?

@yohanboniface Yes, your description is correct. User-Agent will be Mozzle/5.0 in that case.

As I know, the chunked body is a spec about logical chunk, not about TCP packet fragment.

A question here: does httptools expect to feed the whole HTTP body (at least "a chunk") at same time? Then it can be a user fault - but still weird.

Because httptools is the parser, I think basically the users can't determine which part of http request is going to httptools or not. For the point of view of user, "end of chunk" of http body and any fragmented packet in http header is not recognizable before putting it into the parser.
I think httptools is the correct place to merge the fragmented tcp packets to avoid double-parsing http request both in httptools and the users.

@youknowone made a quick unittest to reproduce what I've understood of the issue, but… it passes ;)

See #26

Can you please check the unittest and tell me what I'm missing to properly reproduce the issue? thanks :)

Thanks, your test is really helpful.
It seems I need to look into both httptools and sanic.
Give me some time. I am new to both part.

I changed your test a little and it now starts to be broken: #27

I also added a patch to #27. Thanks @yohanboniface, I would never looked into it without your test.

Cool!
I'll let @1st1 do the final review :)