golang/go

net/http/cgi: allow chunked body in request

gopherbot opened this issue · 26 comments

by cgmurray:

What steps will reproduce the problem?
Why are requests with chunked transfer encoding not permitted in http/cgi? Is this
according to the cgi specification? Git sends data (push) with chunked transfer encoding
and I believe that it's supported in Apache.

Side note: after disabling the error check in ServeHTTP, chunked requests seem to work in
general, but when sending large data (>= 1 MB) in combination with writing the
"Content-Type" header, the request data seems to get corrupted. Any other
header name, e.g. "_Content-Type", works, however.

Which version are you using?  (run 'go version')
go version go1.1 darwin/amd64

Comment 1:

I think the CGI "specification" predates HTTP/1.1, so it's likely it doesn't say
anything specifically about it.
Can you confirm that Apache or some other popular CGI host supports this? If so, include
the environment variables that a child CGI process under Apache sees.

Status changed to WaitingForReply.

Comment 2:

Labels changed: removed priority-triage.

Comment 3 by cgmurray:

Hi,
Tested with the following cgi-script:
-----------------------------------
#!/bin/sh
echo "Content-Type: application/octet-stream"
echo ""
set
md5
-----------------------------------
md5 /tmp/large_file
MD5 (/tmp/large_file) = 768c49f96473c4aa1d9efc5aaa094c1f
curl  -X POST --data-binary @/tmp/large_file http://localhost:9090/md5/
...
768c49f96473c4aa1d9efc5aaa094c1f
Differences between chunked and regular:
curl -X POST --data-binary @/tmp/large_file http://localhost:9090/md5/ -v > /tmp/a
curl -H "Transfer-Encoding: chunked" -X POST --data-binary @/tmp/large_file
http://localhost:9090/md5/ -v > /tmp/b
diff /tmp/a /tmp/b
8d7
< CONTENT_LENGTH=1024000
21a21
> HTTP_TRANSFER_ENCODING=chunked
37c37
< REMOTE_PORT=59660
---
> REMOTE_PORT=59661
Note that the data is "de-chunked" as the md5-checksum is correct when chunked encoding
is on.
I can create a complete Go example to demonstrate the strange behavior when
"Content-Type" (and only "Content-Type") is set, as described in the first post, if you like.

Comment 4 by cgmurray:

Forgot to mention that the web server is Apache/2.2.22

Comment 5 by cgmurray:

Hi,
The comments are out of order as I forgot to fill in the captcha for the first one.
Tested with the following cgi-script:
----------------------------------------------
#!/bin/sh
echo "Content-Type: application/octet-stream"
echo ""
set
md5
----------------------------------------------
md5 /tmp/large_file
MD5 (/tmp/large_file) = 768c49f96473c4aa1d9efc5aaa094c1f
curl -H "Transfer-Encoding: chunked" -X POST --data-binary @/tmp/large_file
http://localhost:9090/md5/
...
768c49f96473c4aa1d9efc5aaa094c1f
Differences between chunked and regular:
curl -X POST --data-binary @/tmp/large_file http://localhost:9090/md5/ -v > /tmp/a
curl -H "Transfer-Encoding: chunked" -X POST --data-binary @/tmp/large_file
http://localhost:9090/md5/ -v > /tmp/b
diff /tmp/a /tmp/b
8d7
< CONTENT_LENGTH=1024000
21a21
> HTTP_TRANSFER_ENCODING=chunked
37c37
< REMOTE_PORT=59668
---
> REMOTE_PORT=59669
Note that the data is "de-chunked" as the md5 is correct.
I can create a complete Go example demonstrating the strange behavior when
"Content-Type" (and only "Content-Type") is set, as described briefly in the first post, if
you like.
Thanks!

Comment 6 by cgmurray:

I've tried to post a complete example twice. Perhaps it was caught in the spam filter due to
the links?

Comment 7 by cgmurray:

Here is the post I tried to send, as an attachment instead.

Attachments:

  1. chunked.txt (1050 bytes)

Comment 8 by cgmurray:

Any news on this issue? See also issue #5660 for a potentially related problem.
rsc commented

Comment 9:

Labels changed: added go1.2maybe, priority-later.

Status changed to Accepted.

rsc commented

Comment 10:

Labels changed: added feature.

Comment 11:

Not for 1.2.

Labels changed: removed go1.2maybe.

rsc commented

Comment 12:

Labels changed: added go1.3maybe.

rsc commented

Comment 13:

Labels changed: removed feature.

rsc commented

Comment 14:

Labels changed: added release-none, removed go1.3maybe.

rsc commented

Comment 15:

Labels changed: added repo-main.

The guard conditional against chunked requests in cgi.go, lines 104 to 108, can be removed now that a ChunkedReader is used on http.Request. Chunked requests work fine now because of that change.

@commondream, I assume you mean in src/net/http/cgi/host.go, since there is no cgi.go.

How did you test this? What is your configuration?

@bradfitz Oh yes, host.go. I'll describe what was going on in depth and we can take it from there:

I was goofing around with git-http-backend (git ends up using chunked requests) and ran into issues using net/http/cgi, but noticed that most other servers, like Apache, handle the request chunking transparently before passing it to CGI scripts. While troubleshooting, I noticed that the request body was a chunked reader, and the docs seemed to say that it would handle chunked requests transparently, so I pulled the host.go file over, ripped that conditional out, tested it, and it seemed to work fine in this instance.

I took a quick stab at getting a patch over to you guys, but simply ran out of time and figured it'd make more sense to mention it here in case anyone could fix it more quickly than I can. It looks like host_test.go will also need some updates to cover the fact that chunking is now supported.

I would like to see this all broken & fixed & how it appears on the wire myself before fixing it.

I haven't used Apache in ages. What's the minimal Apache config I need to run a CGI? I'd like to throw an HTTP/1.1 chunked request body at Apache and see what it spits out to the CGI process.

I haven't actually run it in Apache, but know how it works because of the docs describing git-http-backend and using it as CGI on Apache.

From what I can tell, the reason removing that guard clause works is that NewChunkedReader de-chunks the body of the request, and req.ContentLength gets set to -1 (so the body gets handed to the CGI cmd's Stdin).

I could see not fixing this issue (and I can keep my hacked version of CGI in my little project), or I could see letting CGI handle chunked requests without giving a ContentLength (which technically wouldn't fit the CGI spec but is what git-http-backend uses) like a lot of web servers do.

I've tried getting a patch together, but it requires tweaking the test CGI and my perl skills are as rusty as your Apache config skills.

Should have mentioned too - let me know if you're willing to go the "Not quite standard CGI but clearly used by git-http-backend" route of allowing chunked requests and I'll push through and figure out how to get a patch together.

@commondream, I'm happy to look at a patch, but I'd also like to see a demo of something using this in the wild. If you could give me a demo Docker container or something to see git-http-backend running under some other webserver, that'd be great.

You have about a month before the Go 1.8 tree closes.

@bradfitz I think we're cool to just close it at this point - we ended up going with a non-CGI technique, and I'd say while some folks are doing chunked encoding with CGI, it's certainly not the norm. I don't have a sample at this point, because we moved on.

I'd rather not close it. Somebody else will inevitably come along and want this or be surprised that we don't act like other implementations.

But we can demote its milestone. I'd still like somebody to investigate & fix it.

pjanx commented

I'll chime in and say that CGI/1.1, SCGI, and FastCGI all need a CONTENT_LENGTH for requests with a content body, equal to the length of the input stream. SCGI further requires it to be present for all requests.

It might be reasonable to buffer the entire de-chunked message in the HTTP server before passing it over to the CGI application, which of course brings the problem of limiting the buffer size, and in general adds more code. RFC 3875 describes this possibility:

The CONTENT_LENGTH value must reflect the length of the message-body after the server has removed any transfer-codings or content-codings.

As transfer-codings are not supported on the request-body, the server MUST remove any such codings from the message-body, and recalculate the CONTENT_LENGTH. If this is not possible (for example, because of large buffering requirements), the server SHOULD reject the client request. It MAY also remove content-codings from the message-body.

That is, remove the chunked Transfer-Encoding at the server and compute a Content-Length.

As a side note about the current code, given that these messages must not contain a Content-Length field, it might be more appropriate to send back 411 Length Required instead of the more generic 400 Bad Request. Or, even better, 501 Not Implemented, even if the specification doesn't seem to intend that response to be used for the chunked encoding.

if len(req.TransferEncoding) > 0 && req.TransferEncoding[0] == "chunked" {
	rw.WriteHeader(http.StatusBadRequest)
	rw.Write([]byte("Chunked request bodies are not supported by CGI."))
	return
}
korc commented

Maybe not the most elegant solution, but I solved this problem for myself by wrapping cgi.Handler in an UnChunkHandler http.Handler type, which writes the chunked data to a temp file before passing it to the CGI.