whiteinge/ok.sh

upload_asset pipes HTTP headers into jq

ArturKlauser opened this issue · 9 comments

When running upload_asset, jq reports an error:

$ ./ok.sh upload_asset https://uploads.github.com/repos/ArturKlauser/raspberrypi-rstudio/releases/23039592/assets?name=foo1 ok.sh mime_type=application/octet-stream
parse error: Invalid numeric literal at line 1, column 9
jq parse error; invalid JSON.

The error goes away when -q is specified.
Some digging shows that apparently the HTTP headers are piped into jq before the actual json data:

$ printf ':\ncat -\n' > myjq; chmod 755 myjq
$ OK_SH_JQ_BIN=./myjq ~/Downloads/ok.sh upload_asset https://uploads.github.com/repos/ArturKlauser/raspberrypi-rstudio/releases/23039592/assets?name=foo2 ok.sh mime_type=application/octet-stream
HTTP/1.1 201 Created
Cache-Control: no-cache
Content-Security-Policy: default-src 'none'
Content-Type: application/json; charset=utf-8
Etag: W/"75c6982651342ea7db8fa9949fc7b3b4"
Last-Modified: Wed, 22 Jan 2020 14:59:26 GMT
Strict-Transport-Security: max-age=31557600
Vary: Accept, Authorization, Cookie, X-GitHub-OTP
X-Accepted-Oauth-Scopes: repo
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-Github-Media-Type: github.v3; format=json
X-Oauth-Scopes: repo
X-Xss-Protection: 1; mode=block
Date: Wed, 22 Jan 2020 14:59:26 GMT
Transfer-Encoding: chunked
X-GitHub-Request-Id: CE13:5182:68E29:8648C:5E28634C

{"url":"https://api.github.com/repos/ArturKlauser/raspberrypi-rstudio/releases/assets/17506513","id":17506513,"node_id":"MDEyOlJlbGVhc2VBc3NldDE3NTA2NTEz","name":"foo2","label":"","uploader":{"login":"ArturKlauser","id":4790123,"node_id":"MDQ6VXNlcjQ3OTAxMjM=","avatar_url":"https://avatars0.githubusercontent.com/u/4790123?v=4","gravatar_id":"","url":"https://api.github.com/users/ArturKlauser","html_url":"https://github.com/ArturKlauser","followers_url":"https://api.github.com/users/ArturKlauser/followers","following_url":"https://api.github.com/users/ArturKlauser/following{/other_user}","gists_url":"https://api.github.com/users/ArturKlauser/gists{/gist_id}","starred_url":"https://api.github.com/users/ArturKlauser/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/ArturKlauser/subscriptions","organizations_url":"https://api.github.com/users/ArturKlauser/orgs","repos_url":"https://api.github.com/users/ArturKlauser/repos","events_url":"https://api.github.com/users/ArturKlauser/events{/privacy}","received_events_url":"https://api.github.com/users/ArturKlauser/received_events","type":"User","site_admin":false},"content_type":"application/octet-stream","state":"uploaded","size":60861,"download_count":0,"created_at":"2020-01-22T14:59:25Z","updated_at":"2020-01-22T14:59:26Z","browser_download_url":"https://github.com/ArturKlauser/raspberrypi-rstudio/releases/download/untagged-dc669712e7a6ef40f473/foo2"}

Include logs

  • Run ok.sh -x [...command here...] 2> ok-shdebug.log and attach that file.

Environment (please complete the following information)

  • ok.sh release
    • 0.5.1, same happens on HEAD
  • curl version
    • curl 7.64.1 (x86_64-apple-darwin19.0) libcurl/7.64.1 (SecureTransport) LibreSSL/2.8.3 zlib/1.2.11 nghttp2/1.39.2, Release-Date: 2019-03-27
  • jq version.
    • jq-1.6
  • What operating system and version.
    • MacOS Catalina 10.15.2
  • What shell.
    • GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin19)

ok-shdebug.log

flaix commented

Same problem, different command.

I have the same problem when running create_release with a body= beyond a certain length.

  • ok.sh Version: 0.5.1
  • curl 7.54.0 (x86_64-apple-darwin18.0) libcurl/7.54.0 LibreSSL/2.6.5 zlib/1.2.11 nghttp2/1.24.1
  • jq-1.6
  • MacOS Mojave

I was unable to find out where the wrong input to jq slips through.
But I observed that it is related to the response code. From a certain body size on, curl will sent a Expect: 100-continue header and the response code will be 100, not 200. That is when the error happens.

You can check this e.g. with generating a release with a body of increasing length. I did that with the following script, and with 13 lines, it works fine, with 14, jq did puke.

awk -v lineno=14 -v linelen=80 'BEGIN { i = 0 } { i++; words[i] = $0 } END { line1 = ""; line2 = ""; srand(); n = 0; w = ""; nw = 0; for (j = 0; j < lineno; j++) { while (n <= linelen) { w = words[int(rand() * i) + 1]; nw = length(w); line1 = line2; if (n > 0) { line2 = line2 " " w; n += 1 + nw } else { line2 = w; n += nw } }; print line1; line2 = w; n = nw } }' /usr/share/dict/words

ok-sh-vvv.log

flaix commented

Oooh, Artur I didn't pay attention at first to what you did there with your myjq. Very smart, love it.

Tried the same thing which leads me to an explanation.

The normal case with a short enough body. Response cut for brevity.

$ curl -nsSig -H 'Accept: application/vnd.github.v3+json' -H 'Content-Type: application/json' --data-binary @body10 -X POST https://api.github.com/repos/fzs/gitblit/releases
HTTP/1.1 201 Created
Date: Wed, 22 Jan 2020 20:06:24 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 2573
Server: GitHub.com
Status: 201 Created
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4981
X-RateLimit-Reset: 1579723794
...
Vary: Accept-Encoding
X-GitHub-Request-Id: 5DCD:3BE42:1D79C8F:238EEA4:5E28AB3F

{
  "url": "https://api.github.com/repos/fzs/gitblit/releases/23051499",
  "assets_url": "https://api.github.com/repos/fzs/gitblit/releases/23051499/assets",
...

Now a call to the API with a longer body. Response again cut for brevity.

$ curl -nsSig -H 'Accept: application/vnd.github.v3+json' -H 'Content-Type: application/json' --data-binary @body15 -X POST https://api.github.com/repos/fzs/gitblit/releases
HTTP/1.1 100 Continue

HTTP/1.1 201 Created
Date: Wed, 22 Jan 2020 20:05:02 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 2820
Server: GitHub.com
{
Status: 201 Created
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4982
X-RateLimit-Reset: 1579723794
...
Vary: Accept-Encoding
X-GitHub-Request-Id: 5DDB:3EC99:2680B66:2E8538C:5E28AAED

{
  "url": "https://api.github.com/repos/fzs/gitblit/releases/23051438",
  "assets_url": "https://api.github.com/repos/fzs/gitblit/releases/23051438/assets",
...

In the second case there is the HTTP/1.1 100 Continue response with a blank line, which is what ok.sh triggers on, and treats the rest a the response body.

$ printf ':\ncat -\n' > myjq; chmod 755 myjq
$ OK_SH_JQ_BIN=./myjq .github/ok.sh  create_release fzs gitblit tag_name=v123 draft=true body="$(awk -f randtext -v lineno=15 -v linelen=80 /usr/share/dict/words)"
HTTP/1.1 201 Created
Date: Wed, 22 Jan 2020 19:55:58 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 2791
Server: GitHub.com
{
Status: 201 Created
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4983
X-RateLimit-Reset: 1579723794
...
Vary: Accept-Encoding
X-GitHub-Request-Id: 5C9B:2BB14:24CCFD8:2C6B471:5E28A8CD

{
  "url": "https://api.github.com/repos/fzs/gitblit/releases/23051178",
  "assets_url": "https://api.github.com/repos/fzs/gitblit/releases/23051178/assets",
...

ok.sh will need a special case handling for the 100 Continue response.
(And for the weird stray { in the headers.)

How interesting. I have never encountered 100 Continue before now. That awk script reproduces it fantastically. Thank you both for the excellent investigation.

Suggestions very welcome if you see a potential fix, otherwise I'll plan to poke at this over weekend. Offhand, it looks like _response will require a bit of rework.

I initially got wrapped up in reading the HTTP spec, but there's also a bunch of (grumpy) conversations on the web about this behavior.

Clearly both curl and GitHub are doing the right thing by sending and responding to that header, however it's not buying ok.sh much. It seems we can short-circuit that behavior in curl by including the -H 'Expect:' flag.

I still want to research if there's a more robust or canonical way we should be parsing an HTTP response but avoiding might be a nice, quick fix.

flaix commented

Certainly an option. But I could also see benefit in that behaviour. For example, you upload a large asset but got the authentication wrong. I have no experience and have not tried it out, but it would be better if that lead to an error response immediately, instead of after having sent the 100MB binary.

Is it much more than checking in _response for the response code 100 and discard everything up to the next empty line, after which the actual response will follow? As ok.sh is not interested in the 100 , which it could consider part of the "request".

flaix commented

Just as a proof of concept I toyed around with _response reading over the "100 Continue" and skipping to the actual response. It's not beautiful, only covers this one case, but proves the solution.
Feel free to base something better and cleaner on it, if you like to.
efe5c10

It might actually work. I just tried the curl line for uploading a 30MB asset and also only get a single "100 Continue" line for that size.

flaix commented

Yeah, if I understand it correctly, there should be only one, as it is a stop gap between the client sending headers and then the body.
As this client is only dealing with Github, it could be enough.

Sent with GitHawk

I skimmed through the HTTP parser sources for Node and Golang and both special-case the 100 status code and avoid trying to process the request body so it looks to me like your fix is the best possible fix. Thanks again for the stellar work here.