jdx/npm-register

Timeouts and shasum errors with fs storage

deployable opened this issue · 4 comments

There seems to be some weirdness when koa-timeouts occur.

The error seems to work well for /package requests which occasionally go wrong and don't get a response for a long time. A 408 error is thrown, npm fails to parse the non JSON response and tries again. All good.

When a tarball takes a while to load, the koa-timeout will timeout the client request and send a 408 but then the npm request continues to run in the background until it finishes writing the file (most of the time)

  Error: Request timeout
      at Timeout._onTimeout (/srv/npm-register/node_modules/koa-timeout/index.js:11:19)
      at tryOnTimeout (timers.js:232:11)
      at Timer.listOnTimeout (timers.js:202:5)

When the response completes npm-register tries to respond to the client after the 408 has been sent and produces an error

2016-10-05T06:47:58.795Z - /@angular/core/-/core-2.0.1/c0cd8287bbaf367b350c189ac010aee443f4a3c8.tgz Error: Can't set headers after they are sent.
    at ServerResponse.OutgoingMessage.setHeader (_http_outgoing.js:356:11)
    at Object.set (/Users/matt/docker/npm-register/node_modules/koa/lib/response.js:434:16)
    at Object.proto.(anonymous function) [as set] (/Users/matt/docker/npm-register/node_modules/delegates/index.js:40:31)
    at Object.<anonymous> (/Users/matt/docker/npm-register/routes/tarballs.js:28:8)
    at next (native)
    at Object.allowedMethods (/Users/matt/docker/npm-register/node_modules/koa-router/lib/router.js:374:12)
    at next (native)
    at Object.allowedMethods (/Users/matt/docker/npm-register/node_modules/koa-router/lib/router.js:374:12)
    at next (native)
    at Object.allowedMethods (/Users/matt/docker/npm-register/node_modules/koa-router/lib/router.js:374:12)
    at next (native)
    at Object.allowedMethods (/Users/matt/docker/npm-register/node_modules/koa-router/lib/router.js:374:12)
    at next (native)
    at Object.allowedMethods (/Users/matt/docker/npm-register/node_modules/koa-router/lib/router.js:374:12)
    at next (native)
    at Object.dispatch (/Users/matt/docker/npm-register/node_modules/koa-router/lib/router.js:336:14)

What I'm seeing then is an occasional error after the request has been timed out and retried by the npm client. The tarball response that is saved to the file system appears to be cut off and fails the shasum check when delivered to the client

npm ERR! shasum check failed for /tmp/npm-15-05e9c7df/10.8.8.8_4874/bower/-/bower-1.7.9/b7296c2393e0d75edaa6ca39648132dd255812b0.tgz
npm ERR! Expected: b7296c2393e0d75edaa6ca39648132dd255812b0
npm ERR! Actual:   712ae1e8088b628c9bccfbb0d31dac0c89926245
npm ERR! From:     http://10.8.8.8:4874/bower/-/bower-1.7.9/b7296c2393e0d75edaa6ca39648132dd255812b0.tgz

The file needs to be removed from the file system to fix the issue. It happens with various packages.

My initial thought was that the file was being written to by two requests at once (the original and the rerty), so I added some logic to write the file to a tmp file with a random extension, then move the file into place after the download is complete. I can still produce the shasum errors with the extra logic though.

Does this ever crop up with the S3 storage?

The corrupt files have consistently had their ends truncated so far.

± |logging_and_files {1} ✓| →tar -tvf tmp/tarballs/typescript/typescript-2.0.3/33dec9eae86b8eee327dd419ca050c853cabd514.tgz
-rw-rw-rw-  0 0      0        2985 22 Sep 15:34 package/package.json
-rw-rw-rw-  0 0      0         178 22 Sep 15:08 package/.npmignore
-rw-rw-rw-  0 0      0        4311 22 Sep 15:34 package/README.md
-rw-rw-rw-  0 0      0        3486 22 Sep 15:08 package/ThirdPartyNoticeText.txt
-rw-rw-rw-  0 0      0        2443 22 Sep 15:08 package/AUTHORS.md
-rw-rw-rw-  0 0      0        9008 22 Sep 15:34 package/CONTRIBUTING.md
-rw-rw-rw-  0 0      0         834 22 Sep 15:08 package/CopyrightNotice.txt
-rw-rw-rw-  0 0      0       41273 22 Sep 15:34 package/Gulpfile.ts
-rw-rw-rw-  0 0      0        9197 22 Sep 15:08 package/LICENSE.txt
-rw-rw-rw-  0 0      0        8538 22 Sep 15:08 package/.mailmap
-rw-rw-rw-  0 0      0          45 22 Sep 15:08 package/bin/tsc
-rw-rw-rw-  0 0      0          50 22 Sep 15:08 package/bin/tsserver
-rw-rw-rw-  0 0      0         275 22 Sep 15:08 package/lib/README.md
-rw-rw-rw-  0 0      0     3542841 22 Sep 15:34 package/lib/typescript.js
-rw-rw-rw-  0 0      0     2765717 22 Sep 15:34 package/lib/tsserverlibrary.js
-rw-rw-rw-  0 0      0     2774735 22 Sep 15:34 package/lib/tsserver.js
-rw-rw-rw-  0 0      0     2025376 22 Sep 15:34 package/lib/tsc.js
-rw-rw-rw-  0 0      0     3542841 22 Sep 15:34 package/lib/typescriptServices.js
-rw-rw-rw-  0 0      0        1692 22 Sep 15:08 package/lib/lib.es2017.sharedmemory.d.ts
-rw-rw-rw-  0 0      0        1370 22 Sep 15:08 package/lib/lib.es2015.d.ts
-rw-rw-rw-  0 0      0       15136 22 Sep 15:34 package/lib/lib.es2015.iterable.d.ts
-rw-rw-rw-  0 0      0        9159 22 Sep 15:08 package/lib/lib.es2015.promise.d.ts
-rw-rw-rw-  0 0      0        1920 22 Sep 15:34 package/lib/lib.es2015.proxy.d.ts
-rw-rw-rw-  0 0      0        1952 22 Sep 15:08 package/lib/lib.es2015.reflect.d.ts
-rw-rw-rw-  0 0      0        1910 22 Sep 15:08 package/lib/lib.es2015.symbol.d.ts
-rw-rw-rw-  0 0      0       12268 22 Sep 15:08 package/lib/lib.es2015.symbol.wellknown.d.ts
-rw-rw-rw-  0 0      0        4645 22 Sep 15:08 package/lib/lib.es2016.array.include.d.ts
-rw-rw-rw-  0 0      0      852204 22 Sep 15:34 package/lib/lib.d.ts
-rw-rw-rw-  0 0      0        1018 22 Sep 15:08 package/lib/lib.es2017.d.ts
-rw-rw-rw-  0 0      0        1584 22 Sep 15:08 package/lib/lib.es2017.object.d.ts
-rw-rw-rw-  0 0      0        1296 22 Sep 15:08 package/lib/lib.es2015.generator.d.ts
-rw-rw-rw-  0 0      0      197351 22 Sep 15:34 package/lib/lib.es5.d.ts
-rw-rw-rw-  0 0      0      911987 22 Sep 15:34 package/lib/lib.es6.d.ts
-rw-rw-rw-  0 0      0        9360 22 Sep 15:08 package/lib/lib.scripthost.d.ts
-rw-rw-rw-  0 0      0       41129 22 Sep 15:34 package/lib/lib.webworker.d.ts
-rw-rw-rw-  0 0      0       21276 22 Sep 15:34 package/lib/lib.es2015.core.d.ts
-rw-rw-rw-  0 0      0        2425 22 Sep 15:34 package/lib/lib.es2015.collection.d.ts
-rw-rw-rw-  0 0      0      344693 22 Sep 15:34 package/lib/tsserverlibrary.d.ts
-rw-rw-rw-  0 0      0        1181 22 Sep 15:08 package/lib/lib.dom.iterable.d.ts
-rw-rw-rw-  0 0      0      100233 22 Sep 15:34 package/lib/typescript.d.ts
-rw-rw-rw-  0 0      0      647044 22 Sep 15:34 package/lib/lib.dom.d.ts
-rw-rw-rw-  0 0      0      100219 22 Sep 15:34 package/lib/typescriptServices.d.ts
-rw-rw-rw-  0 0      0         970 22 Sep 15:08 package/lib/lib.es2016.d.ts
-rw-rw-rw-  0 0      0         103 22 Sep 15:51 package/test.config

± |logging_and_files {1} ✓| →tar -tvf tmp/tarballs/typescript/typescript-2.0.3/33dec9eae86b8eee327dd419ca050c853cabd514.tgz.trunc
-rw-rw-rw-  0 0      0        2985 22 Sep 15:34 package/package.json
-rw-rw-rw-  0 0      0         178 22 Sep 15:08 package/.npmignore
-rw-rw-rw-  0 0      0        4311 22 Sep 15:34 package/README.md
-rw-rw-rw-  0 0      0        3486 22 Sep 15:08 package/ThirdPartyNoticeText.txt
-rw-rw-rw-  0 0      0        2443 22 Sep 15:08 package/AUTHORS.md
-rw-rw-rw-  0 0      0        9008 22 Sep 15:34 package/CONTRIBUTING.md
-rw-rw-rw-  0 0      0         834 22 Sep 15:08 package/CopyrightNotice.txt
-rw-rw-rw-  0 0      0       41273 22 Sep 15:34 package/Gulpfile.ts
-rw-rw-rw-  0 0      0        9197 22 Sep 15:08 package/LICENSE.txt
-rw-rw-rw-  0 0      0        8538 22 Sep 15:08 package/.mailmap
-rw-rw-rw-  0 0      0          45 22 Sep 15:08 package/bin/tsc
-rw-rw-rw-  0 0      0          50 22 Sep 15:08 package/bin/tsserver
-rw-rw-rw-  0 0      0         275 22 Sep 15:08 package/lib/README.md
-rw-rw-rw-  0 0      0     3542841 22 Sep 15:34 package/lib/typescript.js
-rw-rw-rw-  0 0      0     2765717 22 Sep 15:34 package/lib/tsserverlibrary.js
-rw-rw-rw-  0 0      0     2774735 22 Sep 15:34 package/lib/tsserver.js
-rw-rw-rw-  0 0      0     2025376 22 Sep 15:34 package/lib/tsc.js
tar: Truncated input file (needed 2025472 bytes, only 0 available)
tar: Error exit delayed from previous errors.
jdx commented

I put together a first pass at fixing this. Let me know what you think.

Cool thanks, although same problem testing this! Interestingly it's occurring more frequently now. Something odd is going on here, let me check a few things...

So it looks like the cause of the truncations was external. There's a router this data was going via that also manages vpn connections that was performing particularly badly today. I dropped all the vpn connections and then the npm-register timeouts that also truncate appear to have stopped happening. The router is possibly dumping the occasional tcp connection.

For the app side of it, I can put together a PR from my testing that does a shasum check on the stream and will only move the downloaded file into place if it matches the expected shasum. That should cover any type of tcp or file weirdness that could possibly happen and not leave the registry in a state with a bad/incomplete file. Does that sound useful?

The file deletion on timeout seems to make things slower, at least in my slightly contrived testing examples. If there happens to be something big or slow that regularly hits the timeout then the file never appears and the client has to retry a lot. At least without the delete, the timed out client connection ends up with a downloaded a file, eventually.

I also needed to add a bunch of debug logging to look at this as I couldn't reproduce it in a debugger. I replaced a lot of the console output with a winston logger that supports timestamps and log levels. Even using something simple like debug would be useful. If you're interested in something like that let me know and I'll pull it out into a PR too.