[BUG] FetchError: network timeout
abnud1 opened this issue · 3 comments
When I call pacote.packument multiple times in parallel, each with a different package name, resulting in multiple requests to the npm registry, a timeout error eventually occurs once the number of calls is large enough. That much is expected, since issuing a flood of requests at once is problematic.
The problem is that we see this timeout error even if we issue the requests in batches, as in this code example:
const pacote = require('pacote');
const fs = require('fs');
const pMap = require('p-map');

fs.readFile('/home/abd/أحمد/package.json', { encoding: 'utf-8' }, (err, data) => {
  if (err) throw err;
  const packageJson = JSON.parse(data);
  // you can replace devDependencies with dependencies as long as the number
  // of packages is big
  pMap(Object.keys(packageJson.devDependencies), (packageName) => {
    return pacote.packument(packageName).then((result) => {
      console.log(result);
    });
  }, { concurrency: 8 });
});
p-map is this package; the code above limits the request concurrency to 8.
If you try the above code with the package.json inside this zip, it will throw a timeout error; you can see my analysis of this issue in this comment.
The problem in summary: pacote.packument does not seem to be designed for concurrent calls.
related: raineorshine/npm-check-updates#634
I think this was a result of a legitimate networking timeout issue on the registry that happened around that time. Or, something else is going on.
I couldn't reproduce the problem with the package.json you provided. So, I thought, let's push it further. Like, as far as it'll go. Let's fetch every packument.
const pacote = require('../')
const https = require('https')
const fs = require('fs')
const pMap = require('p-map')

const runTest = ({ rows }) => {
  console.error(`Starting test with ${rows.length} packuments`)
  let i = 0
  pMap(rows, ({ id }) => pacote.packument(id).then(result => {
    if (++i % 1000 === 0)
      console.error(`fetched ${i} successfully`)
  }).catch(er => {
    // 404 errors for unpublished packages are normal and expected
    if (er.code !== 'E404') {
      throw er
    }
  }), { concurrency: 8 }).then(() => console.log('ok!'))
}

// cache the file since it takes a long time
try {
  const allDocs = JSON.parse(fs.readFileSync(__dirname + '/all_docs.json', 'utf8'))
  console.error('running test with cached all_docs')
  runTest(allDocs)
} catch (_) {
  console.error('did not get cached all_docs', _)
  https.get('https://replicate.npmjs.com/_all_docs', res => {
    const body = []
    res.on('data', c => body.push(c))
    res.on('end', () => {
      const buf = Buffer.concat(body)
      fs.writeFileSync(__dirname + '/all_docs.json', buf)
      runTest(JSON.parse(buf.toString()))
    })
  })
}
This took quite some time to finish, but it did finish successfully, downloading 1264179 packuments with a concurrency of 8.
I even bumped the concurrency up to 10,000, and it still crunched through it all without hitting a network timeout.
@isaacs Did you try on a slow network? Mine was 1 Mbps.
And thanks for the hard work!
Well, right, if you're on a slow enough network, it'll time out sometimes. The point is, it's not pacote forcing it to time out by keeping connections open for old requests when a lot of requests are made, so as far as I can tell, not a bug in this module.
If you're on a slow network, I recommend setting opts.retry, which is passed through to make-fetch-happen. If you set a very high opts.retry.retries and opts.retry.maxTimeout, and set a high opts.timeout option, then you can tune it to the expected time before your network can reasonably be expected to return a result.
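For example, a minimal sketch of such an options object. The values here are purely illustrative, not recommendations; opts.retry and opts.timeout come from the advice above, while the factor and minTimeout fields are my assumption based on the standard retry-module options that make-fetch-happen accepts:

```javascript
// Illustrative values for a slow link -- tune to your own network.
// opts.retry is passed through to make-fetch-happen; retries/maxTimeout
// are mentioned above, factor/minTimeout assumed from the retry module.
const opts = {
  timeout: 5 * 60 * 1000,       // per-request timeout: 5 minutes
  retry: {
    retries: 10,                // retry transient failures many times
    factor: 2,                  // exponential backoff between attempts
    minTimeout: 1000,           // first retry after 1 second
    maxTimeout: 5 * 60 * 1000,  // cap the backoff at 5 minutes
  },
}

// Then pass the options to any pacote call, e.g.:
// require('pacote').packument('some-package', opts).then(console.log)
console.log('timeout (ms):', opts.timeout)
```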
Since this is legitimately your network taking longer than your configs are allowing for, an ETIMEOUT is the correct error, and I think pacote and make-fetch-happen are working as designed. (Also note that it'll only hit this after retrying a few times, so transient errors shouldn't be a major problem.)