npm/pacote

[BUG] FetchError: network timeout

abnud1 opened this issue · 3 comments

When I call pacote.packument multiple times in parallel, each with a different package name, resulting in multiple requests to the npm registry, a timeout error eventually happens if the number of calls is big enough. That much is expected, since issuing a lot of requests at once is problematic.

The problem is that we see this timeout error even when we issue the requests in batches, as in this code example:

const pacote = require('pacote');
const fs = require('fs');
const pMap = require('p-map');

fs.readFile('/home/abd/أحمد/package.json', {encoding: 'utf-8'}, (err, data) => {
    if (err) throw err;
    const packageJson = JSON.parse(data);
    // you can replace devDependencies with dependencies, so long as the
    // number of packages is big
    pMap(Object.keys(packageJson.devDependencies), (packageName) => {
        return pacote.packument(packageName).then((result) => {
            console.log(result);
        });
    }, {concurrency: 8}).catch(console.error);
});

p-map is this package; the code above limits request concurrency to 8.

If you try the above code with the package.json inside this zip, it will throw a timeout error. You can see my analysis of this issue in this comment.

The problem, in summary: pacote.packument does not seem designed for concurrent calls.

related: raineorshine/npm-check-updates#634

I think this was the result of a legitimate network timeout issue on the registry that happened around that time. Or something else is going on.

I couldn't reproduce the problem with the package.json you provided. So, I thought, let's push it further. Like, as far as it'll go. Let's fetch every packument.

const pacote = require('../')
const https = require('https')
const fs = require('fs')
const pMap = require('p-map');

const runTest = ({rows}) => {
  console.error(`Starting test with ${rows.length} packuments`)
  let i = 0
  pMap(rows, ({id}) => pacote.packument(id).then(result => {
    if (++i % 1000 === 0)
      console.error(`fetched ${i} successfully`)
  }).catch(er => {
    // 404 errors for unpublished packages are normal and expected;
    // anything else is a real failure, so rethrow it
    if (er.code !== 'E404') {
      throw er
    }
  }), { concurrency: 8 })
    .then(() => console.log('ok!'))
    .catch(er => console.error('test failed', er))
}

// cache the file since it takes a long time
try {
  const allDocs = JSON.parse(fs.readFileSync(__dirname + '/all_docs.json', 'utf8'))
  console.error('running test with cached all_docs')
  runTest(allDocs)
} catch (_) {
  console.error('did not get cached all_docs', _)
  https.get('https://replicate.npmjs.com/_all_docs', res => {
    const body = []
    res.on('data', c => body.push(c))
    res.on('end', () => {
      const buf = Buffer.concat(body)
      fs.writeFileSync(__dirname + '/all_docs.json', buf)
      runTest(JSON.parse(buf.toString()))
    })
  })
}

This took quite some time to finish, but it did finish successfully, downloading 1,264,179 packuments with a concurrency of 8.

I even bumped the concurrency up to 10,000, and it still crunched through it all without hitting a network timeout.

🤷‍♂️

@isaacs Did you try on a slow network? Mine was 1 Mbps.

And thanks for the hard work!

Well, right: if you're on a slow enough network, it'll time out sometimes. The point is that pacote isn't forcing the timeout by keeping connections open for old requests when a lot of requests are made, so as far as I can tell, this is not a bug in this module.

If you're on a slow network, I recommend setting opts.retry, which is passed through to make-fetch-happen. If you set a very high opts.retry.retries and opts.retry.maxTimeout, along with a high opts.timeout, you can tune things to the time within which your network can reasonably be expected to return a result.
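
For example, a minimal sketch of tuned options (the values here are illustrative, not recommendations; timeout and retry.maxTimeout are in milliseconds):

const pacote = require('pacote')

// Illustrative values for a slow link; tune to your own network.
// opts.timeout and opts.retry are passed through to make-fetch-happen.
const opts = {
  timeout: 60 * 1000,       // give each request up to a minute
  retry: {
    retries: 10,            // retry failed requests many times
    maxTimeout: 60 * 1000,  // cap the backoff between retries
  },
}

pacote.packument('p-map', opts).then(doc => {
  console.log(`fetched ${Object.keys(doc.versions).length} versions`)
})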

Since this is legitimately your network taking longer than your configs allow for, an ETIMEOUT is the correct error, and I think pacote and make-fetch-happen are working as designed. (Also note that it'll only hit this after retrying a few times, so transient errors shouldn't be a major problem.)
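
And if you still want to handle the residual timeout at the call site, a minimal sketch (the 'request-timeout' type on the FetchError is an assumption about the node-fetch-style errors in play here; check what your versions actually throw):

const pacote = require('pacote')

pacote.packument('p-map', { timeout: 60 * 1000, retry: { retries: 10 } })
  .then(doc => console.log(doc.name))
  .catch(er => {
    // Assumed shape: node-fetch-style FetchErrors mark network timeouts
    // with type === 'request-timeout'. Anything else is rethrown.
    if (er.type === 'request-timeout') {
      console.error('still timing out; raise opts.timeout / opts.retry')
    } else {
      throw er
    }
  })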