npm/pacote

[BUG] .gitignore renamed to .npmignore

cb1kenobi opened this issue · 6 comments

What / Why

When calling pacote.extract() with a package tarball that contains a .gitignore, it is renamed to .npmignore. This is problematic if the package contains something such as a project template where the .gitignore does not apply to npm.

Where

The .gitignore file is being renamed in fetcher.js in the filter() callback defined in _tarxOptions.

One would believe simply adding a .npmignore to a package would prevent the .gitignore from being renamed. However, on my machine, .gitignore is ordered and extracted before .npmignore which means sawIgnores will always be empty and thus the .gitignore is renamed.

Since filter() has no way of knowing the order of the files being extracted, the code will have no choice but to track potential renames, then untrack them when a .npmignore is found:

// PROPOSED FIX
if (/File$/.test(entry.type)) {
	const base = basename(entry.path)
	if (base === '.npmignore') {
		sawIgnores.delete(entry.path.replace(/\.npmignore$/, '.gitignore'))
	} else if (base === '.gitignore') {
		sawIgnores.add(entry.path)
	}
	return true
}

Next, sawIgnores needs to somehow be magically returned so that extractor.on('end', ...) can loop over it and rename the .gitignore files.

for (const ignoreFile of sawIgnores) {
	const from = path.join(dest, ignoreFile);
	const to = path.join(dest, ignoreFile.replace(/\.gitignore$/, '.npmignore'));
	fs.renameSync(from, to);
}

As a bonus, wrap this entire thing in a flag so it can be disabled because publishing a .npmignore makes no sense.

Test Case

I've create a handy dandy package template-kit-test with version v1.0.3 containing a .gitignore and v1.0.4 containing both a .gitignore and a .npmignore.

I understand your use case, and it's valid, but since .gitignore files are relevant in the context of npm package creation, and dropping a .gitignore file into an installed node_modules folder can cause serious problems, we really can't change this too easily. Ultimately, the use cases involved with "publishing and installing packages" take priority over anything else, as far as npm is concerned.

I'd recommend that your template kit uses a filename like template.gitignore, and move it into place when it's unpacked. Or, pack up your template in a tarball or some other kind of bundle, and unpack when it's deployed.

Regardless of my use case, the code is still bugged. .gitignore seems to always be extracted before .npmignore rendering the whole sawIgnores mechanism pointless. Just sayin'.

So... I must be missing something, then.

What's the behavior that seems to be intended, and what's happening instead?

@isaacs inside _tarxOptions(), there's this handy filter() callback. It's called for every entry in the archive.

In all of my testing, .gitignore is always extracted before .npmignore. I assume they are alphabetically sorted.

Here's the code:

if (base === '.npmignore')
  sawIgnores.add(entry.path)
else if (base === '.gitignore') {
  // rename, but only if there's not already a .npmignore
  const ni = entry.path.replace(/\.gitignore$/, '.npmignore')
  if (sawIgnores.has(ni))
    return false
  entry.path = ni
}

If .gitignore is extracted before .npmignore, then sawIgnores.has(ni) will return false and thus .gitignore is renamed to .npmignore.

If .npmignore is extracted before .gitignore, then sawIgnores.has(ni)will return true and will NOT rename.gitignoreto.npmignore`.

If the archive has both a .gitignore and .npmignore, the .gitignore is extracted first and renamed to .npmignore. Then .npmignore is extracted and overwrites the .gitignore version of .npmignore.

There's 2 ways to fix this:

  1. Instead of the code using sawIgnores to track .npmignore instances, it needs to track .gitignore instances. That way when you do extract a .npmignore, you can remove the .gitignore from the Set and whatever .gitignore files are left in the Set after extracting would be renamed to .npmignore.

  2. Assuming a clean extract destination, leave the .gitignore and sawIgnores.has(ni) code as is, but before .npmignore is written, check if the destination exists, and if so, rename the .npmignore back to .gitignore. If the archive is being extracted over an existing directory with a .npmignore, then this won't work as there's no way of knowing if the existing .npmignore was a renamed .gitignore.

Hope that helps.

In all cases, where there is either/both of .npmignore and/or .gitignore, we should end up extracting a .npmignore file, and not a .gitignore. If there's a .gitignore and no .npmignore, it should be renamed. If there's both, then we should extract the .npmignore, and throw away the .gitignore.

  • .npmignore only
    • base is .npmignore, add it to sawIgnores and extract
    • never hit the second if
    • result .npmignore is extracted ✅
  • .gitignore only
    • replace the entry path from .gitignore to .npmignore
    • sawIgnores doesn't contain it, so continue and extract
    • never hit the first if
    • result file extracted as .npmignore
  • .npmignore then .gitignore
    • hit the first if, add to sawIgnores
    • hit the second if, but sawIgnores already has it, so return false to not extract
    • result extracted .npmignore, threw away .gitignore via filter ✅
  • .gitignore then .npmignore
    • hit the second if, rename to .npmignore
    • hasn't been seen, so extract it
    • hit the second if, add to sawIgnores, and extract over the .gitignore
    • result extracted .npmignore as .npmignore, threw away .gitignore via overwrite ✅

It seems like this is working as intended, regardless of file order.

In all of my testing, .gitignore is always extracted before .npmignore. I assume they are alphabetically sorted.

They are processed in this function in the order in which they appear in the archive. For most packages that are created by npm-packlist, yes, this will be alphabetical. But if you create a tarball via some other mechanism, they can be in any arbitrary order.

Thanks for taking a closer look. Cheers!