npm/pacote

[BUG] FetcherBase._tarxOptions removes files with identical inodes


rekado commented

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

_tarxOptions in FetcherBase specifies an extraction filter that drops every tar entry whose type matches Link. When node-tar packs a directory, it writes the first path it encounters for a given inode as a regular file. Every later path that shares that inode (that is, a hardlink to a file already in the archive) is written as an entry of type Link that points back to the first path. The output of tarballStream therefore depends on whether one source file happens to share an inode with another: only one copy of each set of hardlinked files ends up in the target directory, and the others are silently dropped.

This is a problem on systems such as Guix System, where identical files may be deduplicated with hardlinks.
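The mechanism can be observed without npm at all. The following shell sketch is illustrative only. It assumes GNU coreutils (`stat -c`) and uses GNU tar as a stand-in archiver, so it does not exercise node-tar or pacote directly. It recreates a hardlinked pair like the one in the reproduction steps below and shows that both paths report the same inode and a link count of 2. It then lists the archive: whichever path tar reaches second is stored as a hardlink entry (a leading `h` and `link to ...` in the listing) rather than as a regular file with its own contents.

```shell
#!/bin/sh
# Illustrative sketch: assumes GNU coreutils (stat -c) and GNU tar.
# GNU tar stands in for node-tar here; pacote is not involved.
set -eu

work=$(mktemp -d)
cd "$work"

# Recreate the hardlinked pair from the reproduction steps.
mkdir -p pkgB/dist
printf 'module.exports = 1;\n' > pkgB/index.js
ln pkgB/index.js pkgB/dist/index.js

# Both paths print the same inode number and a link count of 2.
stat -c '%i %h %n' pkgB/index.js pkgB/dist/index.js

# Pack the directory and list the archive. One index.js is a regular
# file entry ('-rw...'); the other is a hardlink entry ('hrw... link to ...').
tar -cf pkgB.tar pkgB
tar -tvf pkgB.tar
```

An extraction filter that rejects hardlink entries, applied to an archive like this one, keeps only the regular-file entry.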

Expected Behavior

The effective output of tarballStream should be the same regardless of whether the files involved share inodes.

Steps To Reproduce

mkdir pkgA
cat <<EOF > pkgA/package.json
{
  "name": "pkgA",
  "version": "0.0.0",
  "description": "",
  "dependencies": {
    "pkgB": "../pkgB"
  },
  "author": "",
  "license": ""
}
EOF
mkdir pkgB
cat <<EOF > pkgB/package.json
{
  "name": "pkgB",
  "version": "0.0.0",
  "description": "",
  "author": "",
  "license": ""
}
EOF
touch pkgB/index.js
mkdir pkgB/dist

# duplicate a file via hardlink
ln pkgB/index.js pkgB/dist/index.js 

This is what the resulting layout looks like:

$ tree
.
├── pkgA
│   └── package.json
└── pkgB
    ├── dist
    │   └── index.js
    ├── index.js
    └── package.json

4 directories, 4 files

Now install pkgA and observe that index.js only appears once.

cd pkgA
npm install --offline --install-links=true

We only see dist/index.js, not its hardlinked alter ego:

$ tree node_modules
node_modules/
└── pkgB
    ├── dist
    │   └── index.js
    └── package.json
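To make the discrepancy explicit rather than comparing two `tree` listings by eye, the source package and its installed copy can be compared file by file. The helper names `list_files` and `missing_files` are hypothetical and introduced only for this sketch, which assumes a POSIX shell with `find`, `sort` and `comm`:

```shell
#!/bin/sh
# Hypothetical helpers for this sketch; not part of npm or pacote.

# Print the relative paths of all regular files under directory $1,
# sorted bytewise so that comm(1) can consume the output.
list_files() {
  (cd "$1" && find . -type f | LC_ALL=C sort)
}

# Print files that exist under source directory $1 but not under
# installed directory $2.
missing_files() {
  src_list=$(mktemp)
  dst_list=$(mktemp)
  list_files "$1" > "$src_list"
  list_files "$2" > "$dst_list"
  # -23 suppresses lines unique to $2 and lines common to both lists.
  LC_ALL=C comm -23 "$src_list" "$dst_list"
  rm -f "$src_list" "$dst_list"
}
```

Run from `pkgA` after the install, `missing_files ../pkgB node_modules/pkgB` should print `./index.js` given the trees shown above. Files that the package's own ignore rules exclude from packing would also show up in this list, so an empty result is only expected for a package like `pkgB` that has none.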

Environment

  • npm: 9.5.1
  • Node: v18.16.0
  • OS: Guix System
  • platform: x86_64
rekado commented

If you think this is really a node-tar bug, please say so. Perhaps node-tar should not label hardlinked files as Link.
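For comparison, GNU tar (not node-tar) already provides the behavior suggested here through its `--hard-dereference` option, which archives every hardlinked path as a regular file with its own copy of the data. The sketch below assumes GNU tar and GNU coreutils. It does not imply that node-tar has a matching option.

```shell
#!/bin/sh
# Illustrative sketch: assumes GNU tar and GNU coreutils (stat -c).
# --hard-dereference is a GNU tar option, not a node-tar one.
set -eu

work=$(mktemp -d)
cd "$work"

# Same hardlinked pair as in the reproduction steps.
mkdir -p pkgB/dist
printf 'module.exports = 1;\n' > pkgB/index.js
ln pkgB/index.js pkgB/dist/index.js

# Archive every path as a regular file instead of emitting hardlink entries.
tar --hard-dereference -cf pkgB.tar pkgB
tar -tvf pkgB.tar    # no 'link to' lines in the listing

# Extraction yields two independent files (link count 1 each).
mkdir out
tar -xf pkgB.tar -C out
stat -c '%h %n' out/pkgB/index.js out/pkgB/dist/index.js
```

The trade-off is archive size: each hardlinked path carries its own copy of the data. In exchange, the extracted tree no longer depends on whether the source files shared inodes.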