Fails to create correct equivalence classes over hard links
albcorp opened this issue · 1 comments
When using the -makehardlinks option, rdfind fails to unify sequences of hard links and copies of a single file.
WHAT I DID
I took the following steps to demonstrate the fault.
- Distribute a single file across a sequence of directories using a
  mix of cp and ln:

    mkdir A B C D E F G H I J
    cp ~/Documents/README.rst A
    ln A/README.rst B/README.rst
    ln B/README.rst C/README.rst
    ln C/README.rst D/README.rst
    cp D/README.rst E/
    ln E/README.rst F/README.rst
    cp F/README.rst G/
    ln G/README.rst H/README.rst
    ln H/README.rst I/README.rst
    cp I/README.rst J/
- Verify the sequence of inodes of the copies:

    ls -i [A-J]/README.rst

  Receive output of the form:

    2720488 A/README.rst
    2720488 B/README.rst
    2720488 C/README.rst
    2720488 D/README.rst
    2720624 E/README.rst
    2720624 F/README.rst
    2720650 G/README.rst
    2720650 H/README.rst
    2720650 I/README.rst
    2720708 J/README.rst
- Run rdfind across the sequence of directories:

    rdfind -dryrun false -makehardlinks true [A-J]

  Receive output of the form:

    Now scanning "A", found 1 files.
    Now scanning "B", found 1 files.
    Now scanning "C", found 1 files.
    Now scanning "D", found 1 files.
    Now scanning "E", found 1 files.
    Now scanning "F", found 1 files.
    Now scanning "G", found 1 files.
    Now scanning "H", found 1 files.
    Now scanning "I", found 1 files.
    Now scanning "J", found 1 files.
    Now have 10 files in total.
    Removed 6 files due to nonunique device and inode.
    Total size is 76148 bytes or 74 KiB
    Removed 0 files due to unique sizes from list.4 files left.
    Now eliminating candidates based on first bytes:removed 0 files from list.4 files left.
    Now eliminating candidates based on last bytes:removed 0 files from list.4 files left.
    Now eliminating candidates based on sha1 checksum:removed 0 files from list.4 files left.
    It seems like you have 4 files that are not unique
    Totally, 56 KiB can be reduced.
    Now making results file results.txt
    Now making hard links.
    Making 3 links.
- Verify the sequence of inodes after relinking:

    ls -i [A-J]/README.rst

  Receive output of the form:

    2720488 A/README.rst
    2720488 B/README.rst
    2720488 C/README.rst
    2720488 D/README.rst
    2720488 E/README.rst
    2720624 F/README.rst
    2720488 G/README.rst
    2720650 H/README.rst
    2720650 I/README.rst
    2720488 J/README.rst
WHAT I EXPECTED
I expected all the filenames to be hardlinked to the same inode, in
which case, the output should have been of the form:
2720488 A/README.rst 2720488 B/README.rst 2720488 C/README.rst
2720488 D/README.rst 2720488 E/README.rst 2720488 F/README.rst
2720488 G/README.rst 2720488 H/README.rst 2720488 I/README.rst
2720488 J/README.rst
I note that the estimate of disk usage reduction supports my
expectation.
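The gap between the estimate and the result can be reproduced with a small model. The sketch below is only a guess that happens to match the log and the two ls -i listings above; it is not taken from the rdfind source. It assumes that one name per inode survives the "nonunique device and inode" step and that only those surviving names are relinked. The function name relink_representatives is hypothetical.

```python
# Inodes before the run, copied from the first `ls -i` listing in this report.
before = {
    "A": 2720488, "B": 2720488, "C": 2720488, "D": 2720488,
    "E": 2720624, "F": 2720624,
    "G": 2720650, "H": 2720650, "I": 2720650,
    "J": 2720708,
}


def relink_representatives(inodes):
    """Assumed model, not rdfind's code: keep the first name seen for each
    inode, then hard link only those names to the first file's inode."""
    representatives = {}
    for name, inode in inodes.items():
        representatives.setdefault(inode, name)  # first name wins
    target = next(iter(inodes.values()))         # inode of the first file
    after = dict(inodes)
    for name in representatives.values():
        after[name] = target                     # other names keep old inodes
    return after


after = relink_representatives(before)

# The log reports 76148 bytes across the 4 remaining candidates.
file_size = 76148 // 4
saved_expected = 3 * file_size                   # 4 inodes collapse to 1
freed_inodes = set(before.values()) - set(after.values())
saved_actual = len(freed_inodes) * file_size     # only fully unlinked inodes

print(after)
print(saved_expected, saved_actual)
```

Under this model only J's inode loses its last name, so one copy is freed rather than three, while F, H and I keep the old inodes of E and G alive.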
I found this bug when attempting to compress sequences of backups made
using rsync with the --link-dest option. rdfind reported very
substantial disk space savings, but achieved none.
VERSIONS
This is rdfind version 1.4.1
On Fedora 36
Never mind, I see this is covered under Caveats on the GitHub repository. I found the explanation confusing. I expected the algorithm to be based on disjoint-set forests.
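For reference, this is roughly what I had in mind: a minimal sketch of a disjoint-set forest that unions names sharing an inode and names sharing a content digest, so every name in a class ends up with one target inode. It is not rdfind's implementation. The names DisjointSet and plan_links, and the digest string "d1", are hypothetical, and a real tool would also have to compare device numbers, since hard links cannot cross filesystems.

```python
class DisjointSet:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra  # keep the root of the earlier name


def plan_links(files):
    """files: list of (name, inode, digest) in scan order.
    Returns {name: inode that the name should be linked to}."""
    ds = DisjointSet()
    first_by_inode, first_by_digest = {}, {}
    for name, inode, digest in files:
        # Same inode: already hard linked. Same digest: duplicate content.
        ds.union(first_by_inode.setdefault(inode, name), name)
        ds.union(first_by_digest.setdefault(digest, name), name)
    inode_of = {name: inode for name, inode, _ in files}
    return {name: inode_of[ds.find(name)] for name, _, _ in files}


# Inodes from the report; "d1" stands in for the shared checksum.
files = [
    ("A", 2720488, "d1"), ("B", 2720488, "d1"), ("C", 2720488, "d1"),
    ("D", 2720488, "d1"), ("E", 2720624, "d1"), ("F", 2720624, "d1"),
    ("G", 2720650, "d1"), ("H", 2720650, "d1"), ("I", 2720650, "d1"),
    ("J", 2720708, "d1"),
]
plan = plan_links(files)
print(plan)
```

With the inodes from this report, every name from A to J is planned onto inode 2720488, which is the outcome described under WHAT I EXPECTED.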