Done a bit of benchmarking 2021-12-04
neunmalelf opened this issue · 5 comments
local directory with 930 jpeg files with 2 duplicate jpeg files
ubuntu 20.04 no GUI :
rdfind real: 0m00.657s user: 0m00.007s sys: 0m00.114s
ddh real: 0m00.007s user: 0m00.002s sys: 0m00.004s
fdupes real: 0m32.932s user: 0m00.074s sys: 0m05.685s
Windows 11 pro:
ddh total: 0m00.011s (started from a .cmd file)
powershell total: 0m00.012s (incl. grouping & sorting)
DupeGuru couldn't find the duplicate files
Overall I would say nice job.
PS: Some of the rdfind options would be nice (shameless, I know)
Thank you very much for the benchmark.
Which features do you want out of rdfind? I've not used it and I'm not promising to implement them but I am curious :)
rdfind features that are useful for my work
-minsize N (N=1) ignores files with size less than N bytes (also solves the lack of the -ignoreempty option IMHO)
-makesymlinks true |(false) replace duplicate files with symbolic links (but with the options not the first)
-makehardlinks true |(false) replace duplicate files with hard links (but with the options only the first)
^^ makes it easier to keep a "finger" on the first occurrence and "mark" the rest for moving / backup/ deletion in a separate step (with more filtering etc. in a script or so)
-deleteduplicates true |(false) delete duplicate files
Not sure about:
-followsymlinks true |(false) follow symlinks <- needs some more testing on my side
This is also an interesting option (which I haven't really used so far, but I can imagine its use on systems that run e.g. I/O streams on bare metal etc.)
-sleep Xms sleep for X milliseconds between file reads. Default is 0. Only a few values are supported: 0, 1-5, 10, 25, 50, 100
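For illustration, the minimum-size filter and the "keep a finger on the first occurrence, mark the rest" workflow from the list above could be scripted roughly as follows. This is only a sketch in Python, not how rdfind or ddh work internally: the function name `find_duplicates` and its `min_size` parameter are hypothetical, and hashing whole files with SHA-256 is an assumption made for brevity.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root, min_size=1):
    """Return a list of (first, rest) tuples of files with identical content.

    Hypothetical helper: files smaller than min_size bytes are ignored,
    so min_size=1 also skips empty files.
    """
    # Pass 1: bucket regular files by size; only same-size files can match.
    by_size = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):  # symlinked dirs are not followed
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size >= min_size:
                by_size[size].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            by_hash[(size, digest)].append(path)

    # The first path in traversal order is kept; the rest are only marked.
    return [(paths[0], paths[1:]) for paths in by_hash.values() if len(paths) > 1]
```

The function only reports groups and changes nothing on disk, so moving, linking, or deleting the marked files can stay in a separate, more heavily filtered step.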
My general "Wishlist" for X-Max is short: "a bundle of Linux tools or a version of busybox (on steroids) written in a different language than C/C++ that doesn't use libc etc." The idea behind it is that you come to systems with nothing but the bare minimum, grab the source from git or a thumb drive, install Rust (or xyz-compiler), build your tools, and if needed clean up after your job.
A man can dream, right?
Sorry for the delay in getting back to you. The min file size should be fairly straightforward to implement, but any of the functions that change the filesystem are outside the scope of this project. As far as symlinks go, ddh should not follow symlinks at all right now. I may need to double-check that, but there's no loop detection in place right now, and the only safe option with no loop detection is to not follow.
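To make the loop-detection point concrete, here is a rough sketch of one common approach: follow directory symlinks, but remember each real directory by its (device, inode) pair and never enter the same one twice. It is written in Python purely for illustration and is not ddh code; the function name `walk_following_symlinks` is hypothetical, and it assumes a POSIX-style filesystem where the (st_dev, st_ino) pair identifies a directory.

```python
import os


def walk_following_symlinks(root):
    """Yield file paths under root, following directory symlinks.

    Each real directory is visited at most once, identified by its
    (device, inode) pair, so a symlink cycle cannot cause endless recursion.
    """
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            st = os.stat(current)  # os.stat resolves symlinks
        except OSError:
            continue  # broken symlink or unreadable directory
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # already visited via another path: this is the loop guard
        seen.add(key)
        try:
            entries = sorted(os.scandir(current), key=lambda e: e.name)
        except OSError:
            continue
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=True):
                    yield entry.path
            except OSError:
                continue
```

Without the `seen` set, a symlink pointing back at one of its own ancestors would be descended into repeatedly, which is why not following symlinks at all is the safe default when no such tracking exists.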
Fair enough. Still very usable program. The rest can be done within the shell script anyway, so there is no hurry.
@neunmalelf I just made a new branch with min file size support. Would you mind giving it a quick check to see that it works as expected? One other change is along for the ride, which is position independence of the directories, and you'll need to use a -d flag if you're not already.
Branch is update-to-clap3
https://github.com/darakian/ddh/tree/update-to-clap3