darakian/ddh

Done a bit of benchmarking 2021-12-04

neunmalelf opened this issue · 5 comments

local directory with 930 jpeg files with 2 duplicate jpeg files

ubuntu 20.04 no GUI :

rdfind   real: 0m00.657s   user: 0m00.007s  sys: 0m00.114s
ddh      real: 0m00.007s   user: 0m00.002s  sys: 0m00.004s
fdupes   real: 0m32.932s   user: 0m00.074s  sys: 0m05.685s

Windows 11 pro:

ddh        total: 0m00.011s (started from a .cmd file) 
powershell total: 0m00.012s (incl. grouping & sorting)
DupeGuru couldn't find the duplicate files

Overall I would say nice job.

PS: Some of the rdfind options would be nice (shameless, I know 😇)

Thank you very much for the benchmark.

Which features do you want out of rdfind? I've not used it and I'm not promising to implement them but I am curious :)

rdfind features that are useful for my work:

-minsize N (N=1) ignores files with size less than N bytes (also solves the lack of the -ignoreempty option IMHO)
-makesymlinks true |(false) replace duplicate files with symbolic links (but with the options not the first)
-makehardlinks true |(false) replace duplicate files with hard links (but with the options only the first)
^^ makes it easier to keep a "finger" on the first occurrence and "mark" the rest for moving / backup / deletion in a separate step (with more filtering etc. in a script or so)
-deleteduplicates true |(false) delete duplicate files

Not sure about:
-followsymlinks true |(false) follow symlinks <- needs some more testing on my side

This is also an interesting option (which I haven't really used so far, but I can imagine its use on systems that run e.g. I/O streams on bare metal etc.)
-sleep Xms sleep for X milliseconds between file reads. Default is 0. Only a few values are supported: 0, 1-5, 10, 25, 50, 100

My general "wishlist" for Xmas is short: "a bundle of Linux tools or a version of busybox (on steroids) written in a language other than C/C++ that doesn't use libc etc." The idea behind it is that you come to systems with nothing but the bare minimum, grab the source from git or a thumb drive, install Rust (or xyz-compiler), build your tools, and, if needed, clean up after your job.
A man can dream, right? 😉

Sorry for the delay in getting back to you. The min file size should be fairly straightforward to implement, but any of the functions that change the filesystem are outside the scope of this project. As for symlinks, ddh should not follow symlinks at all right now. I may need to double check that, but there's no loop detection in place right now, and the only safe option with no loop detection is to not follow.
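
For illustration only, here is a minimal sketch of how a minimum-size filter and a "never follow symlinks" walk could look in plain Rust. This is not ddh's actual code; the function name `collect_files` and its parameters are made up for this example, and it uses only the standard library.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `root` that are at least
/// `min_size` bytes long. Symlinks are skipped, never followed, so a
/// link cycle cannot cause endless recursion.
/// (Hypothetical helper for illustration, not part of ddh.)
fn collect_files(root: &Path, min_size: u64, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        // symlink_metadata describes the link itself, not its target.
        let meta = fs::symlink_metadata(&path)?;
        let file_type = meta.file_type();
        if file_type.is_symlink() {
            // Skip links to both files and directories.
            continue;
        } else if file_type.is_dir() {
            collect_files(&path, min_size, out)?;
        } else if file_type.is_file() && meta.len() >= min_size {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Directory to scan: first argument, or the current directory.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let mut files = Vec::new();
    // A minimum size of 1 byte ignores empty files.
    collect_files(Path::new(&root), 1, &mut files)?;
    for f in &files {
        println!("{}", f.display());
    }
    Ok(())
}
```

With a minimum size of 1, empty files drop out of the candidate list, which matches the -minsize N (N=1) use case mentioned above.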

Fair enough. Still a very usable program. The rest can be done within a shell script anyway, so there is no hurry.
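
As a rough sketch of that follow-up step (keep a "finger" on the first occurrence and "mark" the rest), something like the following could work. It is written in Rust rather than shell only to match the project's language. It assumes the duplicate groups are already available as lists of paths; the function name `split_group` and the sample paths are hypothetical, and nothing here touches the filesystem.

```rust
use std::path::PathBuf;

/// Split one group of identical files into the entry to keep and the
/// entries to mark. The group is sorted first so that "first" is a
/// deterministic choice. Returns None for an empty group.
/// (Hypothetical helper for illustration.)
fn split_group(mut group: Vec<PathBuf>) -> Option<(PathBuf, Vec<PathBuf>)> {
    if group.is_empty() {
        return None;
    }
    group.sort();
    let keep = group.remove(0);
    Some((keep, group))
}

fn main() {
    // Made-up sample input: one group of two identical files.
    let groups = vec![vec![
        PathBuf::from("photos/b.jpg"),
        PathBuf::from("photos/a.jpg"),
    ]];
    for group in groups {
        if let Some((keep, rest)) = split_group(group) {
            println!("keep {}", keep.display());
            // Only report the rest; moving or deleting is a separate step.
            for path in rest {
                println!("mark {}", path.display());
            }
        }
    }
}
```

The marked paths can then be filtered further and moved, backed up, or deleted in a separate script.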

@neunmalelf I just made a new branch with min file size support. Would you mind giving it a quick check to see that it works as expected? One other change is along for the ride: the directories are now position independent, so you'll need to use a -d flag if you're not already.

Branch is update-to-clap3
https://github.com/darakian/ddh/tree/update-to-clap3