Command line tool to remove duplicated files. It recurses into a reference directory and a root directory, finds files in the reference directory tree that are duplicated in the root directory tree, and removes them from the reference directory.
dupsrm: duplicates removal
Remove duplicated files in the reference directory that are found in the root directory tree
Usage: dupsrm [OPTIONS] <REFERENCE_DIR> <ROOT_DIR>
Arguments:
<REFERENCE_DIR> Reference directory path
<ROOT_DIR> Root directory path
Options:
-n, --dry-run
Perform a dry-run without removing any file
-r, --regex <REGEX>
Regular expression filtering files in reference directories
-a, --hash-algorithm <HASH_ALGORITHM>
Hash algorithm [default: SHA2-256] [possible values: SHA2-256, SHA3-256, SHA1, MD5, WHIRLPOOL, RIPEMD-160, BLAKE-256]
-h, --help
Print help
-V, --version
Print version
Use cases:
- Removing outdated backups
- Cleaning the Downloads folder of copied and possibly renamed files
- Saving disk space
# build and install
cargo build --release
cargo install --path .
# remove coverage artifacts and build output
rm default_*
rm dupsrm.profdata
cargo clean
# profile execution
RUSTFLAGS="-C instrument-coverage" cargo build
target/debug/dupsrm test/ .
# or profile tests
RUSTFLAGS="-C instrument-coverage" cargo test --tests
llvm-profdata merge -sparse default_*.profraw -o dupsrm.profdata
llvm-cov report --use-color --ignore-filename-regex='/.cargo/registry' --instr-profile=dupsrm.profdata --object target/debug/dupsrm
llvm-cov show --use-color --ignore-filename-regex='/.cargo/registry' --instr-profile=dupsrm.profdata --object target/debug/dupsrm
- Recursively iterate root and reference directories
- Calculate the hash of each file and store it in a list alongside the file's path
- Create a list of duplicates in the reference directory
- Add command line interface to define reference and root paths
  See the Rust CLI book for further details. Use clap for command line argument parsing
- Add the method to remove files
- Add the command line flag -n, --dry-run to not remove files, as in git rm
- Modularize source code into different files
- Add additional unit tests with an example file structure
- Create a docker container for running build tests
- Create GitHub and GitLab CI
- Modularize the hash function to allow the usage of other hash algorithms
- Benchmark implementation using cargo-bench
- Parallelize iterators and hashing of files in multiple threads
- Write documentation with usage examples
- Extend logger output
- Use a hashmap to find duplicated hashes, decreasing the computational complexity
- Add a filter for file types or regex support
- Use PathBuf instead of String for paths
- Wrap hash type with &str or fixed size type
- Add a flag to not recurse the reference directory or set a maximum depth
- Provide usage examples with regular expressions
- Add an option to create symlinks or hard links to original files, replacing the removed files in the reference directory
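
The core steps from the list above (recursive iteration, hashing, hashmap-based duplicate lookup, and removal with a dry-run switch) can be sketched with the standard library alone. This is a minimal illustration and not the actual dupsrm source: the function names collect_files, hash_bytes, hash_tree, find_duplicates and remove_files are hypothetical, and the standard library's DefaultHasher stands in for the real hash algorithms (SHA2-256 and the others), which would come from external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect all regular files below `dir` (symlinks are skipped).
fn collect_files(dir: &Path, files: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            collect_files(&entry.path(), files)?;
        } else if file_type.is_file() {
            files.push(entry.path());
        }
    }
    Ok(())
}

/// Stand-in digest. ASSUMPTION: DefaultHasher is NOT cryptographic and only
/// replaces the real hash algorithm to keep this sketch dependency-free.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hash every file below `dir` and store each digest alongside its path.
fn hash_tree(dir: &Path) -> io::Result<Vec<(PathBuf, u64)>> {
    let mut files = Vec::new();
    collect_files(dir, &mut files)?;
    let mut hashed = Vec::with_capacity(files.len());
    for path in files {
        let digest = hash_bytes(&fs::read(&path)?);
        hashed.push((path, digest));
    }
    Ok(hashed)
}

/// Return the reference files whose digest also occurs in the root tree.
/// A hashmap keyed by digest avoids comparing every pair of files.
/// A root entry with the same path as the reference file is ignored, so a
/// reference directory nested inside the root does not match itself.
fn find_duplicates(reference: &[(PathBuf, u64)], root: &[(PathBuf, u64)]) -> Vec<PathBuf> {
    let mut by_hash: HashMap<u64, Vec<&PathBuf>> = HashMap::new();
    for (path, hash) in root {
        by_hash.entry(*hash).or_default().push(path);
    }
    let mut duplicates = Vec::new();
    for (path, hash) in reference {
        let has_copy = by_hash
            .get(hash)
            .map_or(false, |paths| paths.iter().any(|p| *p != path));
        if has_copy {
            duplicates.push(path.clone());
        }
    }
    duplicates
}

/// Remove the given files, or only report them when `dry_run` is set.
fn remove_files(paths: &[PathBuf], dry_run: bool) -> io::Result<()> {
    for path in paths {
        if dry_run {
            println!("would remove {}", path.display());
        } else {
            fs::remove_file(path)?;
            println!("removed {}", path.display());
        }
    }
    Ok(())
}
```

In this sketch, the results of hash_tree for the reference and root directories would be passed to find_duplicates, and the returned paths to remove_files. Matching on a 64-bit non-cryptographic digest alone is not safe for real deletions; a real implementation would use one of the cryptographic hashes listed in the options.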