Primary language: Rust. License: Creative Commons Zero v1.0 Universal (CC0-1.0).

RDedupe

A Rust-based deduplication tool.

Goals

  • Build a fast, cross-platform deduplication tool that takes advantage of Rust's parallelism.
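
To make the goal above concrete, here is a minimal sketch of parallel duplicate detection using only the Rust standard library. It is not taken from the rdedupe source: the names `hash_file` and `find_duplicates`, the `workers` parameter, and the use of `DefaultHasher` are all assumptions made for illustration, and the real tool may use different crates and a different hashing scheme.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hash a file's full contents (hypothetical helper).
/// DefaultHasher is a 64-bit, non-cryptographic hash, so a real tool should
/// confirm matches byte-for-byte or use a stronger digest before deleting.
fn hash_file(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    Ok(hasher.finish())
}

/// Group paths by content hash, hashing slices of the input on scoped threads.
/// Returns only the groups that contain more than one path.
fn find_duplicates(paths: &[PathBuf], workers: usize) -> io::Result<Vec<Vec<PathBuf>>> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 so `chunks` never receives zero.
    let chunk_size = ((paths.len() + workers - 1) / workers).max(1);

    let hashed: Vec<io::Result<Vec<(u64, PathBuf)>>> = thread::scope(|s| {
        let handles: Vec<_> = paths
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|p| hash_file(p).map(|digest| (digest, p.clone())))
                        .collect::<io::Result<Vec<_>>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("hashing thread panicked"))
            .collect()
    });

    // Merge the per-thread results into one map keyed by digest.
    let mut groups: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for part in hashed {
        for (digest, path) in part? {
            groups.entry(digest).or_default().push(path);
        }
    }

    // Keep only real duplicate groups, sorted for stable output.
    let mut dupes: Vec<Vec<PathBuf>> =
        groups.into_values().filter(|g| g.len() > 1).collect();
    for group in &mut dupes {
        group.sort();
    }
    dupes.sort();
    Ok(dupes)
}
```

Calling `find_duplicates(&paths, 4)` would split the path list across up to four threads and return one sorted group per set of files with identical content. Scoped threads are used here so the workers can borrow the path slice without cloning it up front.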

Current Status

Future Improvements

  • Add a GUI
  • Add a web interface
  • Fix the GitHub Actions build process so that it no longer fails silently
  • Use a Polars DataFrame to collect statistics about files and generate a CSV report
  • Store logs about actions performed across multiple runs

Building and Running

  • Build: cd into rdedupe and run make all
  • Run: cargo run -- dedupe --path tests --pattern .txt
  • Run tests: make test
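
The run command above passes a directory (`--path`) and a pattern (`--pattern`). As a rough illustration of what such a file-selection step could look like, the sketch below walks a directory tree and keeps files whose name contains the pattern. The function name `collect_matching` and the substring-matching rule are assumptions for this example, not a description of how rdedupe actually interprets `--pattern`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `root` whose file name contains
/// `pattern`. Substring matching is an assumption made for this sketch.
fn collect_matching(root: &Path, pattern: &str) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Explicit stack of directories still to visit (avoids recursion).
    let mut pending = vec![root.to_path_buf()];

    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // `file_type` does not follow symlinks, so links are skipped
            // and cannot cause cycles.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(path);
            } else if file_type.is_file()
                && entry.file_name().to_string_lossy().contains(pattern)
            {
                found.push(path);
            }
        }
    }

    // Sort so the result does not depend on directory iteration order.
    found.sort();
    Ok(found)
}
```

Under these assumptions, `collect_matching(Path::new("tests"), ".txt")` would return every regular file below `tests` whose name contains `.txt`, ready to be handed to a hashing step.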

OS X Install

  • Install Rust via rustup
  • Add the following to ~/.cargo/config:
[target.x86_64-apple-darwin]
rustflags = [
  "-C", "link-arg=-undefined",
  "-C", "link-arg=dynamic_lookup",
]

[target.aarch64-apple-darwin]
rustflags = [
  "-C", "link-arg=-undefined",
  "-C", "link-arg=dynamic_lookup",
]
  • Run make all in the rdedupe directory