/fsdd

deduplicate data via hardlinks, analyze drive status, prepare & optimize - filesystems (app/lib/api)

Primary LanguageGoBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

OVERVIEW

Go Reference Go Report Card Go Build

paepcke.de/fsdd

fsdd - [f]ile [s]ystem [d]e[d]uplication

  • deduplicate files at (logical) filesystem layer (inode based) (simple, fast, cache friendly)
  • supports unix-like-inode-based filesystems (no support for microsoft windows yet))
  • replace [slow|space|time|intensive] symlinks via [fast|cheap] hardlinks
  • perfect for all types of [compressed|read-only|rootfs|cache] filesystems
  • reset all [expensive|noisy|leaking|bad-to-compress] metadata eg, birth-time, last-mod time, last-access time
  • analyze exsting filesystem states, backtrack hardlinks, symlinks, savings
  • does NOT cross filesystem boundaries, does not follow any [fs-external] symlinks
  • yes, its fast (uses the new maphash runtime package)
  • 100% pure go, stdlib only, dependency free, use as app or api (see api.go)

INSTALL

go install paepcke.de/fsdd/cmd/fsdd@latest

DOWNLOAD (prebuild)

github.com/paepckehh/fsdd/releases

SHOWTIME

Tame your excessive go mod cache! Even on zfs/btrfs, you will love your new, fast and small go module cache!

cd $GOMODCACHE && fsdd --hard-link . 
FSDD [start] [/usr/store/go]  [hash:MAPHASH] 
FSDD [_done] [time: 39.000221ms]
FSDD [stats] [files:13329] [inode(s): 8680] [sym.valid: 0] [sym.invalid: 0] [data blocks: 277.1 Mbytes]
FSDD [_info] [new deduplication savings] [inode(s): 4649] [data blocks: 66.9 Mbytes]

Same with detailed file listing.

fsdd --verbose . 
[...]

Replace all duplicates.

fsdd --hard-link . 
[...]

More?

fsdd --help 
syntax: fsdd <start-directory> [options]

 --hard-link [-L]
		replace all duplicated files (within local fs boundary) via hardlinks,
		save diskspace and inodes meta data handling, loose duplicate files
		individual metadata [not reversible]

 --sym-link [-S]
		replace all duplicated files (within local fs boundary) via symlinks,
		save diskspace, keep duplicated individual inodes and metadata

 --clean-symlinks [-C]
		replace all valid resolvable symlinks (local fs boundary) via hardlinks [-R],
		and remove all broken symlinks [-X], save inodes and metadata handling,
 		loose symlinks metadata [not reversible]

 --clean-metadata [-M]
		reset all file and directory timestamps for file create and last-modify date (atime,mtime)
		to UnixTime 0 (01.01.1970 00:00) and save meta diskspace and handling overhead

 --replace-symlinks [-R]
		replace all valid symlinks via hardlinks, keep broken symlinks
		save meta diskspace and handling

 --remove-broken-symlinks [-X]
		delete all symlinks where the target does not resolve,

 --secure-hash [-H]
		use SHA512/256 as cryptographic secure hash
		to avoid intentional designed abusive hash collisions

 --fast-hash [-F]
		extreme fast file content hashing [via MAPHASH instead of SHA512/256]
		WARNING: fast-hash-deduplication is not 100% intentional [preimage|abuse|collision] resistant!

WARNING

Hardlinks are absolute great for building fast, small and snappy read-only filesystems and most types of filesytem backed (automatically) managed caches. But on normal manually managed full read and write filesystems hardlinks could result in unexpected data changes or data loss.

DOCS

pkg.go.dev/paepcke.de/fsdd

CONTRIBUTION

Yes, Please! PRs Welcome!