dedup
There are 25 repositories under the dedup topic.
Zygo/bees
Best-Effort Extent-Same, a btrfs dedupe agent
markusressel/py-image-dedup
CLI utility to find near duplicate images and remove all but the best copy.
adlibre/adlibre-backup
High performance rsync backup utilising BTRFS / ZFS filesystem features
chucheng92/HadoopDedup
🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase
lkarlslund/stringdedup
String deduplication package for Go
ParaGroup/p3arsec
Parallel Patterns Implementation of PARSEC Benchmark Applications
veqryn/slog-dedup
Golang structured logging (slog) deduplication and sorting for use with json logging
EastTower16/LLMDataDistill
Distill large-scale web page text
glehmann/hld
Hard Link Deduplicator
helloall1900/vhash
A C++ reimplementation of Near Duplicate Video Detection - Get a 64-bit comparable hash-value for any video (Video Hash).
JumperBot/whitespace-sifter
Sift duplicate whitespaces away!
uicoolcn/UiCoolVisualWebSpider
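📄 [UiCool Visual Website Data Collection System] Uses visual scraping technology to intelligently identify web page element types such as images, text, links, HTML, and files. Supports running JavaScript, applying regular expressions, automatic scrolling, automatic pagination, and opening pop-up windows to collect data. Collection settings include automatic data deduplication, human-like intermittent pauses to avoid IP blocking, and automatic saving. Also supports browser settings such as cookies and cache, collection through rotating proxies and circumvention access, "category/keyword" collection, and image renaming. Advanced options include multi-threaded collection, and the VIP version adds scheduled collection.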
xyb/chunkdup
Find (partial content) duplicate files.
eminence/deduprs
Hardlink deduplication tool for Linux
go-utils/dedupe
Easy Deduplication
hekmon/deduper
Analyse 2 paths to find identical files and hard link them to save space
rongrimes/zipfile-dedup
Project to take two similar zipfiles and dedupe files that have the same timestamp in the older file.
carlinhosfranco/BenSP-Suite
BenSP is a suite of parameterizable benchmarks for stream parallelism which is used to evaluate stream processing characteristics.
harshasrisri/dedup
Remove local files that are duplicates of files in another path
horgh/dupefile
Detect and optionally delete duplicate files in a directory tree
jamjamjon/ilytix
A CLI tool for image analysis: checking image integrity, image deduplication, and image retrieval.
prebuilder/rdfind.py
A Python wrapper for rdfind
xyb/chunksum
Print FastCDC rolling hash chunks and checksums.
dim-geo/analyze-dedup
Python script to analyze dedup usage in btrfs
yugn/yadupe
Yet another tool to find and remove duplicate files.