dedup

There are 25 repositories under the dedup topic.

  • Zygo/bees

    Best-Effort Extent-Same, a btrfs dedupe agent

    Language: C++
  • markusressel/py-image-dedup

    CLI utility to find near-duplicate images and remove all but the best copy.

    Language: Python
  • adlibre/adlibre-backup

    High performance rsync backup utilising BTRFS / ZFS filesystem features

    Language: Shell
  • chucheng92/HadoopDedup

    🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase

    Language: Java
  • lkarlslund/stringdedup

    String deduplication package for Go

    Language: Go
  • ParaGroup/p3arsec

    Parallel Patterns Implementation of PARSEC Benchmark Applications

    Language: C++
  • veqryn/slog-dedup

    Golang structured logging (slog) deduplication and sorting for use with json logging

    Language: Go
  • EastTower16/LLMDataDistill

    Distill large-scale web page text

    Language: C++
  • glehmann/hld

    Hard Link Deduplicator

    Language: Rust
  • helloall1900/vhash

    A C++ reimplementation of Near Duplicate Video Detection - Get a 64-bit comparable hash-value for any video (Video Hash).

    Language: C++
  • JumperBot/whitespace-sifter

    Sift duplicate whitespaces away!

    Language: Rust
  • uicoolcn/UiCoolVisualWebSpider

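    📄 [UiCool Visual Website Data Collection System] Uses visual scraping technology to identify web page element types automatically, such as images, text, links, HTML, and files. Supports running JavaScript, applying regular expressions, automatic scrolling, automatic pagination, and opening pop-up windows to collect their data. Collection settings include automatic data deduplication, human-like intermittent pauses to avoid IP blocking, and automatic saving. Also supports browser settings such as cookies and cache, collection through rotating proxies, "category/keyword" collection, image renaming, and advanced options such as multithreaded collection. The VIP edition additionally supports scheduled collection.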

  • xyb/chunkdup

    Find (partial content) duplicate files.

    Language: Python
  • eminence/deduprs

    Hardlink deduplication tool for Linux

    Language: Rust
  • go-utils/dedupe

    Easy Deduplication

    Language: Go
  • hekmon/deduper

    Analyse 2 paths to find identical files and hard link them to save space

    Language: Go
  • rongrimes/zipfile-dedup

    Project to take two similar zipfiles, and to dedupe files that have the same timestamp in the older file.

    Language: Python
  • carlinhosfranco/BenSP-Suite

    BenSP is a suite of parameterizable benchmarks for stream parallelism which is used to evaluate stream processing characteristics.

    Language: C
  • harshasrisri/dedup

    Remove local files that are duplicates of files in another path

    Language: Rust
  • horgh/dupefile

    Detect and optionally delete duplicate files in a directory tree

    Language: Go
  • jamjamjon/ilytix

    A CLI tool for image analysis: checking image integrity, image deduplication, and image retrieval.

    Language: Rust
  • prebuilder/rdfind.py

    A Python wrapper for rdfind

    Language: Python
  • xyb/chunksum

    Print FastCDC rolling hash chunks and checksums.

    Language: Python
  • dim-geo/analyze-dedup

    Python script to analyze dedup usage in btrfs

    Language: Python
  • yugn/yadupe

    Yet another tool to find and remove duplicate files.

    Language: Python