data-deduplication
There are 15 repositories under the data-deduplication topic.
dpc/rdedup
Data deduplication engine, supporting optional compression and public key encryption.
sail-sg/sailcraft
🚢 Data Toolkit for Sailor Language Models
jchristn/WatsonDedupe
Self-contained C# library for data deduplication using Sqlite
Zabuzard/FastCDC4J
Fast and efficient content-defined chunking for data deduplication. Java implementation of FastCDC as library.
david-siqi-liu/sparklyclean
Optimal distributed data deduplication and supervised learning pipeline using Apache Spark
bmiller1009/deduper
General deduping engine for JDBC sources with output to JDBC/csv targets
shubham-thakare/data-deduplication
A Java project that splits data using hashing techniques and removes duplicate blocks to save cloud storage. This project also uses the CloudSim framework for cloud storage simulation.
bevry/fellow
Fellow is a package for creating people that can be unified by their shared values via a singleton list on the class
gagan3012/PolyDeDupe
PolyDeDupe: Multi-Lingual Data Deduplication
baraverkstad/mixtape
Practical backups. The Unix toolkit way.
imehar/data-deduplication
A data deduplication implementation based on a client-server architecture
Jim-JMCD/Data_storage_network_deduplication_calculator
A calculator for the storage and transmission of deduplicated data, with results presented in charts and tables
KeerthanaPalanikumar/Data-Cleaning-on-SQL
This repository contains SQL scripts and documentation for cleaning and standardizing data in the NashvilleHousing table within the sqlproject2 database. The project aims to prepare the dataset for analysis by addressing inconsistencies, filling missing values, standardizing formats, and removing duplicates.
fabriziosalmi/text-boundaries
A Python-based tool for preprocessing, cleaning, and analyzing text datasets, designed to filter, deduplicate, sort data, and generate statistical insights.
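
Several of the descriptions above mention splitting data into blocks, hashing them, and dropping duplicate blocks. The following is a minimal Python sketch of that general idea, not the implementation used by any repository listed here. The function names `deduplicate_blocks` and `restore`, the fixed 4096-byte block size, and the choice of SHA-256 are all assumptions made for illustration. Content-defined chunking, as in FastCDC, picks block boundaries from the data itself rather than at fixed offsets.

```python
import hashlib


def deduplicate_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each unique block once.

    Hypothetical helper for illustration: returns a store mapping
    digest -> block bytes, and a recipe (ordered digests) that is
    enough to rebuild the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        # Only the first occurrence of a block is stored.
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data by concatenating blocks in recipe order."""
    return b"".join(store[digest] for digest in recipe)
```

With an input made of four 4096-byte blocks where three are identical, the store would hold two blocks while the recipe still lists four digests, so the original can be rebuilt exactly.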