data-matching
There are 34 repositories under the data-matching topic.
moj-analytical-services/splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
J535D165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
J535D165/data-matching-software
A list of free data matching and record linkage software.
RobinL/fuzzymatcher
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
maxharlow/csvmatch
🔎 Finds fuzzy matches between CSV files
vintasoftware/entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
ropeladder/record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
Wikidata/soweego
Link Wikidata items to large catalogs
AI-team-UoA/pyJedAI
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Senzing/awesome
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
J535D165/recordlinkage-annotator
A browser user interface for manual labeling of record pairs.
HPI-Information-Systems/snowman
Welcome to Snowman App – a Data Matching Benchmark Platform.
vaneseltine/nominally
A maximum-strength name parser for record linkage.
lewinfox/levitate
Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).
carlosraphael/specification-pattern
https://medium.com/@carlosraphael/specification-design-pattern-in-java-8-bac6f5f943bc
abcsys/libem
Compound AI toolchain for fast and accurate entity matching, powered by LLMs.
maxharlow/textmatch
🔎 Finds fuzzy matches between datasets
wbsg-uni-mannheim/winter
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Evnsn/awsome-entity-resolution
A collection of awesome resources regarding Record Linkage.
ihmeuw/person_linkage_case_study
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
rohitgarud/asreview-preprocess
An extension for ASReview Lab to preprocess the dataset before importing it into ASReview
AvinashSingh786/WekaComparator
Weka Comparator to match rules to test data with filtering abilities
kefilweditse/awesome-matchem-datasets
Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.
Knodl-LLC/KnoDL-Match
Service for automatically matching two data sets without mapping
pkhaan/AutoCuratedMovieLists
This project aims to provide users with lists containing only great movies, based on only a few filters and search parameters.
sevetseh28/data-integration-extensible-framework
Undergraduate Final Project (needs README up to date!!) - Scientific paper soon to be included
greyhub/job_center
Crawl, match, and explore data about jobs in Viet Nam.
Abhishek-Bansode/identity-reconciliation
Identity Reconciliation is a Java-based project focused on resolving and merging duplicate identities across systems to ensure consistent and accurate user data management.
beatriz-valio/tcc-beatriz.weiss
Code produced for an undergraduate thesis (Trabalho de Conclusão de Curso) in the Information Systems program at the Universidade Federal de Santa Catarina, by student Beatriz Valio Weiss
boscoj2008/AdapterEM
AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning
deangrant/dii-operator
A standardized email and phone number normalization and hashing utility that follows UID2 specifications for email address and phone number processing. This tool ensures consistent normalization and hash generation for identity resolution and data matching purposes.
Gust4voSales/proxcluster-deduplicator
ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.
KNehe/musical
A Single View application aggregates and reconciles data from multiple sources to create a single view of an entity.
lokhande-vishnu/cs838-data-science
Repository for CS 838 (Spring 2017) Data Science project