data-matching

There are 35 repositories under the data-matching topic.

moj-analytical-services/splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Language: Python 1.8k 17 772196
J535D165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
Language: Python 1k 32 138155
J535D165/data-matching-software
A list of free data matching and record linkage software.
394 25 1642
RobinL/fuzzymatcher
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
Language: Python 286 10 4361
maxharlow/csvmatch
🔎 Finds fuzzy matches between CSV files
Language: Python 190 9 3521
vintasoftware/entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Language: Jupyter Notebook 157 19 616
ropeladder/record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
125 11 115
Wikidata/soweego
Link Wikidata items to large catalogs
Language: Python 96 5 29510
AI-team-UoA/pyJedAI
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Language: Python 84 4 1512
Senzing/awesome
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Language: Python 62 7 163
J535D165/recordlinkage-annotator
A browser user interface for manual labeling of record pairs.
Language: JavaScript 48 2 19
vaneseltine/nominally
A maximum-strength name parser for record linkage.
Language: Python 39 3 561
HPI-Information-Systems/snowman
Welcome to Snowman App – a Data Matching Benchmark Platform.
Language: TypeScript 38 6 1062
lewinfox/levitate
Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).
Language: R 36 2 82
carlosraphael/specification-pattern
https://medium.com/@carlosraphael/specification-design-pattern-in-java-8-bac6f5f943bc
Language: Java 30 2 06
abcsys/libem
Compound AI toolchain for fast and accurate entity matching, powered by LLMs.
Language: Python 24 0 134
maxharlow/textmatch
🔎 Finds fuzzy matches between datasets
Language: Python 15 1 20
wbsg-uni-mannheim/winter
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Language: Java 8 0 02
Evnsn/awsome-entity-resolution
A collection of awesome resources regarding Record Linkage.
7 1 00
ihmeuw/person_linkage_case_study
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
Language: HTML 3 1 00
rohitgarud/asreview-preprocess
An extension for ASReview Lab to preprocess a dataset before importing it into ASReview
Language: Python 2 1 60
AvinashSingh786/WekaComparator
Weka Comparator to match rules to test data with filtering abilities
Language: Java 1 1 00
kefilweditse/awesome-matchem-datasets
Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.
1
Knodl-LLC/KnoDL-Match
Service for automatically matching two data sets without mapping
Language: Shell 1 0 00
pkhaan/AutoCuratedMovieLists
This project aims to provide users with lists containing only great movies, based on only a few filters and search parameters.
Language: Dart 1 2 01
sevetseh28/data-integration-extensible-framework
Undergraduate final project (README needs updating). A scientific paper will be included soon.
Language: HTML 1 3 00
greyhub/job_center
Crawl, match, and explore data about jobs in Viet Nam.
Language: Jupyter Notebook 0 2 01
Abhishek-Bansode/identity-reconciliation
Identity Reconciliation is a Java-based project focused on resolving and merging duplicate identities across systems to ensure consistent and accurate user data management.
Language: Java
beatriz-valio/tcc-beatriz.weiss
Code produced for the undergraduate thesis (Trabalho de Conclusão de Curso) in the Information Systems program at the Universidade Federal de Santa Catarina, by student Beatriz Valio Weiss
Language: Python
boscoj2008/AdapterEM
AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning
Language: Python 2 0
deangrant/dii-operator
A standardized email and phone number normalization and hashing utility that follows UID2 specifications for email address and phone number processing. This tool ensures consistent normalization and hash generation for identity resolution and data matching purposes.
Language: TypeScript
Gust4voSales/proxcluster-deduplicator
ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.
Language: Jupyter Notebook 1 0
KNehe/musical
A Single View application aggregates and reconciles data from multiple sources to create a single view of an entity.
Language: Python 1 0
lokhande-vishnu/cs838-data-science
Repository for CS 838 (Spring 2017) Data Science project
Language: Jupyter Notebook 1 0
Yuki-M0906/comprehensive-data-matcher
High-accuracy record-matching (name deduplication) tool: a hybrid scoring method that combines Levenshtein distance and the Jaccard coefficient
Language: Python
