data-matching

There are 34 repositories under the data-matching topic.

  • moj-analytical-services/splink

    Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

    Language: Python
  • J535D165/recordlinkage

    A powerful and modular toolkit for record linkage and duplicate detection in Python

    Language: Python
  • J535D165/data-matching-software

    A list of free data matching and record linkage software.

  • RobinL/fuzzymatcher

    Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4

    Language: Python
  • maxharlow/csvmatch

    🔎 Finds fuzzy matches between CSV files

    Language: Python
  • vintasoftware/entity-embed

    PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

    Language: Jupyter Notebook
  • ropeladder/record-linkage-resources

    Resources for tackling record linkage / deduplication / data matching problems

  • Wikidata/soweego

    Link Wikidata items to large catalogs

    Language: Python
  • AI-team-UoA/pyJedAI

    An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

    Language: Python
  • Senzing/awesome

    Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.

    Language: Python
  • J535D165/recordlinkage-annotator

    A browser user interface for manual labeling of record pairs.

    Language: JavaScript
  • HPI-Information-Systems/snowman

    Welcome to Snowman App – a Data Matching Benchmark Platform.

    Language: TypeScript
  • vaneseltine/nominally

    A maximum-strength name parser for record linkage.

    Language: Python
  • lewinfox/levitate

    Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).

    Language: R
  • carlosraphael/specification-pattern

    https://medium.com/@carlosraphael/specification-design-pattern-in-java-8-bac6f5f943bc

    Language: Java
  • abcsys/libem

    Compound AI toolchain for fast and accurate entity matching, powered by LLMs.

    Language: Python
  • maxharlow/textmatch

    🔎 Finds fuzzy matches between datasets

    Language: Python
  • wbsg-uni-mannheim/winter

    WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.

    Language: Java
  • Evnsn/awsome-entity-resolution

    A collection of awesome resources regarding Record Linkage.

  • ihmeuw/person_linkage_case_study

    Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).

    Language: HTML
  • rohitgarud/asreview-preprocess

    An extension for ASReview Lab that preprocesses a dataset before it is imported into ASReview

    Language: Python
  • AvinashSingh786/WekaComparator

    Weka Comparator to match rules to test data with filtering abilities

    Language: Java
  • kefilweditse/awesome-matchem-datasets

    Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.

  • Knodl-LLC/KnoDL-Match

    Service for automatically matching two data sets without mapping

    Language: Shell
  • pkhaan/AutoCuratedMovieLists

    This project aims to provide users with lists containing only great movies, based on just a few filters and search parameters.

    Language: Dart
  • sevetseh28/data-integration-extensible-framework

    Undergraduate Final Project (needs README up to date!!) - Scientific paper soon to be included

    Language: HTML
  • greyhub/job_center

    Crawl, match, and explore data about jobs in Viet Nam.

    Language: Jupyter Notebook
  • Abhishek-Bansode/identity-reconciliation

    Identity Reconciliation is a Java-based project focused on resolving and merging duplicate identities across systems to ensure consistent and accurate user data management.

    Language: Java
  • beatriz-valio/tcc-beatriz.weiss

    Code produced for a final undergraduate project (Trabalho de Conclusão de Curso) in the Information Systems program at the Universidade Federal de Santa Catarina, by student Beatriz Valio Weiss

    Language: Python
  • boscoj2008/AdapterEM

    AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning

    Language: Python
  • deangrant/dii-operator

    A standardized email and phone number normalization and hashing utility that follows UID2 specifications for email address and phone number processing. This tool ensures consistent normalization and hash generation for identity resolution and data matching purposes.

    Language: TypeScript
  • Gust4voSales/proxcluster-deduplicator

    ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.

    Language: Jupyter Notebook
  • KNehe/musical

    A Single View application aggregates and reconciles data from multiple sources to create a single view of an entity.

    Language: Python
  • lokhande-vishnu/cs838-data-science

    Repository for CS 838 (Spring 2017) Data Science project

    Language: Jupyter Notebook