/entity-resolution

Entity resolution, also known as data matching or record linkage, is the task of finding records that refer to the same (or a very similar) real-world entity, whether those records live in one data set or in several. Record linkage is necessary when joining records that are similar and may or may not share common identifiers. Neo4j offers several advantages for entity resolution and record linkage. This repository covers one such use case: linking similar user accounts for analytics and for providing better recommendations.
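
To make the idea of linking records without a shared identifier concrete, here is a minimal sketch in plain Python. It is not the method used in this repository: the sample records, the field names, the `link_records` helper, and the 0.85 threshold are all assumptions chosen for illustration, and the similarity measure is the standard library's `difflib.SequenceMatcher` rather than anything from Neo4j.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical user records; ids, names and emails are made up for illustration.
users = [
    {"id": 1, "name": "Jon Smith", "email": "jon.smith@example.com"},
    {"id": 2, "name": "John Smith", "email": "jsmith@example.com"},
    {"id": 3, "name": "Alice Wong", "email": "alice.wong@example.com"},
]


def name_similarity(a, b):
    """Return a similarity ratio in [0, 1] for two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def link_records(records, threshold=0.85):
    """Compare every pair of records and keep the pairs whose names look alike.

    The threshold is an arbitrary illustrative value, not a recommended setting.
    """
    links = []
    for left, right in combinations(records, 2):
        score = name_similarity(left["name"], right["name"])
        if score >= threshold:
            links.append((left["id"], right["id"], round(score, 2)))
    return links


if __name__ == "__main__":
    for left_id, right_id, score in link_records(users):
        print(f"user {left_id} may match user {right_id} (similarity {score})")
```

Comparing every pair is quadratic in the number of records, so this only suits small samples. In a graph, candidate pairs would more likely be narrowed first, for example through shared neighbours or attributes.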



Entity-Resolution-Demonstration Graph Example

Description: Entity resolution, record linkage and similarity-based recommendation with Neo4j

Nodes: 1267, Relationships: 1939

Figure 1. Model

Figure 2. Example

Example Query:
MATCH (u:User {state: $state})-[:WATCHED]->(m)-[:HAS]->(g:Genre)
RETURN g.name AS genre, count(g) AS freq
ORDER BY freq DESC
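
For readers less familiar with Cypher aggregation, the following plain-Python sketch mirrors what the query above computes: for users in a given state, it counts one hit per genre of each watched item and sorts genres by frequency. The in-memory data, the `genre_frequencies` helper, and the reading of `m` as a movie are assumptions for illustration only. Nothing here talks to Neo4j.

```python
from collections import Counter

# Hypothetical stand-ins for the graph: (user, state, movie) rows for WATCHED,
# and a movie -> genres mapping for HAS. All values are invented.
watched = [
    ("u1", "Texas", "m1"),
    ("u1", "Texas", "m2"),
    ("u2", "Texas", "m2"),
    ("u3", "Ohio", "m1"),
]
movie_genres = {
    "m1": ["Action", "Drama"],
    "m2": ["Drama"],
}


def genre_frequencies(watched_rows, genres_by_movie, state):
    """Count one hit per (user, movie, genre) path for users in the given state."""
    counts = Counter()
    for _user, user_state, movie in watched_rows:
        if user_state != state:
            continue  # plays the role of the {state: $state} filter
        for genre in genres_by_movie.get(movie, []):
            counts[genre] += 1
    # Highest frequency first, like ORDER BY freq DESC.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for genre, freq in genre_frequencies(watched, movie_genres, "Texas"):
        print(genre, freq)
```

As in the Cypher query, a genre is counted once per matching path, so a user who watched two items of the same genre contributes two hits.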

Setup

This is for Neo4j version: 4.4

Required plugins: apoc, graph-data-science

Rendered guide available via: :play https://guides.neo4j.com/sandbox/entity-resolution

Load graph data via the following:

Data files: true

Import flat files (CSV, JSON, etc.) using Cypher's LOAD CSV, the APOC library, or other methods.

  • Drop the file into the Files section of a project in Neo4j Desktop. Then choose the Create new DBMS from dump option from the file options.

  • Use the neo4j-admin tool to load data from the command line with the command below.

bin/neo4j-admin load --from data/entity-resolution-44.dump [--database "database"]

Feedback

Feel free to submit issues or pull requests to improve this repository.