FAMER_Clustering

FAMER is a research project on FAst Multi-source Entity Resolution. It is implemented on top of Apache Flink and the graph analytics tool Gradoop. The framework is still under active development, so the complete FAMER code base is not yet publicly available. This repository provides the new clustering algorithm CLIP and the cluster repair algorithm RLIP, which we presented in our paper at the Extended Semantic Web Conference (ESWC 2018) in June 2018.

In this repository you can find the following modules of FAMER:

famer-clustering: it contains the implementation of CLIP and the baseline method Connected Components.
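Connected Components, the baseline, treats the similarity graph as undirected and puts every set of transitively linked entities into one cluster. The following is a minimal single-machine sketch of that technique using union-find, assuming entities are given as string ids and similarity links as (u, v) pairs. It only illustrates the idea and is not the Flink/Gradoop implementation in famer-clustering; the function name connected_components is hypothetical.

```python
from collections import defaultdict


def connected_components(vertices, edges):
    """Baseline clustering: each connected component of the graph is one cluster."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Union the endpoints of every similarity link (its weight is ignored).
    for u, w in edges:
        root_u, root_w = find(u), find(w)
        if root_u != root_w:
            parent[root_u] = root_w

    # Group vertices by their root; isolated vertices become singleton clusters.
    clusters = defaultdict(set)
    for v in vertices:
        clusters[find(v)].add(v)
    return list(clusters.values())


clusters = connected_components(
    ["a1", "b1", "c1", "a2", "b2", "c2"],
    [("a1", "b1"), ("b1", "c1"), ("a2", "b2")],
)
print(sorted(sorted(c) for c in clusters))  # [['a1', 'b1', 'c1'], ['a2', 'b2'], ['c2']]
```

Because edge weights are ignored, a single spurious link is enough to merge two otherwise separate clusters.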
famer-clusterPostProcessing: it contains the implementation of the overlapResolve algorithm. Although an entity belonging to multiple clusters is meaningless in the context of entity resolution (each entity refers to a single real-world object), some ER clustering algorithms produce overlapping clusters. The overlapResolve algorithm resolves each entity that is shared between several clusters by assigning it to exactly one cluster.
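This README does not state how overlapResolve chooses the cluster a shared entity is kept in. The sketch below illustrates the general idea under a hypothetical rule: an entity shared by several clusters stays in the cluster whose other members it is, on average, most similar to, and is removed from all the others. The function names resolve_overlaps and avg_similarity, the input shapes, and the averaging rule are assumptions for illustration, not FAMER's actual method.

```python
from collections import defaultdict


def avg_similarity(entity, members, similarity):
    """Mean link similarity between entity and the other members (missing link = 0.0)."""
    others = members - {entity}
    if not others:
        return 0.0
    total = sum(similarity.get(frozenset((entity, u)), 0.0) for u in others)
    return total / len(others)


def resolve_overlaps(clusters, similarity):
    """Keep every shared entity in exactly one cluster (HYPOTHETICAL selection rule).

    clusters:   dict cluster_id -> set of entity ids, possibly overlapping
    similarity: dict frozenset({u, v}) -> similarity of the link between u and v
    """
    # Record which clusters each entity belongs to.
    membership = defaultdict(list)
    for cid, members in clusters.items():
        for entity in members:
            membership[entity].append(cid)

    resolved = {cid: set(members) for cid, members in clusters.items()}
    for entity, cids in membership.items():
        if len(cids) < 2:
            continue  # not shared, nothing to resolve
        # Scores are computed on the original clusters, so the outcome does not
        # depend on the order in which shared entities are processed.
        # Ties keep the first cluster in insertion order.
        best = max(cids, key=lambda cid: avg_similarity(entity, clusters[cid], similarity))
        for cid in cids:
            if cid != best:
                resolved[cid].discard(entity)

    # Drop clusters that became empty.
    return {cid: members for cid, members in resolved.items() if members}


sim = {
    frozenset(("x", "a1")): 0.9, frozenset(("x", "b1")): 0.8,
    frozenset(("x", "a2")): 0.4, frozenset(("x", "b2")): 0.5,
}
result = resolve_overlaps({"c1": {"a1", "b1", "x"}, "c2": {"a2", "b2", "x"}}, sim)
print(result)
```

In this example, x has an average similarity of 0.85 to the other members of c1 and 0.45 to those of c2, so it stays only in c1.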
famer-common: it contains shared APIs that are used by the other modules.
famer-example: it contains example scripts for the CLIP and RLIP algorithms, as well as scripts for computing the quality of input graphs and of clustering outputs in terms of F-measure.
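Entity resolution quality is commonly measured pairwise: every entity pair placed together (an edge of the input graph, or two members of the same output cluster) counts as a predicted match and is compared with the pairs from a ground-truth clustering. The sketch below assumes this pairwise definition of precision, recall, and F-measure; whether the famer-example scripts compute it exactly this way is not stated here, and all function and variable names are hypothetical.

```python
from itertools import combinations


def normalize(pairs):
    """Represent each unordered pair as a sorted tuple so (u, v) equals (v, u)."""
    return {tuple(sorted(p)) for p in pairs}


def pairs_from_clusters(clusters):
    """All unordered entity pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_f_measure(predicted_pairs, true_pairs):
    """Pairwise precision, recall and F-measure of predicted matches."""
    predicted, truth = normalize(predicted_pairs), normalize(true_pairs)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


truth = pairs_from_clusters([{"a1", "b1", "c1"}, {"a2", "b2"}])

# Quality of a clustering output: pairs inside the same output cluster.
output = pairs_from_clusters([{"a1", "b1"}, {"c1"}, {"a2", "b2"}])
print(pairwise_f_measure(output, truth))  # precision 1.0, recall 0.5, F-measure about 0.667

# Quality of an input graph: every similarity link is a predicted match.
graph_edges = [("b1", "a1"), ("c1", "a2")]
print(pairwise_f_measure(graph_edges, truth))  # precision 0.5, recall 0.25, F-measure about 0.333
```

Scoring the input graph and the clustering output with the same pairwise measure makes it easy to see how much a clustering step improves on the raw similarity links.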
inputGraphs: it contains all input graphs generated by FAMER that we reported in our papers ([1] and [2]) for all three datasets we listed and made publicly available on the FAMER homepage.