Pinned Repositories
blink
This is the main code for Steorts (2015), which is also on CRAN. Please cite the paper and code if you find them useful.
clevr
Clustering and Link Prediction Evaluation in R
cora
Cora data set for Entity Resolution
dblink
Distributed Bayesian Entity Resolution in Apache Spark
dblinkR
An R interface for the dblink Spark application
exchanger
Bayesian Entity Resolution with Exchangeable Random Partition Priors
fasthash
Performs unique entity estimation, following Chen, Shrivastava, and Steorts (2018).
microclustr
Package for Betancourt, Zanella, and Steorts
record-linkage-tutorial
A tutorial on entity resolution (record linkage or de-duplication)
representr
Create representative records post-record linkage
cleanzr's Repositories
cleanzr/record-linkage-tutorial
A tutorial on entity resolution (record linkage or de-duplication)
cleanzr/dblink
Distributed Bayesian Entity Resolution in Apache Spark
cleanzr/fasthash
Performs unique entity estimation, following Chen, Shrivastava, and Steorts (2018).
cleanzr/clevr
Clustering and Link Prediction Evaluation in R
cleanzr/representr
Create representative records post-record linkage
cleanzr/exchanger
Bayesian Entity Resolution with Exchangeable Random Partition Priors
cleanzr/blink
This is the main code for Steorts (2015), which is also on CRAN. Please cite the paper and code if you find them useful.
cleanzr/dblinkR
An R interface for the dblink Spark application
cleanzr/cora
Cora data set for Entity Resolution
cleanzr/microclustr
Package for Betancourt, Zanella, and Steorts
cleanzr/BDD
Duplicate detection in R using a Bayesian partitioning approach
cleanzr/restaurant
Restaurant data set for entity resolution
cleanzr/RLdata
cleanzr/cd
CD dataset for Entity Resolution
cleanzr/dblink-experiments
Details for reproducing the experiments in our d-blink paper
cleanzr/exchanger-experiments
Scripts for reproducing the experiments in our JSSAM article on Bayesian Graphical Entity Resolution
cleanzr/italy
Data from a sample survey conducted by the Bank of Italy every two years, containing duplicated records.
cleanzr/klsh
Blocking for record linkage
cleanzr/posters
Posters on Data Cleanzing
cleanzr/smered
cleanzr/tlsh