Deep Learning-based Entity Matching

DLMatchers

This repository contains pointers to all code and data used in our publication, "A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms".

All datasets are available here.

The code that was used for generating the new benchmark datasets is available here. The input data to these scripts can be found here.

The implementation of the non-neural, linear supervised matching algorithms is available here.

To build a Docker image for the main DL-based matching algorithms, run:

sudo docker build -t mostmatchers mostmatchers

To start a container from this image and open a shell inside it, use the following command:

sudo docker run -it --entrypoint=/bin/bash mostmatchers

To use the GPUs of the host machine, install the NVIDIA Container Toolkit and add the flag

--gpus all

to the docker run command that starts the container.
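As a rough sketch of how the GPU flag fits into the run command above, the snippet below prints the full command line and includes --gpus all only when nvidia-smi is found on the host. The helper name build_run_cmd is hypothetical, and using the presence of nvidia-smi as a GPU check is an assumption rather than something this repository prescribes; the image name mostmatchers comes from the build step.

```shell
#!/bin/sh
# Hypothetical helper: print the "docker run" command for the mostmatchers image,
# adding --gpus all only when an NVIDIA driver appears to be installed.
build_run_cmd() {
    gpu_flag=""
    # Assumption: nvidia-smi on the PATH is taken as a rough sign that a GPU is usable.
    if command -v nvidia-smi >/dev/null 2>&1; then
        gpu_flag="--gpus all "
    fi
    echo "sudo docker run -it ${gpu_flag}--entrypoint=/bin/bash mostmatchers"
}

# Print the command so it can be inspected before it is executed.
build_run_cmd
```

The printed line can be copied into a terminal as-is. On a host with GPUs, the container will only see them if the NVIDIA Container Toolkit is installed.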

More details are provided here.

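To reclaim the disk space occupied by Docker (e.g., after many experiments), use the following commands. Note that they remove all stopped containers, unused images, and unused volumes on the host, not only those created for this repository: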

  • sudo docker system prune -a
  • sudo docker volume prune