Duplicate Detection

An efficient implementation for finding the pairs of entries with the lowest Levenshtein distance in a list of Excel data.
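
For readers unfamiliar with the idea, here is a minimal sketch of what "pairs with the lowest Levenshtein distance" means. It is not this repository's implementation: it is a plain brute-force comparison of every pair, and the names levenshtein and closest_pairs are hypothetical helpers made up for this illustration. It also works on a plain Python list of strings rather than on an Excel file.

```python
from itertools import combinations
import heapq


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def closest_pairs(values, k=3):
    """Return the k pairs with the smallest distance as (distance, a, b) tuples.

    Hypothetical helper: compares all pairs, so it is quadratic in len(values).
    """
    scored = ((levenshtein(a, b), a, b) for a, b in combinations(values, 2))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    names = ["John Smith", "Jon Smith", "Jane Doe"]
    for distance, first, second in closest_pairs(names, k=1):
        print(distance, first, second)
```

Pairs with a small distance, such as "John Smith" and "Jon Smith", are the likely duplicates. Comparing every pair is quadratic in the number of entries, which is why an efficient implementation matters for larger spreadsheets.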

How to install:

Install git

Clone repository: git clone https://github.com/austrian-code-wizard/duplicateDetector

Alternatively, use the GitHub web GUI to clone the repository.

Move into repository: cd duplicateDetector

Make sure you have Python 3.7 installed.

Install virtualenv: python3 -m pip install virtualenv

Create venv: python3 -m virtualenv venv

Activate venv: . venv/bin/activate

Install repository: python setup.py install

Run the example (first edit example.py so that it uses a valid Excel file path and valid fields): python example.py