/dedup-dli-kan

Deduplicate the Digital Library of India Kannada collection


Digital Library of India Kannada deduplication

Description

A Jupyter notebook that finds duplicate items and generates Internet Archive shell commands to delete them.

The software is preliminary and needs substantial cleanup.

Only 0.4% of the items are duplicates.

For detailed information, see the Telugu deduplication project at https://github.com/arjunaraoc/Deduplicate-DLI
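
To illustrate the general idea described above, here is a minimal sketch using pandas. It is not taken from the notebook: the column names (`identifier`, `title`, `author`), the sample rows, and the helper functions `find_duplicates` and `delete_commands` are hypothetical, and the default command template assumes the `ia` command-line tool's `delete` subcommand with its `--all` option. Check the template against the tool's documentation before running any generated command.

```python
import shlex
from typing import List

import pandas as pd


def find_duplicates(items: pd.DataFrame, key_columns: List[str]) -> pd.DataFrame:
    """Return rows that repeat an earlier row on key_columns (the first copy is kept)."""
    # Normalize the key columns so that case and surrounding whitespace
    # do not hide otherwise identical records.
    keys = items[key_columns].apply(lambda col: col.astype(str).str.strip().str.lower())
    return items[keys.duplicated(keep="first")]


def delete_commands(
    duplicates: pd.DataFrame,
    id_column: str = "identifier",                    # hypothetical column name
    template: str = "ia delete {identifier} --all",   # assumed command form
) -> List[str]:
    """Build one shell command per duplicate item, quoting each identifier."""
    return [
        template.format(identifier=shlex.quote(str(identifier)))
        for identifier in duplicates[id_column]
    ]


# Hypothetical sample metadata, for illustration only.
items = pd.DataFrame(
    {
        "identifier": ["item-001", "item-002", "item-003", "item-004"],
        "title": ["Kavya Sangraha", " kavya sangraha ", "Nataka Male", "Kavya Sangraha"],
        "author": ["A. Author", "A. Author", "B. Writer", "C. Other"],
    }
)

duplicates = find_duplicates(items, ["title", "author"])
commands = delete_commands(duplicates)
print("\n".join(commands))
```

Writing the commands to a file and reviewing them by hand before execution is a sensible precaution, since deletion is hard to undo.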

Usage

Launch the notebook on Binder, or clone the repository and open the .ipynb file.

Requirements

Python==3.6.7
pandas
numpy