A Jupyter notebook that finds duplicate items and generates Internet Archive shell commands to delete them.
The software is preliminary and needs substantial cleanup.
Duplicates make up only 0.4% of the items.
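The general idea (flag repeated items, then emit one delete command per repeat) can be sketched with pandas as below. This is a minimal illustration, not the notebook's actual code. The metadata column names (`identifier`, `title`, `author`), the choice of key columns, and keeping the first occurrence are all assumptions. The command template is also an assumption, modeled on the `ia` CLI from the `internetarchive` package; verify its flags with `ia delete --help` before running anything.

```python
import shlex

import pandas as pd

# Assumed command template, modeled on the `ia` CLI from the `internetarchive`
# package. `--dry-run` is meant to only list what would be deleted; verify
# the flags with `ia delete --help` before relying on them.
DELETE_TEMPLATE = "ia delete {identifier} --all --dry-run"


def find_duplicates(items, key_columns):
    """Return rows whose normalized key values repeat an earlier row.

    The first occurrence of each key is kept; later occurrences are flagged.
    """
    # Normalize case and surrounding whitespace so trivial variants match.
    normalized = items[key_columns].apply(
        lambda col: col.astype(str).str.strip().str.lower()
    )
    mask = normalized.duplicated(keep="first")
    return items[mask]


def delete_commands(duplicates, template=DELETE_TEMPLATE):
    """Build one shell command per duplicate item identifier."""
    return [
        template.format(identifier=shlex.quote(str(identifier)))
        for identifier in duplicates["identifier"]
    ]


if __name__ == "__main__":
    # Toy metadata table; the column names are hypothetical.
    items = pd.DataFrame(
        {
            "identifier": ["book-001", "book-002", "book-003", "book-004"],
            "title": ["Telugu Grammar", "telugu grammar ", "Another Title", "Telugu Grammar"],
            "author": ["A", "A", "B", "C"],
        }
    )
    dups = find_duplicates(items, ["title", "author"])
    for command in delete_commands(dups):
        print(command)
```

With this toy table only `book-002` is flagged, because it matches `book-001` on both title and author after normalization, while `book-004` has a different author. Quoting identifiers with `shlex.quote` keeps the generated lines safe to paste into a shell.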
For detailed information, see the Telugu deduplication at https://github.com/arjunaraoc/Deduplicate-DLI, or clone the repository and open the .ipynb file.
## Requirements

- Python==3.6.7
- pandas
- numpy