A project that uses the SimHash algorithm to filter near-duplicate documents.
Primary Language: Python
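
To illustrate the idea of filtering near-duplicates with SimHash, here is a minimal sketch. It is not taken from this repository: the function names (`simhash`, `hamming_distance`, `filter_near_duplicates`), the word-level tokenization, the use of SHA-256 as the per-token hash, the 64-bit fingerprint width, and the distance threshold of 3 are all assumptions chosen for illustration.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Return a `bits`-wide SimHash fingerprint of `text` (hypothetical helper).

    Assumes bits <= 256, since tokens are hashed with SHA-256.
    """
    # Assumption: lowercase word tokens with equal weight.
    tokens = re.findall(r"\w+", text.lower())
    mask = (1 << bits) - 1
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big") & mask
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes sum to a positive value.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def filter_near_duplicates(docs, max_distance=3, bits=64):
    """Keep a document only if no kept fingerprint is within `max_distance` bits.

    This is a simple O(n^2) scan; the threshold is an illustrative assumption.
    """
    kept, fingerprints = [], []
    for doc in docs:
        fp = simhash(doc, bits)
        if all(hamming_distance(fp, seen) > max_distance for seen in fingerprints):
            kept.append(doc)
            fingerprints.append(fp)
    return kept
```

Calling `filter_near_duplicates(docs)` on a list of strings returns the documents in their original order, dropping any whose fingerprint falls within the threshold of an earlier kept document. The pairwise scan is fine for small collections; a larger corpus would typically index fingerprints by bit blocks to avoid comparing every pair.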