
Use Min-Hash to compare the similarity of different documents.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

MinHash-DocSimilarity

This project uses the Min-Hash algorithm to estimate the similarity between different documents.
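Here is a minimal sketch of the underlying idea. It assumes each document is already a non-empty set of words. The names `jaccard`, `minhash_signature`, and `estimate_similarity` are hypothetical and do not come from this repository. Under a random ordering of the vocabulary, the probability that two documents have the same "first" word equals their Jaccard similarity |A ∩ B| / |A ∪ B|. The fraction of agreeing min-hashes over many orderings therefore estimates that similarity.

```python
import random

def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def minhash_signature(doc: set, orders: list) -> list:
    """For each ordering (a dict word -> position), keep the smallest position of any word in doc.

    doc must be non-empty and every word must appear in each ordering.
    """
    return [min(order[w] for w in doc) for order in orders]

def estimate_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of orderings under which the two documents share the same minimum."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

doc1 = {"the", "cat", "sat", "on", "mat"}
doc2 = {"the", "cat", "lay", "on", "rug"}

# Build 200 random orderings (permutations) of the shared vocabulary.
vocab = sorted(doc1 | doc2)
rng = random.Random(42)
orders = []
for _ in range(200):
    perm = vocab[:]
    rng.shuffle(perm)
    orders.append({w: i for i, w in enumerate(perm)})

sig1 = minhash_signature(doc1, orders)
sig2 = minhash_signature(doc2, orders)
print(jaccard(doc1, doc2))             # exact value: 3/7, about 0.4286
print(estimate_similarity(sig1, sig2))  # an estimate close to 3/7
```

The two documents share 3 of their 7 distinct words, so the exact similarity is 3/7. More orderings make the estimate less noisy, at the cost of longer signatures.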

We skip the word-splitting (tokenization) step.

It is a simple, rough implementation in Python with O(N^3) time complexity.
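The classic textbook procedure builds a characteristic matrix and then a signature matrix. The sketch below makes several assumptions. Documents arrive as already-split word lists, matching the skipped word-splitting step. Row permutations are simulated with random linear hash functions h(r) = (a*r + b) mod p. The names `characteristic_matrix` and `signature_matrix` and the prime `p` are hypothetical choices, not this repository's code. The three nested loops run over words, documents, and hash functions, so the cost is cubic if each of those grows with N. Whether this is exactly what the O(N^3) above refers to is an assumption.

```python
import random

def characteristic_matrix(docs):
    """Rows are words, columns are documents; an entry is 1 if the word occurs in that document."""
    vocab = sorted(set().union(*docs))
    doc_sets = [set(d) for d in docs]
    matrix = [[1 if w in s else 0 for s in doc_sets] for w in vocab]
    return vocab, matrix

def signature_matrix(matrix, num_hashes, seed=0):
    """Simulate num_hashes row permutations with h(r) = (a*r + b) mod p; keep each column's minimum."""
    p = 2_147_483_647  # a prime, assumed larger than the number of rows
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(num_hashes)]
    num_docs = len(matrix[0])
    sig = [[float("inf")] * num_docs for _ in range(num_hashes)]
    for r, row in enumerate(matrix):                 # loop 1: every word (row)
        hr = [(a * r + b) % p for a, b in coeffs]
        for c, bit in enumerate(row):                # loop 2: every document (column)
            if bit:
                for i in range(num_hashes):          # loop 3: every hash function
                    if hr[i] < sig[i][c]:
                        sig[i][c] = hr[i]
    return sig

# Documents are given as already-split word lists (no tokenization step).
docs = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "ran"]]
vocab, M = characteristic_matrix(docs)
sig = signature_matrix(M, num_hashes=100)
agree_01 = sum(sig[i][0] == sig[i][1] for i in range(100)) / 100
agree_02 = sum(sig[i][0] == sig[i][2] for i in range(100)) / 100
print(agree_01, agree_02)
```

Documents 0 and 2 share no words. Each hash function maps distinct rows to distinct values, so their min-hashes never coincide and the estimate is exactly 0.0. Documents 0 and 1 have Jaccard similarity 2/4 = 0.5, and their estimate should land near that value.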

You may find many redundant data structures (please forgive them; the code grew out of a small homework assignment), but the overall process follows the original theory closely.
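Once a signature matrix exists, the final comparison step is small. This sketch estimates the similarity of every document pair as the fraction of hash functions on which their columns agree. The function `pairwise_similarity` and the 2 × 4 signature matrix are hypothetical, made up for illustration.

```python
from itertools import combinations

def pairwise_similarity(sig):
    """Estimate Jaccard similarity for every pair of documents from a signature matrix.

    sig[i][c] is the min-hash of document c under hash function i; the estimate for a
    pair is the fraction of hash functions on which the two columns agree.
    """
    num_hashes = len(sig)
    num_docs = len(sig[0])
    result = {}
    for c1, c2 in combinations(range(num_docs), 2):
        agree = sum(1 for i in range(num_hashes) if sig[i][c1] == sig[i][c2])
        result[(c1, c2)] = agree / num_hashes
    return result

# A small hand-made signature matrix: 2 hash functions (rows) x 4 documents (columns).
sig = [
    [1, 3, 0, 1],
    [0, 2, 0, 0],
]
for pair, sim in pairwise_similarity(sig).items():
    print(pair, sim)
```

Here documents 0 and 3 agree on both rows, giving an estimate of 1.0. The pairs (0, 2) and (2, 3) agree on one row each (0.5), and the remaining pairs agree on none (0.0). With only two hash functions the estimates are very coarse. More hash functions give finer, less noisy estimates.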