MinHash-DocSimilarity

Using the Min-Hash algorithm to compare different docs’ similarities.

We skip the step of splitting words.

It’s a simple and crude code implementation in Python in O(N^3) complexity.

You may find many redundant data structures (forgive it, it’s just derived from a tiny homework), but the whole process follows the origin theory clearly.

Wind-Gone/MinHash-DocSimilarity

MinHash-DocSimilarity