
A tutorial on locality-sensitive hashing (LSH), using MinHash for document similarity and cosine similarity for vectors in Euclidean space.


Locality Sensitive Hashing Tutorial

As the name suggests, this is a tutorial on locality sensitive hashing. All of the information is contained in the notebook.

The sampledocs folder contains artificial data for the document similarity task. It consists of news articles pulled from CNN, plus one document made of partial concatenations of the others. This creates artificially similar documents, which the algorithms are meant to find.
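To show roughly what the document similarity task involves, here is a minimal MinHash + banding sketch. It is not the notebook's code. Several parts are assumptions: the toy strings stand in for the sampledocs articles, and the "mix" document imitates the concatenation trick. The other assumed choices are 5-character shingles, 100 random hash functions of the form (a*x + b) mod p, and 20 bands of 5 rows. CRC32 is used because Python's built-in `hash()` on strings is salted per process, which would make results change between runs.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit shingle id

def shingles(text, k=5):
    """Set of k-character shingles, each hashed to a 32-bit int with CRC32."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return {zlib.crc32(text[i:i + k].encode("utf-8")) for i in range(len(text) - k + 1)}

def make_hash_funcs(n, seed=0):
    """n random hash functions h(x) = (a*x + b) mod PRIME, stored as (a, b) pairs."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]

def minhash(shingle_set, hash_funcs):
    """Signature: the minimum hashed value of the set under each hash function."""
    return [min((a * x + b) % PRIME for x in shingle_set) for a, b in hash_funcs]

def jaccard(s1, s2):
    """Exact Jaccard similarity of two shingle sets."""
    return len(s1 & s2) / len(s1 | s2)

def estimated_jaccard(sig1, sig2):
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)

def lsh_candidates(signatures, bands, rows):
    """Banding: documents whose signatures match exactly in any band become candidate pairs."""
    candidates = set()
    for b in range(bands):
        buckets = {}
        for doc_id, sig in signatures.items():
            key = tuple(sig[b * rows:(b + 1) * rows])
            buckets.setdefault(key, []).append(doc_id)
        for ids in buckets.values():
            for i in range(len(ids)):
                for j in range(i + 1, len(ids)):
                    candidates.add(tuple(sorted((ids[i], ids[j]))))
    return candidates

# Hypothetical toy documents; "mix" is a partial concatenation of the other two.
docs = {
    "a": "the quick brown fox jumps over the lazy dog near the river bank",
    "b": "stock markets rallied on tuesday as investors cheered strong earnings",
}
docs["mix"] = docs["a"][:40] + " " + docs["b"][:40]

funcs = make_hash_funcs(100)
sets = {d: shingles(t) for d, t in docs.items()}
sigs = {d: minhash(s, funcs) for d, s in sets.items()}

print(jaccard(sets["a"], sets["mix"]), estimated_jaccard(sigs["a"], sigs["mix"]))
print(lsh_candidates(sigs, bands=20, rows=5))
```

The band and row counts trade off false positives against false negatives. With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b, so changing r and b shifts the similarity threshold.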

For the vector similarity task, synthetic data is easy to generate by creating random matrices, so the notebook does that directly.
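A common LSH family for cosine similarity uses random hyperplanes, sometimes called SimHash. The sketch below applies it to a random matrix and is an assumption: the notebook may use a different scheme. Each random normal vector defines a hyperplane, and each signature bit records which side of that plane a vector falls on. Two vectors at angle θ agree on a bit with probability 1 - θ/π. The sizes (1000 vectors, 50 dimensions, 256 bits) and the near-duplicate `query` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 1000 random vectors in 50 dimensions, plus a noisy near-duplicate of row 0.
data = rng.normal(size=(1000, 50))
query = data[0] + 0.1 * rng.normal(size=50)

def hyperplane_signatures(X, planes):
    """One boolean bit per hyperplane: which side of the plane each row of X lies on."""
    return (X @ planes.T) >= 0

def cosine(u, v):
    """Exact cosine similarity, for comparison with the LSH estimate."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

n_bits = 256
planes = rng.normal(size=(n_bits, 50))  # random normal vectors define the hyperplanes
sigs = hyperplane_signatures(data, planes)
q_sig = hyperplane_signatures(query[None, :], planes)[0]

# P[bit agrees] = 1 - theta/pi, so the agreement rate gives an estimate of the angle.
agree = (sigs == q_sig).mean(axis=1)
est_cos = np.cos(np.pi * (1 - agree))

best = int(np.argmax(agree))
print(best, est_cos[best], cosine(data[best], query))
```

Only the bit signatures are compared, not the 50-dimensional vectors. In a full LSH index the bits would be split into bands and bucketed, just as with MinHash signatures.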