make install
- https://databricks.com/blog/2014/10/20/efficient-similarity-algorithm-now-in-spark-twitter.html
- https://spacy.io/usage/vectors-similarity
- https://github.com/soundcloud/cosine-lsh-join-spark
- https://index.scala-lang.org/soundcloud/cosine-lsh-join-spark/cosine-lsh-join-spark/1.0.6?target=_2.11
- https://github.com/linkedin/scanns
- https://janzhou.org/learn/lsh.html
- https://www.quora.com/What-are-some-good-LSH-implementations
- QueryByExample:
- Evaluate different strategies for selecting query words (since Elasticsearch can only handle up to about 50 terms quickly)
- Sort set of terms by corpus-statistics (e.g. document-frequency). Scope? Document, sentence, paragraph?
- Take terms from around the median of the sorted list
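
One way the median-based selection could look, as a minimal sketch: terms are ranked by document frequency and a window centred on the median is kept. The function names, the `max_terms=50` default, and the choice of whole documents as the scope for the statistics are all assumptions made for illustration, not a settled design.

```python
from collections import Counter


def document_frequencies(docs):
    """Count in how many documents each term occurs (docs: iterable of token lists)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each term counts at most once per document
    return df


def select_query_terms(query_tokens, df, max_terms=50):
    """Keep at most max_terms distinct terms, centred on the median document frequency."""
    # Sort by document frequency; break ties alphabetically so the result is deterministic.
    terms = sorted(set(query_tokens), key=lambda t: (df.get(t, 0), t))
    if len(terms) <= max_terms:
        return terms
    # Drop an equal share of the rarest and the most frequent terms.
    start = (len(terms) - max_terms) // 2
    return terms[start:start + max_terms]
```

Centring on the median discards both very rare terms (often typos or noise) and very frequent ones (close to stop words). Whether that beats, say, keeping the rarest terms is exactly what the evaluation would need to show.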