FinalsClub/djKarma

[Feature] Gensim similarity text analysis


This is a big feature, and is listed here as a placeholder for discussing whether and when to add it.


Gensim is a free Python framework designed to automatically extract
semantic topics from documents, as efficiently (computer-wise) and
painlessly (human-wise) as possible.
...
Once these statistical patterns are found, any plain text documents can
be succinctly expressed in the new, semantic representation, and
queried for topical similarity against other documents.

http://radimrehurek.com/gensim/intro.html
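
To make the "queried for topical similarity" idea concrete for the discussion, here is a minimal sketch using only the Python standard library. It is not Gensim's API and not its algorithm: Gensim builds semantic models of a corpus, while this stand-in uses plain TF-IDF vectors and cosine similarity. The function names and the sample notes are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do more."""
    return text.lower().split()


def build_idf(tokenized_docs):
    """Inverse document frequency for every term seen in the corpus."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def to_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the corpus are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return (document index, score) pairs, most similar first."""
    tokenized = [tokenize(d) for d in documents]
    idf = build_idf(tokenized)
    vectors = [to_vector(t, idf) for t in tokenized]
    query_vec = to_vector(tokenize(query), idf)
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical uploaded notes, purely for illustration.
notes = [
    "cell biology lecture on mitochondria and cell respiration",
    "organic chemistry notes on carbon bonds",
    "biology lab on cell membranes",
]
ranking = rank_by_similarity("cell respiration in biology", notes)
```

Here `ranking` lists the two biology notes ahead of the chemistry note, which shares no terms with the query. A semantic model such as the ones Gensim provides should also be able to relate documents that share a topic but few exact words, which is the main reason to consider it over a word-overlap approach like this one.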

There is also a pre-packaged server implementation of the library that looks like it would be ideal as a dedicated processing server for document similarity.

https://github.com/piskvorky/gensim-simserver

It uses a strong copyleft free software license, the AGPL:

This means you may use simserver freely in your application (even
commercial application!), but you must then open-source your
application as well, under an AGPL-compatible license.

Luckily for us, our license is compatible with theirs.