[Feature] Gensim similarity text analysis
This is a big feature, so this issue is a placeholder for discussing whether and when to add it.
Gensim is a free Python framework designed to automatically extract
semantic topics from documents, as efficiently (computer-wise) and
painlessly (human-wise) as possible.
...
Once these statistical patterns are found, any plain text documents can
be succinctly expressed in the new, semantic representation, and
queried for topical similarity against other documents.
http://radimrehurek.com/gensim/intro.html
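
To make "queried for topical similarity" concrete, here is a rough sketch of the kind of query this feature would answer. It is not Gensim's API and does not use its topic models: it swaps in plain TF-IDF weighting with cosine similarity, using only the standard library, and every function name (`tokenize`, `build_idf`, `vectorize`, `cosine`, `rank_similar`) is made up for illustration. Gensim would replace the hand-rolled vectors with its own semantic representation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(corpus_tokens):
    """Compute a smoothed inverse document frequency for every corpus term."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter(term for tokens in corpus_tokens for term in set(tokens))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def vectorize(tokens, idf):
    """Turn tokens into a sparse TF-IDF vector (term -> weight).

    Terms that never occur in the corpus are ignored.
    """
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, documents):
    """Return (document index, score) pairs, most similar document first."""
    corpus_tokens = [tokenize(doc) for doc in documents]
    idf = build_idf(corpus_tokens)
    doc_vectors = [vectorize(tokens, idf) for tokens in corpus_tokens]
    query_vector = vectorize(tokenize(query), idf)
    scores = [(i, cosine(query_vector, vec)) for i, vec in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat",
        "Dogs and cats are popular pets",
        "Stock markets fell sharply today",
    ]
    for index, score in rank_similar("cat on a mat", docs):
        print(f"{score:.3f}  {docs[index]}")
```

Note that this word-overlap version scores "cats" and "cat" as unrelated; closing that kind of gap is the reason to consider a semantic model such as the ones Gensim provides.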
There is also a pre-packaged server implementation of the library, which looks ideal as a dedicated processing server for document similarity.
https://github.com/piskvorky/gensim-simserver
It is released under the AGPL, a strong copyleft free software license:
This means you may use simserver freely in your application (even
commercial application!), but you must then open-source your
application as well, under an AGPL-compatible license.
But luckily for us, our license is totally compatible with theirs.