Demo: Term Weighting for Document Similarity Testing
BradKML opened this issue · 1 comment
BradKML commented
After reviewing https://github.com/zayedrais/DocumentSearchEngine, I was interested in seeing how alternate term weighting schemes perform for document search. However, that project only uses a TF-IDF term matrix with cosine similarity.
Here are some ideas:
- Interchangeable datasets
- Swappable similarity measures for the term matrices (borrowing from https://github.com/taki0112/Vector_Similarity)
- Alternate term weighting schemes (this library)
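A rough way to prototype these ideas is to keep the dataset, the weighting scheme, and the similarity measure as independent, swappable pieces. The following is a minimal, dependency-free sketch (Python 3.8+ for `math.dist`), not code from DocumentSearchEngine or from this library: the toy corpus and every function name (`raw_tf`, `binary`, `log_tf`, `tf_idf`, `cosine`, `euclidean_sim`, `weighted_jaccard`, `fit`, `search`) are hypothetical. The TF-IDF variant is the plain `tf * log(N / df)` form; many libraries use smoothed variants by default.

```python
import math
from collections import Counter

# Hypothetical toy corpus; swap in any list of strings to try another dataset.
DOCS = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]


def tokenize(text):
    """Naive whitespace tokenizer (an assumption; real pipelines do more)."""
    return text.lower().split()


# --- Term weighting schemes: (term count, document frequency, corpus size) -> weight ---
def raw_tf(tf, df, n):
    return float(tf)


def binary(tf, df, n):
    return 1.0 if tf > 0 else 0.0


def log_tf(tf, df, n):
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def tf_idf(tf, df, n):
    # Unsmoothed IDF: a term present in every document gets weight 0.
    return tf * math.log(n / df)


# --- Similarity measures between two dense weight vectors ---
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def euclidean_sim(u, v):
    # Map Euclidean distance into a similarity in (0, 1].
    return 1.0 / (1.0 + math.dist(u, v))


def weighted_jaccard(u, v):
    # Assumes non-negative weights, which holds for all schemes above.
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return num / den if den else 0.0


def fit(docs, weight):
    """Build a vectorizer whose vocabulary and document frequencies come from docs."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted({t for c in counts for t in c})
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    n = len(docs)

    def vectorize(text):
        c = Counter(tokenize(text))  # query terms outside the vocabulary are ignored
        return [weight(c[t], df[t], n) for t in vocab]

    return vectorize


def search(query, docs, weight=tf_idf, sim=cosine):
    """Return (score, doc_index) pairs, best match first."""
    vectorize = fit(docs, weight)
    q = vectorize(query)
    return sorted(((sim(q, vectorize(d)), i) for i, d in enumerate(docs)), reverse=True)


if __name__ == "__main__":
    # Compare every weighting/similarity combination on the same query.
    for w in (raw_tf, binary, log_tf, tf_idf):
        for s in (cosine, euclidean_sim, weighted_jaccard):
            score, idx = search("cat on the mat", DOCS, w, s)[0]
            print(f"{w.__name__:>7} / {s.__name__:<16} -> doc {idx} ({score:.3f})")
```

Because weighting and similarity are plain functions, each idea in the list can be varied on its own; comparing the top-ranked documents across the grid gives a quick, informal sense of which combination suits a given dataset.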
I will be implementing this in the coming days; please feel free to suggest other ideas.
ceteri commented
Thank you @BrandonKMLee,
The results from pytextrank can be used as input for document similarity work, and many use cases do exactly that.
However, the library itself doesn't support document similarity comparisons. That work involves far more application-specific detail than we could support within this architecture.
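To illustrate that split, here is a minimal sketch of the kind of glue code that would live in an application rather than in pytextrank: it treats each document's ranked phrases as a sparse vector and compares two documents with cosine similarity. It assumes the phrases have already been extracted into a plain `dict` mapping phrase text to rank; recent pytextrank releases expose `doc._.phrases` as phrase objects with `.text` and `.rank` attributes, but check this against the version you use. The example dictionaries, their values, and the `phrase_cosine` name are hypothetical.

```python
import math

# Hypothetical phrase -> rank maps standing in for pytextrank output,
# e.g. {p.text: p.rank for p in doc._.phrases}. Values are invented for illustration.
DOC_A = {"term weighting": 0.18, "document search": 0.15, "cosine similarity": 0.11}
DOC_B = {"document search": 0.20, "bm25": 0.12, "cosine similarity": 0.09}
DOC_C = {"graph algorithms": 0.22, "pagerank": 0.17}


def phrase_cosine(a, b):
    """Cosine similarity between two sparse phrase -> rank vectors."""
    # Only phrases present in both documents contribute to the dot product.
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    print(phrase_cosine(DOC_A, DOC_B))  # two shared phrases, so a positive score
    print(phrase_cosine(DOC_A, DOC_C))  # no shared phrases, so 0.0
```

Cosine over phrase ranks is only one option; exact-string matching of phrases, the choice of measure, and any normalization are precisely the application-specific decisions left to the user.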