Demo: Term Weighting for Document Similarity Testing

Question

BradKML opened this issue 3 years ago · 1 comment

After reviewing https://github.com/zayedrais/DocumentSearchEngine, I was interested in seeing how alternative term weighting schemes perform for document search. However, that project only uses a TF-IDF term matrix with cosine similarity.
Here are some ideas for a demo:

- Interchangeability of datasets
- Replacement of similarity measures for matrices (borrowing from https://github.com/taki0112/Vector_Similarity)
- Alternative term weighting schemes (this library)
I will be implementing this in the coming days. Please do not hesitate to suggest other ideas.
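To make the three ideas concrete, here is a minimal sketch in plain Python of what "interchangeable" could look like: the weighting scheme and the similarity measure are both passed in as functions. All names here (`tf_idf_weights`, `raw_tf_weights`, `cosine`, `jaccard`, `rank`) and the toy documents are hypothetical, and none of this is taken from DocumentSearchEngine, Vector_Similarity, or this library.

```python
import math
from collections import Counter


def raw_tf_weights(docs):
    """Weight each term by its raw count in the document."""
    return [dict(Counter(doc)) for doc in docs]


def tf_idf_weights(docs):
    """Weight each term by tf * log(N / df), a basic TF-IDF variant."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(a, b):
    """Jaccard overlap of the terms that carry a positive weight."""
    terms_a = {t for t, w in a.items() if w > 0}
    terms_b = {t for t, w in b.items() if w > 0}
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def rank(query_idx, docs, weighting, similarity):
    """Rank every other document against docs[query_idx], best match first."""
    vectors = weighting(docs)
    query = vectors[query_idx]
    scores = [
        (i, similarity(query, vec))
        for i, vec in enumerate(vectors)
        if i != query_idx
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy, pre-tokenized dataset (made up for illustration); any list of
# token lists could be swapped in here.
docs = [
    "graph ranking of keywords".split(),
    "keywords ranking with graph methods".split(),
    "cooking pasta at home".split(),
]

tfidf_cosine = rank(0, docs, tf_idf_weights, cosine)
tf_jaccard = rank(0, docs, raw_tf_weights, jaccard)
```

Swapping the dataset, the weighting function, or the similarity function then only changes the arguments to `rank`, which keeps comparisons between schemes on the same footing.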

Answer 1 · 2022-07-25T18:20:47.000Z

Thank you @BrandonKMLee,

The results from pytextrank can be used as input for document similarity work, and there are many use cases which do that.

However, the library itself doesn't support document similarity comparisons. That work gets into much more application-specific details than we would be able to support based on this architecture.
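As a rough illustration of using pytextrank results as input, the sketch below treats each document's ranked phrases as a sparse weighted vector and compares two such vectors with cosine similarity. It assumes the phrases have already been collected as `(text, rank)` pairs, for example from the `text` and `rank` attributes of each entry in `doc._.phrases`. The helper names and the rank values are made up for illustration, and this is application-side code, not a feature of the library.

```python
import math


def phrase_vector(pairs):
    """Build a {phrase: rank} vector, keeping the highest rank per phrase."""
    vector = {}
    for text, rank in pairs:
        key = text.lower()
        vector[key] = max(rank, vector.get(key, 0.0))
    return vector


def phrase_similarity(pairs_a, pairs_b):
    """Cosine similarity between two documents' ranked-phrase vectors."""
    a, b = phrase_vector(pairs_a), phrase_vector(pairs_b)
    dot = sum(w * b.get(phrase, 0.0) for phrase, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical (text, rank) pairs standing in for extracted phrases.
doc_a = [("document similarity", 0.12), ("term weighting", 0.09)]
doc_b = [("Term Weighting", 0.10), ("cosine similarity", 0.07)]
doc_c = [("pasta recipes", 0.15)]

sim_ab = phrase_similarity(doc_a, doc_b)
sim_ac = phrase_similarity(doc_a, doc_c)
```

Exact phrase matching is the simplest possible choice here; how phrases are normalized or matched is one of the application-specific decisions mentioned above.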