Text-Similarity-Analysis

Similarity analysis of short texts, based on news data.

Tools

jieba: segment Chinese text into words
gensim: build topic models, represent texts as vectors, and compute similarity

Models

Term Frequency-Inverse Document Frequency (TF-IDF), Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), doc2vec, BM25
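
To make the ranking idea behind BM25 concrete, here is a minimal standard-library sketch of Okapi BM25 scoring over already-tokenized documents. It is not the project's implementation: the function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the English sample tokens are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    k1 and b are commonly used values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed; other variants exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical, already-segmented documents and query.
docs = [
    ["central", "bank", "cuts", "reserve", "ratio"],
    ["central", "bank", "lowers", "reserve", "requirement", "ratio"],
    ["national", "team", "wins", "qualifier"],
]
scores = bm25_scores(["bank", "reserve", "ratio"], docs)
print(scores)
```

For Chinese text, the token lists would come from a segmenter such as jieba rather than from hand-written lists. Unlike the cosine similarity used with TF-IDF vectors, BM25 scores are unbounded and are only meaningful for ranking documents against the same query.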