Short-text similarity analysis on news articles.
jieba: segments Chinese text into words
gensim: builds topic models, represents texts as vectors, and computes similarity
Methods: Term Frequency-Inverse Document Frequency (TF-IDF), Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), doc2vec, BM25
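
To make the segment, vectorize, compare flow concrete, here is a minimal sketch of TF-IDF cosine similarity using only the Python standard library. It is not the project's code: in the project, jieba would produce the tokens and gensim would build the vectors. The function names (tfidf_vectors, cosine), the smoothed IDF formula, and the sample headlines are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} TF-IDF vector."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption of this sketch); it is always positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({t: freq * idf[t] for t, freq in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical headlines, already split into words.
# For Chinese text, a segmenter such as jieba would supply these token lists.
docs = [
    ["stocks", "rise", "today"],
    ["stocks", "fall", "today"],
    ["team", "wins", "match"],
]

vectors = tfidf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])  # two shared terms
sim_02 = cosine(vectors[0], vectors[2])  # no shared terms
print(sim_01, sim_02)
```

The first two headlines share two of their three terms, so they score higher than the unrelated pair, which scores zero. LSI, LDA, and doc2vec replace the sparse term vectors with dense ones but can be compared with the same cosine measure; BM25 instead scores a query against each document directly.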