bert-topic-extraction

Topic modeling with BERT, using dimensionality reduction and clustering on the TF-IDF matrix as a baseline.

MIT License

NLP with Python: Topic Modeling with BERT

This project attempts to extract topics from a dataset of Amazon book reviews. The main challenge was handling the large amount of data correctly so that the pipeline remained feasible.

Baseline Methods

The baseline methods were mini-batch K-means and SVD, plus K-means applied to the SVD-reduced TF-IDF matrix. Here are the results obtained after the topic reduction:

[Figure: baseline]
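The baseline steps described above can be sketched with scikit-learn. This is a minimal illustration rather than the project's actual code: the function name `baseline_topics`, the toy reviews, and every parameter value (`n_components`, `n_clusters`, `seed`) are assumptions chosen only to keep the example small.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def baseline_topics(texts, n_components=2, n_clusters=2, seed=0):
    """Hypothetical helper: TF-IDF, then two clustering baselines."""
    # Sparse TF-IDF matrix: one row per review, one column per term.
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(texts)

    # Baseline 1: mini-batch K-means directly on the sparse TF-IDF matrix.
    mbk = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    labels_tfidf = mbk.fit_predict(X)

    # Baseline 2: truncated SVD, which works on sparse input without
    # densifying it, followed by K-means on the reduced matrix.
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    X_reduced = svd.fit_transform(X)
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    labels_svd = km.fit_predict(X_reduced)

    return labels_tfidf, labels_svd, X_reduced


# Toy reviews invented for this example (not taken from the Amazon dataset).
reviews = [
    "A gripping fantasy novel with dragons and magic",
    "Magic, dragons and an epic fantasy quest",
    "This cookbook has easy recipes for weeknight dinners",
    "Simple dinner recipes, a very practical cookbook",
]
labels_tfidf, labels_svd, reduced = baseline_topics(reviews)
```

On a large corpus, the mini-batch variant is the piece that helps most with memory and runtime, since it updates the centroids from small random batches instead of the full matrix at every step.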

Sentence Transformers

For this part, Sentence Transformers were used instead of the TF-IDF matrix
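A rough sketch of how sentence embeddings could take the place of the TF-IDF matrix is shown below. The README does not say which model or clustering step was used, so several things here are assumptions: the model name `all-MiniLM-L6-v2`, the helper names `embed_reviews` and `cluster_embeddings`, the L2 normalization, and the reuse of mini-batch K-means on the embeddings.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def embed_reviews(texts, model_name="all-MiniLM-L6-v2", batch_size=64):
    """Hypothetical helper: encode reviews as dense sentence embeddings."""
    # Imported lazily so the clustering helper below can be used
    # without the sentence-transformers package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # model choice is an assumption
    # Encoding in batches keeps memory bounded on large review sets.
    return model.encode(
        list(texts),
        batch_size=batch_size,
        show_progress_bar=False,
        convert_to_numpy=True,
    )


def cluster_embeddings(embeddings, n_clusters, seed=0):
    """Hypothetical helper: mini-batch K-means on unit-length embeddings."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    # L2-normalize so Euclidean distance ranks pairs like cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    km = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    return km.fit_predict(unit)
```

With these helpers, a call such as `cluster_embeddings(embed_reviews(reviews), n_clusters=2)` would assign each review to a cluster. The first call to `embed_reviews` downloads the model weights, so it needs network access.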