NLP with Python: Topic Modeling with BERT

This project attempts to extract topics from a dataset of Amazon book reviews. The main challenge was handling the large volume of data correctly so that the pipeline remained feasible.

Baseline Methods

As baseline methods, mini-batch K-means and SVD were used, along with K-means applied to the SVD dimensionality reduction of the TF-IDF matrix. The results reported were obtained after the topic reduction.
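For readers who want to try the third baseline (K-means on the SVD-reduced TF-IDF matrix), the following is a minimal sketch using scikit-learn. It is not the project's actual code: the function name `cluster_reviews`, the parameter values, and the four toy reviews are assumptions made for illustration only.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_reviews(texts, n_topics=2, n_components=2, batch_size=1024, seed=0):
    """Hypothetical helper: TF-IDF -> truncated SVD -> mini-batch K-means."""
    # Sparse TF-IDF matrix of shape (n_docs, n_terms).
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(texts)

    # TruncatedSVD works directly on sparse input, which matters for large corpora.
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    reduced = svd.fit_transform(tfidf)  # dense, shape (n_docs, n_components)

    # Mini-batch K-means updates centroids from small batches instead of the
    # full dataset, which keeps each update cheap on large inputs.
    kmeans = MiniBatchKMeans(
        n_clusters=n_topics, batch_size=batch_size, n_init=3, random_state=seed
    )
    labels = kmeans.fit_predict(reduced)
    return labels, reduced


# Toy data (assumption): stands in for the Amazon book reviews.
reviews = [
    "The plot was gripping and the characters felt real.",
    "A gripping plot with memorable characters.",
    "The book arrived late and the cover was damaged.",
    "Shipping was slow and the cover arrived torn.",
]
labels, reduced = cluster_reviews(reviews)
print(labels)
```

On a real corpus, `n_components` and `n_topics` would be far larger than in this toy call, and both would need tuning.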

Sentence Transformers

In this part, Sentence Transformers embeddings were used instead of the TF-IDF matrix.
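A minimal sketch of how dense sentence embeddings could replace the TF-IDF matrix is shown here. Several things are assumptions rather than details from the project: the model name `all-MiniLM-L6-v2`, the hypothetical helper `embed_and_cluster`, and the choice to feed the embeddings to the same mini-batch K-means step as the baseline.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def embed_and_cluster(texts, encoder=None, n_topics=2, batch_size=64, seed=0):
    """Hypothetical helper: sentence embeddings -> mini-batch K-means."""
    if encoder is None:
        # Imported lazily so the model is only loaded when it is needed.
        from sentence_transformers import SentenceTransformer

        # Assumption: the source does not say which pretrained model was used.
        encoder = SentenceTransformer("all-MiniLM-L6-v2")

    # encode() processes the texts in batches and returns one vector per text.
    embeddings = np.asarray(
        encoder.encode(texts, batch_size=batch_size, show_progress_bar=False)
    )

    # Assumption: clustering mirrors the baseline, with embeddings as input.
    kmeans = MiniBatchKMeans(n_clusters=n_topics, n_init=3, random_state=seed)
    labels = kmeans.fit_predict(embeddings)
    return labels, embeddings
```

Calling `embed_and_cluster(reviews)` with no `encoder` downloads the pretrained model on first use. Passing an already loaded `SentenceTransformer` as `encoder` avoids reloading it on every call, which helps when a large dataset is processed in chunks.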

santurini/bert-topic-extraction
