This project attempts to extract topics from the Amazon Books reviews dataset. The main challenge was handling the large amount of data correctly so that the pipeline remained feasible.
As baseline methods, mini-batch K-means and SVD were used, along with K-means applied to the SVD dimensionality reduction of the TF-IDF matrix. Here are the results obtained after the topic reduction:
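To make the baseline concrete, the following is a minimal sketch of K-means applied to an SVD reduction of a TF-IDF matrix, assuming scikit-learn. The toy `reviews` list, the helper name `cluster_tfidf_svd`, and the parameter values (`n_components=2`, `n_clusters=2`) are illustrative assumptions, not the settings used in the project.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import Normalizer

# Toy stand-in for the review texts (hypothetical data, not from the dataset).
reviews = [
    "A gripping mystery novel with a clever detective",
    "The detective story kept me guessing until the end",
    "Great cookbook with easy recipes for dinner",
    "Simple recipes and beautiful food photos in this cookbook",
    "A detective mystery full of twists",
    "Dinner recipes that are quick to cook",
]


def cluster_tfidf_svd(texts, n_components=2, n_clusters=2, seed=0):
    """Hypothetical helper: TF-IDF -> truncated SVD -> mini-batch K-means."""
    # Sparse TF-IDF matrix with one row per review.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)

    # TruncatedSVD works directly on sparse input, which matters for large corpora.
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    reduced = svd.fit_transform(tfidf)

    # Rescale rows to unit length so Euclidean K-means behaves like cosine clustering.
    reduced = Normalizer(copy=False).fit_transform(reduced)

    # Mini-batch K-means updates centroids from small batches instead of full passes.
    kmeans = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    labels = kmeans.fit_predict(reduced)
    return reduced, labels


reduced, labels = cluster_tfidf_svd(reviews)
print(reduced.shape)
print(labels)
```

On a real corpus, `n_components` and `n_clusters` would need to be much larger and tuned, and `MiniBatchKMeans.partial_fit` could be used to feed the data in chunks.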
For this part, Sentence Transformers were used instead of the TF-IDF matrix.
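As a rough illustration of this variant, the sketch below replaces the TF-IDF features with dense sentence embeddings and clusters them with mini-batch K-means. The function names `embed_reviews` and `cluster_embeddings` are hypothetical, and the model name `all-MiniLM-L6-v2` is an assumption, since the model actually used is not stated here. The demo at the bottom runs on synthetic vectors so that it does not require a model download.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def embed_reviews(texts, model_name="all-MiniLM-L6-v2"):
    """Hypothetical helper: encode texts into dense vectors.

    Requires the sentence-transformers package and downloads the model
    on first use. The model name is an assumption for illustration.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts, batch_size=64, show_progress_bar=False)


def cluster_embeddings(embeddings, n_clusters, seed=0):
    """Hypothetical helper: mini-batch K-means on unit-normalized embeddings."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    # Normalize rows so Euclidean distance tracks cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    kmeans = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    return kmeans.fit_predict(unit)


# Synthetic stand-in for embeddings: two well-separated groups of 5 vectors each.
rng = np.random.default_rng(0)
fake_embeddings = np.vstack([
    rng.normal(loc=1.0, scale=0.05, size=(5, 8)),
    rng.normal(loc=-1.0, scale=0.05, size=(5, 8)),
])
labels = cluster_embeddings(fake_embeddings, n_clusters=2)
print(labels)
```

With real data, the call would look like `cluster_embeddings(embed_reviews(review_texts), n_clusters=k)`, where `review_texts` and `k` are placeholders to be supplied.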