
All files for my bachelor thesis on the utility of clustering algorithms in the context of social media research at the Technische Universität in Berlin, Germany.


Topic models and text embeddings

This repo contains the following files:


  • Topic_Models_and_Embeddings.ipynb: A copy of the final Google Colab pipeline
  • gensim_lda.ipynb: The LDA pipeline
  • get_embeddings.py: A Python script to create text embeddings from a CSV file
  • text_embeddings_final_pipeline: The final text embeddings pipeline
  • text_embeddings_it{n}: The n-th iteration of my experiments with the text embeddings pipeline