Topic_Modeling_DA

Primary language: Jupyter Notebook

Basic text preprocessing and topic modeling

  • The goal of this assignment is to get familiar with textual data analysis.

  • Task 1: The given dataset is a table of questions about R from Stack Overflow. Your first task is to apply the standard text preprocessing steps introduced in the lectures, so that the text is ready for the later tasks. You may analyze either the title or the body of the questions.

  • Task 2: Using an existing library such as gensim (https://radimrehurek.com/gensim/), learn word embeddings from the preprocessed text produced in the previous step. At the end of this step, save the learned word embeddings to a file.

  • Task 3: Perform topic analysis on the preprocessed textual data. Briefly explain how you picked the number of topics. Present your findings (the final set of topics you extracted and the popularity of each topic).
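
The three tasks can be sketched end to end roughly as follows. This is a minimal sketch under stated assumptions, not the assignment's reference solution: the stop-word list, the helper names (`preprocess`, `train_embeddings`, `fit_lda`, `topic_popularity`), the output file name `word_vectors.txt`, the candidate topic counts, and the use of `u_mass` coherence to pick the number of topics are all illustrative choices. The gensim parts assume gensim 4.x (where the Word2Vec dimension parameter is `vector_size`).

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "in", "to", "of", "and",
             "how", "i", "do", "with", "for", "on"}


def preprocess(text):
    """Task 1: strip HTML tags, lowercase, tokenize, drop stop words and 1-char tokens."""
    text = re.sub(r"<[^>]+>", " ", text)  # question bodies may contain HTML markup
    # Keep dots and underscores inside tokens so names like data.frame survive.
    tokens = re.findall(r"[a-z][a-z0-9_.]*", text.lower())
    tokens = [t.rstrip(".") for t in tokens]  # remove sentence-final periods
    return [t for t in tokens if len(t) > 1 and t not in STOPWORDS]


def train_embeddings(docs, path="word_vectors.txt"):
    """Task 2: learn Word2Vec embeddings from tokenized docs and save them to a file."""
    # Imported lazily so the pure-Python helpers run without gensim installed.
    from gensim.models import Word2Vec

    model = Word2Vec(sentences=docs, vector_size=100, window=5,
                     min_count=1, workers=1, seed=0)
    model.wv.save_word2vec_format(path)  # plain-text word2vec format
    return model.wv


def fit_lda(docs, candidate_ks=(5, 10, 15, 20)):
    """Task 3: fit LDA for several topic counts and keep the most coherent model."""
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    best = None
    for k in candidate_ks:
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                       passes=5, random_state=0)
        # u_mass coherence: values closer to zero indicate more coherent topics.
        score = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary,
                               coherence="u_mass").get_coherence()
        if best is None or score > best[0]:
            best = (score, lda)
    return best[1], corpus


def topic_popularity(doc_topic_dists):
    """Share of documents whose most probable topic is each topic id.

    Each element is a list of (topic_id, probability) pairs, the shape
    returned by LdaModel.get_document_topics.
    """
    dominant = Counter(max(dist, key=lambda pair: pair[1])[0]
                       for dist in doc_topic_dists if dist)
    total = sum(dominant.values())
    return {topic: count / total for topic, count in dominant.items()}
```

With a list of question titles in a hypothetical variable `titles`, the pieces would chain as `docs = [preprocess(t) for t in titles]`, then `train_embeddings(docs)`, then `lda, corpus = fit_lda(docs)`, and finally `topic_popularity([lda.get_document_topics(bow) for bow in corpus])`. Dropping one-character tokens also removes the token `r`, which carries little signal when every question is about R.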