Topic Modeling, Classification, Natural Language Processing
Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus.
We plan to write a research paper proposing a term-centric stability analysis strategy to address this issue. The idea is that a model with an appropriate number of topics will be more robust to perturbations in the data: its top terms per topic should change little when the model is refit on slightly different samples of the corpus.
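As a rough illustration of the stability idea, the sketch below scores how well the top-term lists of a reference model agree with those of models fit on perturbed samples. It is a minimal sketch under assumptions, not the method of the planned paper: the function names (`jaccard`, `agreement`, `stability`) are hypothetical, plain Jaccard overlap is used as the term similarity, and topics are matched by brute force over permutations, which is only practical for a small number of topics.

```python
from itertools import permutations
from statistics import mean


def jaccard(terms_a, terms_b):
    """Jaccard overlap between two top-term lists (order is ignored)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement(reference, sample):
    """Best average term overlap between two topic sets of equal size.

    Topic order is arbitrary across model runs, so every one-to-one
    matching of topics is tried and the best one is kept. This is
    factorial in the number of topics; an assignment solver would be
    needed for larger values.
    """
    k = len(reference)
    if k == 0 or len(sample) != k:
        raise ValueError("both models need the same non-zero number of topics")
    best = 0.0
    for perm in permutations(range(k)):
        score = mean(jaccard(reference[i], sample[j]) for i, j in enumerate(perm))
        best = max(best, score)
    return best


def stability(reference, samples):
    """Mean agreement between the reference topics and each perturbed run."""
    return mean(agreement(reference, s) for s in samples)


# Toy data: top terms per topic from a reference model and two perturbed runs.
reference = [["a", "b", "c"], ["x", "y", "z"]]
run_same = [["x", "y", "z"], ["a", "b", "c"]]      # same topics, different order
run_shifted = [["a", "b", "d"], ["x", "y", "q"]]   # one term changed per topic

score = stability(reference, [run_same, run_shifted])
```

Under this sketch, one would compute the score for each candidate number of topics and prefer values where it is high, on the assumption that higher agreement indicates a more robust model.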
This project follows this pipeline: dataset collection, topic modeling, fine-tuning of the topic modeling parameters, and classification.