GSSOC-20 Extended: Based on Topic Modeling, Classification, and Natural Language Processing

Primary Language: Jupyter Notebook

How_Many_topics

About How Many Topics?

Topic Modeling, Classification, Natural Language Processing

Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus.

We intend to write a research paper proposing a term-centric stability analysis strategy to address this issue. The idea is that a model with an appropriate number of topics will be more robust to perturbations in the data.
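To make the idea of term-centric stability concrete, here is a minimal sketch of one way to score how well two topic models agree, using only each topic's ranked list of top terms. It is an illustration under stated assumptions, not necessarily the method the paper will use: the function names (`average_jaccard`, `agreement`, `stability`) are hypothetical, the similarity measure is an average of Jaccard scores over increasing ranking depths, and topics are matched by brute force over permutations, which is only practical for a small number of topics.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def average_jaccard(rank_a, rank_b):
    """Average the Jaccard score over depths 1..t of two ranked term lists.

    Terms near the top of a ranking contribute to more depths,
    so disagreements there are penalized more heavily.
    """
    depth = min(len(rank_a), len(rank_b))
    if depth == 0:
        return 0.0
    total = sum(jaccard(rank_a[:d], rank_b[:d]) for d in range(1, depth + 1))
    return total / depth


def agreement(topics_a, topics_b):
    """Best mean similarity over all one-to-one matchings of topics.

    Assumption: both models have the same number of topics k.
    Brute force over k! permutations, so keep k small in this sketch.
    """
    k = len(topics_a)
    if k != len(topics_b):
        raise ValueError("both models must have the same number of topics")
    if k == 0:
        return 0.0
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(
            average_jaccard(topics_a[i], topics_b[j]) for i, j in enumerate(perm)
        ) / k
        best = max(best, score)
    return best


def stability(reference_topics, resampled_models):
    """Mean agreement between a reference model and models fit on perturbed data."""
    scores = [agreement(reference_topics, model) for model in resampled_models]
    return sum(scores) / len(scores)


# Toy example: top terms per topic from a reference model and one resampled model.
reference = [["goal", "match", "team"], ["vote", "party", "election"]]
resampled = [[["party", "vote", "election"], ["match", "goal", "team"]]]
print(stability(reference, resampled))
```

A score close to 1 means the top terms barely change when the data is perturbed; a score close to 0 means the topics are largely different. For more than a handful of topics, the permutation search would need to be replaced by an assignment solver.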

This project includes the following pipeline: dataset collection, topic modeling, fine-tuning the parameters for topic modeling, and classification.
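The parameter-tuning stage of the pipeline could be organized as a loop over candidate numbers of topics, as in the sketch below. Everything here is an assumption for illustration: `select_num_topics`, `fit_topics`, and `agreement` are hypothetical names, the perturbation is simple subsampling of documents without replacement, and the caller is expected to supply the actual topic-model fitting and the model-to-model agreement score.

```python
import random


def subsample(docs, fraction, rng):
    """Draw a random subset of the documents without replacement."""
    size = max(1, int(len(docs) * fraction))
    return rng.sample(docs, size)


def select_num_topics(docs, candidate_ks, fit_topics, agreement,
                      runs=5, fraction=0.8, seed=0):
    """Return the candidate k with the highest mean stability, plus all scores.

    fit_topics(docs, k) -> topic model output (hypothetical, caller-supplied)
    agreement(a, b)     -> similarity in [0, 1] between two such outputs
    """
    rng = random.Random(seed)
    scores = {}
    for k in candidate_ks:
        # Reference model on the full corpus.
        reference = fit_topics(docs, k)
        total = 0.0
        for _ in range(runs):
            # Refit on a perturbed corpus and compare with the reference.
            perturbed = fit_topics(subsample(docs, fraction, rng), k)
            total += agreement(reference, perturbed)
        scores[k] = total / runs
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Passing the fitting and scoring functions in as arguments keeps the loop independent of any particular topic modeling library, so the same selection logic can be reused if the underlying algorithm changes.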