PERC_TopicModel

LDA Topic Modeling of PERC Papers

This project uses Latent Dirichlet Allocation (LDA) to thematically analyze the physics education research literature, specifically the Physics Education Research Conference (PERC) Proceedings from 2001 to 2018.

The code in this repository is described in a paper submitted for publication: Tor Ole B. Odden, Alessandro Marin, and Marcos D. Caballero, "Thematic Analysis of 18 Years of PERC Proceedings using Natural Language Processing" (2020). The paper is available on arXiv: arxiv.org/abs/2001.10753.

To run the main notebook, PERC_TopicModeling.ipynb, first install the required packages:

    pip install -r requirements.txt --user

The required packages include Gensim (unsupervised semantic modelling on text), NLTK (Natural Language Toolkit), LDAVis (interactive topic model visualization), and scikit-learn, along with standard data analysis libraries such as pandas, numpy, and matplotlib.
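For readers new to LDA, the core idea can be sketched without any dependencies. The example below is a toy collapsed Gibbs sampler written only with the Python standard library. It is not the notebook's code and not how Gensim fits its models (Gensim's `LdaModel` uses online variational inference). The function names `fit_lda_gibbs` and `top_words`, the hyperparameter values, and the four tiny documents are all made up for illustration.

```python
import random


def fit_lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents.

    Returns the vocabulary, per-document topic counts, and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables that the sampler keeps in sync with the topic assignments.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: -counts[i])
        result.append([vocab[i] for i in ranked[:n]])
    return result


# Made-up toy corpus; the real input would be the tokenized PERC papers.
docs = [
    ["force", "motion", "newton", "force"],
    ["lab", "student", "group", "lab"],
    ["force", "newton", "motion"],
    ["student", "group", "lab", "student"],
]
vocab, doc_topic, topic_word = fit_lda_gibbs(docs, num_topics=2, iterations=50)
for topic_index, words in enumerate(top_words(vocab, topic_word)):
    print(topic_index, words)
```

Each row of `doc_topic` counts how many tokens of a document were assigned to each topic, and each row of `topic_word` counts how often each vocabulary word was assigned to a topic. On a corpus this small the resulting topics are only suggestive; meaningful themes require a real corpus and a library implementation such as Gensim.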

Questions can be directed to Tor Ole Odden.