What do people complain about on Reddit?
In this project, I use NLP topic modeling techniques
to find the topics that people complain about on the subreddit r/complaints.
This project uses:
- Python 3.8.1
- gensim 3.8.3
- spaCy 2.3.2
- NLTK 3.5
To run the visualization:
- jupyter-notebook 6.0.3
- pandas 1.0.3
- seaborn 0.10.1
To reproduce the results:
- Get the most recent Reddit data using scrape.py. An existing scrape is saved in data/complaints_new_13082020.csv.
- Run and save the models using topic_modeling.py. (All models are LDA unless specified otherwise.)
- Visualize the results using the Jupyter notebook visualization.ipynb.
I found that a 5-topic LDA model worked best.
Redditors complained about:
- People, in general
- Posts people make on Reddit
- Insurance companies
- Games people play
- Work, especially issues pertaining to time
The LSI model returned very similar results, except that it found that redditors complained about car problems rather than insurance companies.
I checked these results against the subreddit itself and confirmed that they were at least somewhat accurate.
Note that topic indexing in the project starts at 0, so subtract 1 from a topic's number when looking it up in visualization.ipynb.
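The offset can be spelled out in a few lines of Python. This is only an illustrative sketch: the `readme_topics` list and the `notebook_index` helper are hypothetical names, and it assumes the topics are listed in this README in the same order as they are numbered in the notebook.

```python
# Short labels for the five topics, in the order they appear in this README
# (assumption: this order matches the topic numbering in the notebook).
readme_topics = ["people", "posts", "insurance", "games", "work"]


def notebook_index(readme_position):
    """Convert a 1-based position in the README list to a 0-based topic index."""
    if not 1 <= readme_position <= len(readme_topics):
        raise ValueError("position must be between 1 and %d" % len(readme_topics))
    return readme_position - 1


# The third topic in the README list is topic 2 in the notebook.
print(notebook_index(3), readme_topics[notebook_index(3)])
```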