Understanding Sentiment Effect On Similarity Networks for All the News - UML Course Project
The objective of this project is to understand how the presence or absence of emotionally charged words affects communities of news documents. The project flow is outlined below:
- Clean, tokenize, stem, and lemmatize news articles
- Calculate document similarities
- Create network of similarities using similarity threshold
- Run community detection on similarity network
- Remove words that exceed a sentiment intensity threshold and repeat the four steps above
- Evaluate the changes between similarity networks
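
The flow above can be illustrated with a small, dependency-free sketch. It is not the code in the src folder. It assumes bag-of-words cosine similarity, uses connected components as a simple stand-in for community detection, and skips stemming and lemmatization. The lexicon `TOY_INTENSITY`, the helper names, and both thresholds are hypothetical values chosen for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: word -> sentiment intensity in [0, 1].
TOY_INTENSITY = {"horrific": 0.9, "terrible": 0.85, "wonderful": 0.8}


def tokenize(text, intensity_cutoff=None):
    """Lowercase and split on non-letters; optionally drop intense words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if intensity_cutoff is not None:
        tokens = [t for t in tokens
                  if TOY_INTENSITY.get(t, 0.0) <= intensity_cutoff]
    return tokens


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similarity_edges(docs, sim_threshold, intensity_cutoff=None):
    """Connect every pair of documents whose similarity meets the threshold."""
    vectors = [Counter(tokenize(d, intensity_cutoff)) for d in docs]
    return {(i, j) for i, j in combinations(range(len(docs)), 2)
            if cosine(vectors[i], vectors[j]) >= sim_threshold}


def components(n, edges):
    """Connected components via union-find (stand-in for community detection)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=min)


docs = [
    "horrific terrible storm hits coast",
    "storm hits coast",
    "wonderful election results announced",
    "election results announced",
]

# Network built from the full text, then from text with intense words removed.
before_edges = similarity_edges(docs, sim_threshold=0.8)
after_edges = similarity_edges(docs, sim_threshold=0.8, intensity_cutoff=0.5)

# Evaluate the change: edges gained or lost, and the resulting groupings.
print("edges added:", after_edges - before_edges)
print("edges removed:", before_edges - after_edges)
print("communities before:", components(len(docs), before_edges))
print("communities after:", components(len(docs), after_edges))
```

In this toy corpus the two storm articles only become connected once the emotionally charged words are removed, which is the kind of change the final evaluation step is meant to surface.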
Please see the following notebooks, where we clean the data, detect outliers, and check data distributions:
data_exploration.ipynb
Data Exploration 2.ipynb
- Major aspects of data pipeline completed in src folder
- More data exploration for sentiment in data_exploration.ipynb
- Beginning work on the experiment notebook, which will direct the output of the project