US Congressional Tweets NLP Dataset

This project analyzes tweets posted by members of the US Congress between June 2017 and July 2023, using natural language processing (NLP) techniques to detect patterns, sentiment, and key topics. The analysis quantifies shifts in legislative communication, shedding light on how political priorities, and the language used to express them, evolve over time.

📜 Introduction

This repository houses the code and data for the US Congress Tweets NLP Project, which analyzes tweets posted by members of the US Congress. Using NLP techniques, the project examines linguistic patterns, sentiment, and named entities in the tweets.
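As a rough illustration of what a simple linguistic-pattern pass over tweet text can look like, the sketch below counts hashtags and mentions with the Python standard library. It is not the project's actual pipeline; the function name `summarize` and the sample texts are hypothetical.

```python
import re
from collections import Counter

# Patterns for hashtags and @-mentions; \w+ covers letters, digits, and underscores.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def summarize(texts):
    """Count case-insensitive hashtags and mentions across an iterable of tweet texts."""
    hashtags = Counter()
    mentions = Counter()
    for text in texts:
        hashtags.update(tag.lower() for tag in HASHTAG.findall(text))
        mentions.update(name.lower() for name in MENTION.findall(text))
    return hashtags, mentions


# Hypothetical sample texts, not taken from the dataset.
sample = [
    "Proud to support #Infrastructure with @SenExample",
    "#infrastructure week! #Jobs",
]
hashtags, mentions = summarize(sample)
print(hashtags.most_common(2))
```

Lowercasing before counting treats `#Infrastructure` and `#infrastructure` as the same tag, which is usually what a frequency comparison over time needs.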

🗃️ Data Source

The primary data source for this project is the "congresstweets" dataset, available on GitHub at https://github.com/alexlitel/congresstweets. It collects tweets from US congressional members, and this project uses the period from June 2017 to July 2023.
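A minimal sketch of loading a local copy of the source data, assuming it is stored as one JSON file per day and that each file holds a JSON array of tweet objects. The function name `load_daily_tweets`, the added `source_file` key, and the directory path in the usage line are hypothetical; check the file layout and field names against the upstream repository before relying on them.

```python
import json
from pathlib import Path


def load_daily_tweets(data_dir):
    """Read every *.json file in data_dir and return one flat list of tweet dicts."""
    tweets = []
    # Sorting by filename keeps the days in order when files are named by date.
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            day = json.load(handle)  # assumption: each file is a JSON array of objects
        for tweet in day:
            tweet["source_file"] = path.name  # record which file the tweet came from
            tweets.append(tweet)
    return tweets
```

For example, `tweets = load_daily_tweets("Data/raw")` would collect every day's tweets into a single list, where `Data/raw` stands in for wherever the downloaded files live.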

🏗 Project Structure

The repository is organized as follows:

- Code/: Contains all the Python scripts and Jupyter notebooks used for data preprocessing, analysis, and visualization.
- Data/: Houses the raw and processed data files.
  - Note: Due to size constraints, the dataset is not uploaded to GitHub. For dataset access, email: chasen.jeffries@cgu.edu.
- Documents/: Documentation files, including the README you're reading right now.

🛠 Dependencies

Before running the scripts in the Code/ directory, make sure the required packages are installed. They are listed in the requirements.txt file.

To install the required dependencies, run:

pip install -r requirements.txt
