This project aims to build a real-time financial sentiment analysis system by collecting tweets related to finance and analyzing their sentiment using GPT-3 integrated with LangChain. The system includes a data pipeline for tweet collection and preprocessing, a sentiment analysis module, and a visualization dashboard.
- Real-time Tweet Collection: Uses Tweepy to stream tweets containing financial keywords.
- Data Preprocessing: Filters out noise, handles language variations, and cleans text data.
- Sentiment Analysis: Integrates GPT-3 with LangChain for enhanced sentiment classification.
- Visualization Dashboard: Displays sentiment trends using an interactive Streamlit app.
- Scalable Deployment: Designed to be deployed on AWS for scalability and reliability.
- Project Structure
- Prerequisites
- Installation
- Configuration
- Running the Application
- Usage
- Testing
- Contributing
- License
- Acknowledgments
financial_sentiment_analysis/
├── README.md
├── requirements.txt
├── .gitignore
├── config/
│ └── config.py
├── data/
│ ├── raw/
│ ├── processed/
│ └── sentiment_data.csv
├── logs/
│ └── app.log
├── scripts/
│ ├── data_collection.py
│ ├── data_preprocessing.py
│ ├── sentiment_analysis.py
│ └── utils.py
├── dashboard/
│ └── dashboard.py
├── models/
│ └── __init__.py
└── tests/
└── test_utils.py
- config/: Configuration files and settings.
- data/: Contains raw, processed, and final sentiment data.
- logs/: Log files for monitoring and debugging.
- scripts/: Core scripts for data collection, preprocessing, and analysis.
- dashboard/: Streamlit app for data visualization.
- models/: Placeholder for any machine learning models (optional).
- tests/: Unit tests for the project.
- Python 3.7 or higher
- Twitter Developer Account: Access to the Twitter API.
- OpenAI Account: API key for GPT-3.
- AWS Account (Optional): For deployment purposes.
git clone https://github.com/yourusername/financial_sentiment_analysis.git
cd financial_sentiment_analysis
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
Create a .env file in the project root directory:
touch .env
Add your API keys and configuration variables to the .env file:
# Twitter API credentials
TWITTER_CONSUMER_KEY=your_consumer_key
TWITTER_CONSUMER_SECRET=your_consumer_secret
TWITTER_ACCESS_TOKEN=your_access_token
TWITTER_ACCESS_SECRET=your_access_secret
# OpenAI API key
OPENAI_API_KEY=your_openai_api_key
The config/config.py file is already set up to read from environment variables. Ensure that all paths and settings meet your requirements.
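For reference, here is a minimal sketch of what config/config.py could look like, assuming the python-dotenv package is used to load the .env file; the paths and keyword list at the bottom are illustrative defaults, not the project's exact values:

```python
# config/config.py -- minimal sketch (assumes python-dotenv is installed)
import os

from dotenv import load_dotenv

load_dotenv()  # read variables from the .env file in the project root

# Twitter API credentials
TWITTER_CONSUMER_KEY = os.getenv("TWITTER_CONSUMER_KEY")
TWITTER_CONSUMER_SECRET = os.getenv("TWITTER_CONSUMER_SECRET")
TWITTER_ACCESS_TOKEN = os.getenv("TWITTER_ACCESS_TOKEN")
TWITTER_ACCESS_SECRET = os.getenv("TWITTER_ACCESS_SECRET")

# OpenAI API key
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Example paths and keywords (adjust to your needs)
RAW_DATA_DIR = "data/raw"
PROCESSED_DATA_DIR = "data/processed"
SENTIMENT_DATA_FILE = "data/sentiment_data.csv"
FINANCIAL_KEYWORDS = ["stocks", "market", "earnings", "inflation", "fed"]
```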
Run the data collection script to start collecting tweets in real-time.
python scripts/data_collection.py
- Description: This script uses Tweepy to stream tweets containing predefined financial keywords.
- Output: Raw tweet data saved as JSON files in data/raw/.
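As a rough illustration of how scripts/data_collection.py might work, the sketch below uses Tweepy's OAuth 1.0a Stream class; the keyword list, output file name, and listener logic are assumptions for demonstration, not the project's exact implementation:

```python
# Sketch of a Tweepy streaming listener (illustrative, not the project's exact code)
import json

import tweepy

from config.config import (
    TWITTER_ACCESS_SECRET,
    TWITTER_ACCESS_TOKEN,
    TWITTER_CONSUMER_KEY,
    TWITTER_CONSUMER_SECRET,
)

# Example keyword list; the real terms would live in config/config.py or the script
FINANCIAL_KEYWORDS = ["stocks", "earnings", "inflation", "interest rates"]


class FinanceStream(tweepy.Stream):
    def on_status(self, status):
        # Append each incoming tweet as one JSON object per line under data/raw/
        with open("data/raw/tweets.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(status._json) + "\n")


if __name__ == "__main__":
    stream = FinanceStream(
        TWITTER_CONSUMER_KEY,
        TWITTER_CONSUMER_SECRET,
        TWITTER_ACCESS_TOKEN,
        TWITTER_ACCESS_SECRET,
    )
    stream.filter(track=FINANCIAL_KEYWORDS, languages=["en"])
```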
Process the collected tweets to clean and normalize the text.
python scripts/data_preprocessing.py
- Description: Filters out non-English tweets, removes noise like URLs and mentions, and normalizes the text.
- Output: Cleaned tweet data saved in data/processed/.
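The preprocessing step could look roughly like the following sketch; the regular expressions and the use of langdetect for language filtering are illustrative assumptions rather than the script's exact logic:

```python
# Sketch of tweet cleaning helpers (illustrative)
import re

from langdetect import detect  # assumed dependency for language filtering


def clean_tweet(text: str) -> str:
    """Remove URLs, mentions, and extra whitespace, then lowercase the text."""
    text = re.sub(r"http\S+", "", text)       # strip URLs
    text = re.sub(r"@\w+", "", text)          # strip @mentions
    text = re.sub(r"#", "", text)             # keep hashtag words, drop the '#'
    text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
    return text.lower()


def is_english(text: str) -> bool:
    """Best-effort language check; empty or ambiguous tweets are skipped."""
    try:
        return detect(text) == "en"
    except Exception:  # langdetect raises on empty/undetectable input
        return False
```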
Perform sentiment analysis on the preprocessed tweets.
python scripts/sentiment_analysis.py
- Description: Uses GPT-3 via LangChain to classify the sentiment of each tweet.
- Output: Sentiment data saved to data/sentiment_data.csv.
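Below is a hedged sketch of the GPT-3 + LangChain classification step. It assumes the classic LangChain LLMChain-style API; the prompt wording and helper function are illustrative, not the project's exact code:

```python
# Sketch of sentiment classification with LangChain + OpenAI (illustrative)
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["tweet"],
    template=(
        "Classify the sentiment of the following financial tweet as "
        "Positive, Negative, or Neutral. Respond with one word.\n\n"
        "Tweet: {tweet}\nSentiment:"
    ),
)

llm = OpenAI(temperature=0)  # reads OPENAI_API_KEY from the environment
chain = LLMChain(llm=llm, prompt=prompt)


def classify(tweet_text: str) -> str:
    """Return 'Positive', 'Negative', or 'Neutral' for a single tweet."""
    return chain.run(tweet=tweet_text).strip()
```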
Launch the Streamlit dashboard to visualize sentiment trends.
streamlit run dashboard/dashboard.py
- Description: Displays real-time sentiment analysis results, including sentiment distribution and trends over time.
- Access: Open the provided local URL in your web browser.
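As an illustration, a minimal Streamlit dashboard reading data/sentiment_data.csv might look like the sketch below; the column names (created_at, text, sentiment) are assumptions about the CSV layout:

```python
# dashboard/dashboard.py -- minimal sketch (CSV column names are assumed)
import pandas as pd
import streamlit as st

st.title("Financial Tweet Sentiment")

# Load the sentiment results produced by scripts/sentiment_analysis.py
df = pd.read_csv("data/sentiment_data.csv", parse_dates=["created_at"])

# Sidebar date filters
start = st.sidebar.date_input("Start date", value=df["created_at"].min().date())
end = st.sidebar.date_input("End date", value=df["created_at"].max().date())
mask = (df["created_at"].dt.date >= start) & (df["created_at"].dt.date <= end)
filtered = df[mask]

# Sentiment distribution and tweet volume over time
st.bar_chart(filtered["sentiment"].value_counts())
st.line_chart(filtered.groupby(filtered["created_at"].dt.date).size())

# Latest analyzed tweets
latest = filtered.sort_values("created_at", ascending=False).head(20)
st.dataframe(latest[["created_at", "text", "sentiment"]])
```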
- Filtering Data: Use the sidebar in the dashboard to filter data by date range.
- Viewing Latest Tweets: The dashboard displays the latest analyzed tweets with their sentiment classification.
- Monitoring Logs: Check logs/app.log for real-time logging information.
Run unit tests to verify that utility functions are working as expected.
python -m unittest discover tests
- Description: Executes all tests in the tests/ directory.
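For example, a unit test for a hypothetical clean_tweet helper in scripts/utils.py could look like this (the function name and behavior are assumptions for illustration):

```python
# tests/test_utils.py -- example test (clean_tweet is a hypothetical helper)
import unittest

from scripts.utils import clean_tweet


class TestCleanTweet(unittest.TestCase):
    def test_removes_urls_and_mentions(self):
        raw = "@trader Check this out https://example.com $AAPL to the moon!"
        cleaned = clean_tweet(raw)
        self.assertNotIn("https://", cleaned)
        self.assertNotIn("@trader", cleaned)


if __name__ == "__main__":
    unittest.main()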
Contributions are welcome! Please follow these steps:
1. Fork the repository: click the "Fork" button at the top right of the repository page.
2. Create a feature branch: git checkout -b feature/your_feature_name
3. Commit your changes: git commit -am 'Add new feature'
4. Push to your fork: git push origin feature/your_feature_name
5. Open a pull request detailing your changes.
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing the language model used.
- LangChain for seamless LLM integration.
- Streamlit for the interactive dashboard framework.
- Tweepy for Twitter API integration.
Feel free to open an issue if you find any bugs or have suggestions for improvements. Happy coding!