This project aims to predict stock prices using machine learning techniques, Apache Spark, and Apache Cassandra. The goal is to build a data pipeline that collects stock data, preprocesses it, engineers features, and trains a machine learning model to predict future prices.
- Collect stock data from multiple sources using the Alpha Vantage API.
- Store stock data in Apache Cassandra for efficient and scalable data management.
- Preprocess data by cleaning, handling missing values, and normalizing data.
- Engineer features by calculating technical indicators, adding sentiment analysis, and including macroeconomic indicators.
- Train and evaluate machine learning models using Apache Spark.
- Perform data analysis and predictions using Spark and Cassandra.
- Apache Spark
- Apache Cassandra
- Alpha Vantage API
- Python
- Scikit-learn
- Pandas
To install and run this project, follow these steps:
- Clone the repository to your local machine.
- Install Apache Cassandra and Spark on your local machine or cluster.
- Install the required Python libraries by running pip install -r requirements.txt.
- Set up the Alpha Vantage API and obtain an API key.
- Configure the Cassandra keyspace and table by running python cassandra_config.py.
- Collect stock data by running python data_collection.py.
- Preprocess the data by running python data_preprocessing.py.
- Engineer features by running python feature_engineering.py.
- Train and evaluate the machine learning model by running python model_building.py.
- Perform data analysis and predictions by running python spark_integration.py.
The project can be used to predict stock prices by running the scripts provided in the repository. The data can be customized by modifying the parameters in the scripts or by using a different data source. The machine learning model can be modified by changing the algorithm or the hyperparameters.
The data used in this project is collected from the Alpha Vantage API, which provides real-time and historical stock data in various formats.
@sabareh
This project is released under the MIT License.
Alpha Vantage API for providing the stock data.