Udacity Machine Learning Engineer Nanodegree - Capstone Project
In this project, the problem of stock price prediction was approached in two steps:
- Data exploration, to identify the most interesting company among the five commodity-based companies in Brazil.
- Data preparation, feature engineering, model training, and evaluation.
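As a rough illustration of the preparation and evaluation step, the sketch below builds lagged-price features of the kind a tabular model such as scikit-learn's `RandomForestRegressor` could consume, splits them chronologically (no shuffling, so the test set lies strictly after the training set in time), and scores a naive "tomorrow equals today" baseline with RMSE. The function names, the three-lag window, the 25% test fraction, and the synthetic prices are illustrative assumptions, not taken from the project's notebook.

```python
import numpy as np
import pandas as pd


def make_lag_features(close: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build a supervised table: each row holds the previous n_lags closes and the target close."""
    frame = pd.DataFrame({"target": close})
    for lag in range(1, n_lags + 1):
        # shift(lag) moves past values forward, so row t sees the close from t - lag
        frame[f"lag_{lag}"] = close.shift(lag)
    # The first n_lags rows have incomplete history, so drop them
    return frame.dropna()


def chronological_split(frame: pd.DataFrame, test_fraction: float = 0.2):
    """Split without shuffling so every test row comes after every training row."""
    split_at = int(len(frame) * (1 - test_fraction))
    return frame.iloc[:split_at], frame.iloc[split_at:]


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    # Synthetic, steadily rising prices used only for illustration (not project data)
    dates = pd.date_range("2020-01-01", periods=10, freq="D")
    close = pd.Series(np.arange(10, 20, dtype=float), index=dates)

    table = make_lag_features(close, n_lags=3)
    train, test = chronological_split(table, test_fraction=0.25)
    print(len(table), len(train), len(test))  # prints: 7 5 2

    # Naive baseline: predict that the next close equals the previous one (lag_1)
    print(rmse(test["target"], test["lag_1"]))  # prints: 1.0
```

In practice, `train[["lag_1", "lag_2", "lag_3"]]` and `train["target"]` would be the inputs to a regressor's `fit`, and any trained model should beat the naive baseline's RMSE to be worth keeping.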
All steps were carried out in AWS SageMaker, with Amazon S3 used to store the produced data and model artifacts.
The Jupyter Notebook contains the code for both steps, along with explanations of the decision-making process and of how the code works.
The training script for the LSTM model, built with TensorFlow, is in the train.py file. It loads the data, pre-processes it, trains the model, and saves it to S3.
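The contents of train.py are not reproduced here. As a hedged sketch of what the pre-processing stage of an LSTM pipeline commonly involves, the code below fits a min-max scaler on the training portion only and slices the series into fixed-length windows shaped `(samples, timesteps, features)`, each paired with the next value as its target. The helper names (`minmax_fit`, `minmax_transform`, `make_windows`) and the window length are hypothetical and not the project's actual code.

```python
import numpy as np


def minmax_fit(train_values):
    """Learn min and max from the training portion only, to avoid leaking test information."""
    lo = float(np.min(train_values))
    hi = float(np.max(train_values))
    if hi == lo:
        raise ValueError("cannot min-max scale a constant series")
    return lo, hi


def minmax_transform(values, lo: float, hi: float) -> np.ndarray:
    """Rescale values with the training min and max (training values land in [0, 1])."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)


def make_windows(series, window: int):
    """Turn a 1-D series into inputs of shape (samples, window, 1) plus next-step targets."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    # Each sample is `window` consecutive values; its target is the value right after them
    X = np.stack([series[i : i + window] for i in range(n_samples)])
    y = series[window:]
    # Keras LSTM layers expect (batch, timesteps, features); here there is one feature
    return X.reshape(n_samples, window, 1), y


if __name__ == "__main__":
    prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
    lo, hi = minmax_fit(prices[:4])  # fit on the first four (training) values only
    scaled = minmax_transform(prices, lo, hi)
    X, y = make_windows(scaled, window=2)
    print(X.shape, y.shape)  # prints: (4, 2, 1) (4,)
```

Test values beyond the training range scale above 1 here; that is expected, because the scaler must not see test data. The resulting `X` has the 3-D shape that a Keras `tf.keras.layers.LSTM` input layer expects.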
The project requires:
- Access to the AWS platform and its products, such as SageMaker and S3
- Jupyter Notebook
- yahooquery (Python wrapper for the Yahoo Finance API)
- json
- numpy
- pandas
- datetime
- os
- matplotlib.pyplot
- plotly.graph_objects
- plotly.express
- plotly.subplots.make_subplots
- sklearn.preprocessing
- sklearn.metrics
- sklearn.ensemble
- tensorflow
- keras
The project is guided by the following general outline:
a. Gathering the Data;
b. Cleaning and Exploration;
- b.1. Checking missing values and a first look at the data;
- b.2. Checking the variation of the data over time with a line graph;
- b.3. Using the Moving Average to get another view of price variation;
- b.4. Visualization of Volumes over time;
- b.5. Daily Return;
- b.6. Risk Analysis;
- b.7. Correlations;
c. Data Preparation;
- c.1. Train-Test Split;
- c.2. Upload the data to S3;
d. Model Building;
- d.1. Random Forest Regressor;
- d.2. AWS DeepAR;
- d.3. LSTM Model with TensorFlow;
e. Results;
f. Conclusion and Next Steps;
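Steps b.3 and b.5 to b.7 of the outline can be sketched with pandas as follows. This is a minimal illustration assuming a DataFrame of closing prices with one column per ticker and a DatetimeIndex; the function name `exploration_metrics`, the 20-day default window, and the choice of the standard deviation of daily returns as the risk measure are assumptions, not necessarily what the notebook uses.

```python
import numpy as np
import pandas as pd


def exploration_metrics(closes: pd.DataFrame, window: int = 20) -> dict:
    """Compute moving averages, daily returns, return/risk summary, and correlations.

    `closes` is assumed to hold one column of closing prices per ticker.
    """
    # b.3: rolling mean smooths day-to-day noise (first window-1 rows are NaN)
    moving_avg = closes.rolling(window=window).mean()
    # b.5: daily return is the percentage change from the previous close
    daily_return = closes.pct_change().dropna()
    # b.6: mean daily return versus its standard deviation (volatility as "risk")
    risk = pd.DataFrame({
        "expected_return": daily_return.mean(),
        "risk": daily_return.std(),
    })
    # b.7: pairwise Pearson correlation of daily returns between tickers
    corr = daily_return.corr()
    return {
        "moving_avg": moving_avg,
        "daily_return": daily_return,
        "risk": risk,
        "corr": corr,
    }


if __name__ == "__main__":
    # Synthetic prices for two hypothetical tickers; B is always exactly 2x A
    idx = pd.date_range("2021-01-01", periods=4, freq="D")
    closes = pd.DataFrame(
        {"A": [100.0, 110.0, 99.0, 108.9], "B": [200.0, 220.0, 198.0, 217.8]},
        index=idx,
    )
    metrics = exploration_metrics(closes, window=2)
    print(metrics["risk"])
    print(metrics["corr"])
```

Correlating daily returns rather than raw prices is the usual choice, because two trending price series can look highly correlated even when their day-to-day moves are unrelated.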
E-mail: pedrocouto39@gmail.com
LinkedIn: https://www.linkedin.com/in/pdr-couto
Kaggle: https://www.kaggle.com/pedrocouto39
XING: https://www.xing.com/profile/Pedro_Couto8/cv
Project Link: https://github.com/PedroHCouto/UDACITY-ML-Engineer-Nanodegree-Project