Griffin Klett (gk2591)
Nathan Cuevas (njc2150)
Professor Parijat Dube | Deep Learning and Systems Performance COMS 6998 | Spring 2022 | Columbia University
The goal of this project is to implement and backtest deep learning models that build and expand upon approaches in the existing literature on Bitcoin prediction. We will then translate and apply these approaches to create a classifier that, given previous data, predicts whether the price will rise or fall the following day. We hope to leverage our domain knowledge of blockchains to design this model and carefully select a feature set that makes it profitable. We will test the performance of the model by measuring test and validation accuracy and by running a trading simulation that trades an initial investment on historical data.
This repository's directory structure is as follows:
├── code
│   ├── model.py
│   ├── notebooks
│   │   ├── CNN.ipynb
│   │   └── DNN.ipynb
│   └── scripts
│       ├── reverse.py
│       └── script.py
├── data
│   ├── ActiveERC20Addrs.csv
│   ├── BTC-USD.csv
│   ├── BlockDifficulty.csv
│   ...
│   └── data_v2.csv
├── documents
│   ├── Final_Project_Proposal.docx
│   └── Midterm_Seminar Presentation.pptx
├── readme.md
└── references
    ├── A_Comparative_Study_of_Bitcoin_Price_Prediction_Using_Deep_Learning_Ji_Et_Al.pdf
    ├── Applying_Deep_Learning_to_Better_Predict_Cryptocurrency_Trends.pdf
    └── Forecasting_the_Price_of_Bitcoin_using_deep_learning.pdf
The top level directory has the following folders:
- Code:
  - The folder notebooks contains the notebooks that were used in implementing and testing the model, alongside the plots.
  - The folder scripts contains the scripts that were used to consolidate blockchain data found on the web and to generate the plots for the report.
  - model.py contains most of our code in a single file. See the instructions below on how to use this file.
- Data:
  - Data that was gathered from the web. data_v2.csv aggregates all of this data and is currently the dataset used in our experiments.
  - Contains the experiment data in the experiment_data directory.
- Documents:
  - Includes our slides for the presentation as well as the proposal and slides from our related midterm seminar presentation.
- References:
  - Contains the top papers that we considered for this project.
dependencies:
Before running any of our code, please make sure the following dependencies are installed:
- python 3.8.1+
- pandas 1.4.1+
- numpy 1.21.5+
- tensorflow 2.8.0+
- matplotlib 3.5.1+
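A quick way to confirm the versions listed above is a small script like the one below. It is a sketch, not part of the repository: the names REQUIRED, as_tuple, and check_dependencies are hypothetical, and the version parsing is deliberately simple (it only reads the leading numeric components of each version string).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list in this readme.
REQUIRED = {
    "pandas": "1.4.1",
    "numpy": "1.21.5",
    "tensorflow": "2.8.0",
    "matplotlib": "3.5.1",
}


def as_tuple(v):
    """Turn '1.21.5' into (1, 21, 5), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '2.8' so they compare equal to '2.8.0'.
    return tuple(parts + [0] * (3 - len(parts)))


def check_dependencies(required=REQUIRED):
    """Return a list of human-readable problems; empty if everything is fine."""
    problems = []
    if sys.version_info[:3] < (3, 8, 1):
        problems.append("python 3.8.1+ is required")
    for name, minimum in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (need {minimum}+)")
            continue
        if as_tuple(installed) < as_tuple(minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_dependencies() or ["All dependencies look fine."]:
        print(line)
```

Running the script prints one line per missing or outdated package, or a single confirmation line if nothing is wrong.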
model.py
Navigate to the code directory and run model.py to run the code we used for this project. Set the following flags as needed. For examples, refer to the end of this section.
- --model: This is a required flag. Options are CNN and DNN, which run the convolutional neural network and the deep neural network, respectively.
- --m: The number of days that each training example is allowed to look back. See the slides in the documents folder for more details. This is an optional flag, default is 30.
- --epochs: The number of epochs to train the model for. This is an optional flag, default is 100.
- --simulate: Runs our simulation, which gives a trading bot a $10,000 initial investment. The integer passed after this flag is the number of times the simulation is run. The average ending investment is computed and printed after all the simulation runs. This is an optional flag, default is 0.
To simply check if the model runs, run the following:
python model.py --model CNN
To run our simulation 10 times with the DNN model, 150 epochs, and a 35-day lookback:
python model.py --model DNN --epochs 150 --m 35 --simulate 10
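For readers who want to see how the flags documented in this section fit together, here is a minimal argparse sketch. It mirrors the documented names, defaults, and choices, but it is an assumption about how such a command line could be parsed, not a copy of what model.py actually does; build_parser is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (a sketch, not model.py itself)."""
    # allow_abbrev=False avoids prefixes of --model being accepted by accident.
    parser = argparse.ArgumentParser(
        description="Train a CNN or DNN price-direction classifier.",
        allow_abbrev=False,
    )
    parser.add_argument("--model", required=True, choices=["CNN", "DNN"],
                        help="which network to run (required)")
    parser.add_argument("--m", type=int, default=30,
                        help="number of days each training example looks back")
    parser.add_argument("--epochs", type=int, default=100,
                        help="number of training epochs")
    parser.add_argument("--simulate", type=int, default=0,
                        help="number of trading simulation runs to average")
    return parser


# Parse the second example command from this section.
args = build_parser().parse_args(
    ["--model", "DNN", "--epochs", "150", "--m", "35", "--simulate", "10"]
)
print(args.model, args.m, args.epochs, args.simulate)

# Only --model is required; the other flags fall back to their defaults.
defaults = build_parser().parse_args(["--model", "CNN"])
print(defaults.m, defaults.epochs, defaults.simulate)
```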
summary
Overall, we were largely unsuccessful in building a profitable, deployable trading algorithm for Ethereum prices; our models were unable to generalize to unseen data, regardless of how we normalized the data or split it into training and test sets, and regardless of how much localized information was fed into the model. Further, our model did not account for trading fees or the liquidity costs associated with executing the algorithm.
For example, unlike previous research on Bitcoin, our performance did not depend in any meaningful way on the size of m per training/testing example. These results suggest two possible conclusions: either an improvement is needed in the data collection/exploration domain, or Ethereum's market price is not subject to any pattern discernible by our DNN/CNN models. Further research could use some better scaling of the network, or restrict the training/test period to the time after Ethereum's network size had grown 1000x; this was impossible for us because the resulting dataset remains too small, but it will be possible in the future.
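To make the role of m concrete, the sketch below shows one way lookback windows and next-day rise/fall labels could be built. The function name make_windows and the labeling convention (1 if the day after the window closes higher than the window's last day) are assumptions for illustration, not the exact preprocessing used in our code. It uses plain Python lists so that it runs without any dependencies.

```python
def make_windows(rows, prices, m=30):
    """Build (X, y) from daily data.

    rows[i]   -- feature vector for day i (any list of numbers)
    prices[i] -- closing price for day i
    X[i]      -- the m feature vectors for days i .. i+m-1
    y[i]      -- 1 if the price on day i+m is higher than on day i+m-1, else 0
    """
    if len(rows) != len(prices):
        raise ValueError("rows and prices must have the same length")
    n = len(prices) - m  # windows that still have a following day to label
    if n <= 0:
        raise ValueError("need more than m days of data")
    X = [rows[i:i + m] for i in range(n)]
    y = [1 if prices[i + m] > prices[i + m - 1] else 0 for i in range(n)]
    return X, y


# Tiny example: five days of prices, a lookback of two days.
prices = [1.0, 2.0, 3.0, 2.0, 5.0]
rows = [[p] for p in prices]  # one feature per day: the price itself
X, y = make_windows(rows, prices, m=2)
print(len(X), y)
```

A larger m gives each example more history but leaves fewer examples overall, since a series of N days yields only N - m labeled windows.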
experiment results/plots
Validation accuracy for
Contrary to Ji et al.'s research, there was no significant improvement when adjusting the value of m.
CNN and DNN training and validation accuracies across several values of m.
The results of our trading algorithm using our DNN and CNN models, given an initial investment of $10,000 and run on all of the unseen days. The results were averaged over 5 runs.
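The kind of trading simulation reported here can be sketched as follows. The strategy shown (hold the asset whenever the model predicts a rise, hold cash otherwise) and the name simulate_trading are assumptions for illustration; the actual rules in model.py may differ. The fee_rate parameter is a hypothetical addition that defaults to 0, matching the fact that our results do not account for trading fees.

```python
def simulate_trading(prices, predictions, initial_cash=10_000.0, fee_rate=0.0):
    """Return the final portfolio value of a simple all-in / all-out strategy.

    prices[i]      -- price on day i
    predictions[i] -- 1 if the model expects the price to rise from day i
                      to day i+1, otherwise 0
    fee_rate       -- fraction of each trade lost to fees (assumed; 0 = none)
    """
    if len(predictions) != len(prices) - 1:
        raise ValueError("need exactly one prediction per day except the last")
    cash, coins = initial_cash, 0.0
    for price, signal in zip(prices, predictions):
        if signal == 1 and coins == 0.0:
            # Predicted rise while holding cash: buy with everything.
            coins = cash * (1.0 - fee_rate) / price
            cash = 0.0
        elif signal == 0 and coins > 0.0:
            # Predicted fall while holding the asset: sell everything.
            cash = coins * price * (1.0 - fee_rate)
            coins = 0.0
    # Value any remaining position at the last available price.
    return cash + coins * prices[-1]


prices = [100.0, 110.0, 99.0, 120.0]
perfect = simulate_trading(prices, [1, 0, 1])
idle = simulate_trading(prices, [0, 0, 0])
with_fees = simulate_trading(prices, [1, 0, 1], fee_rate=0.01)
print(round(perfect, 2), idle, round(with_fees, 2))
```

Averaging over several runs, as in the results above, amounts to retraining the model, calling a function like this once per run, and taking the mean of the returned values.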