- Bayesian regression for the “latent source model” was used for binary classification, a setting in which the method has established theoretical and empirical efficacy.
- This project aims to predict the price variations of bitcoin, a virtual cryptographic currency.
- Implemented the Bayesian regression model to predict the future price variation of bitcoin.
- These predictions could be used as the foundation of a bitcoin trading strategy.
- To make these predictions, Bayesian regression, a machine learning technique, was implemented in Python.
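- Compute the price variations (Δp1, Δp2, and Δp3) for train2 using train1 as input to the Bayesian regression equation. In general, Bayesian regression estimates the conditional expectation of y given an observation x as a weighted average of the training labels y_i, where each weight decays with the Euclidean distance between x and the training vector x_i. Make sure to use the similarity metric in place of the Euclidean distance; because a higher similarity means the vectors are closer, the weight then grows with similarity instead of decaying. For binary classification, if the ratio of the total weight of training samples labelled 1 to that of samples labelled 0 is > 1, declare y = 1; otherwise declare y = 0.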
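The estimator described above can be sketched in Python. With Euclidean distance, the kernel-weighted average commonly used for this estimator, with a positive constant c, is

$$\hat{y}(x) = \frac{\sum_{i=1}^{n} y_i \exp\left(-c \lVert x - x_i \rVert_2^2\right)}{\sum_{i=1}^{n} \exp\left(-c \lVert x - x_i \rVert_2^2\right)},$$

and swapping in a similarity s(x, x_i) replaces each exponent with c·s(x, x_i). This is a minimal sketch under stated assumptions, not the project's code: the form of the similarity metric (a Pearson-correlation-style score), the constant `c`, and the helper names `similarity`, `bayesian_regression`, and `classify` are hypothetical, and `classify` reads "the ratio" as the total weight of label-1 samples divided by that of label-0 samples.

```python
import math
import statistics


def similarity(a, b):
    """Similarity between two equal-length price vectors.

    ASSUMPTION: a Pearson-correlation-style score (mean product of the
    z-scored vectors), so 1.0 means the same shape and -1.0 the opposite.
    """
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sa, sb = statistics.pstdev(a), statistics.pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0  # a flat vector carries no shape information
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (len(a) * sa * sb)


def bayesian_regression(x, train_x, train_y, c=1.0):
    """Estimate E[y | x] as a similarity-weighted average of train_y.

    Each weight is exp(c * s(x, x_i)); c is a hypothetical tuning constant.
    """
    scores = [c * similarity(x, xi) for xi in train_x]
    top = max(scores)  # shift by the max so exp() cannot overflow
    weights = [math.exp(s - top) for s in scores]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


def classify(x, train_x, train_labels, c=1.0):
    """Return 1 if the weight ratio (label 1 vs. label 0) is > 1, else 0."""
    w1 = w0 = 0.0
    for xi, label in zip(train_x, train_labels):
        w = math.exp(c * similarity(x, xi))
        if label == 1:
            w1 += w
        else:
            w0 += w
    # Since w0 >= 0, "w1 / w0 > 1" is the same test as "w1 > w0",
    # and this form avoids dividing by zero when there are no label-0 samples.
    return 1 if w1 > w0 else 0
```

To obtain a price-variation estimate for one train2 vector, one would pass that vector as `x` and the train1 vectors and their price variations as `train_x` and `train_y`.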
The datasets were saved in the /data folder. The original raw data can be found here:
http://api.bitcoincharts.com/v1/csv/
The datasets from this site have three attributes:
- Time in epoch,
- Price in USD per bitcoin,
- Bitcoin amount in a transaction (buy/sell).
However, only the first two attributes were relevant to this project.
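Reading the raw bitcoincharts CSV while keeping only the first two attributes can be sketched as follows. This assumes comma-separated rows with no header row and integer epoch timestamps; `read_trades` is a hypothetical helper name, not part of the project.

```python
import csv


def read_trades(lines):
    """Parse rows of 'epoch,price,amount' into (epoch, price) pairs.

    ASSUMPTIONS: no header row, and timestamps are whole seconds. The third
    column (amount) is dropped because only time and price are used.
    """
    trades = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        trades.append((int(row[0]), float(row[1])))
    return trades
```

Any iterable of text lines works, so an opened file (`open(path, newline="")`) can be passed directly.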
To obtain evenly spaced records, all transactions within each 20-second window were replaced by a single record whose price is the average of all transaction prices in that window. Not every 20-second window contained a transaction, so the missing entries were filled using the prices of the previous 20 observations, assuming a Gaussian distribution. The cleaned data is provided in the file dataset.csv.
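The 20-second resampling and gap filling can be sketched as below. It is a sketch under assumptions, not the script that produced dataset.csv: the helper name `resample` is hypothetical, windows are aligned to multiples of 20 seconds since the epoch, and an empty window is filled with a random draw from a Gaussian whose mean and standard deviation come from the previous 20 resampled prices (the text does not say whether a draw or the Gaussian mean was used).

```python
import random
import statistics
from collections import defaultdict

WINDOW = 20    # seconds per bucket
LOOKBACK = 20  # previous observations used to fill an empty window


def resample(trades, seed=0):
    """Return one (window_start, price) pair per 20-second window.

    trades: iterable of (epoch_seconds, price_usd) pairs.
    A window with trades gets the mean of their prices. An empty window is
    filled with a draw from a Gaussian fitted to the previous LOOKBACK
    prices (ASSUMPTION: a draw rather than the Gaussian mean).
    """
    buckets = defaultdict(list)
    for t, price in trades:
        buckets[int(t) // WINDOW * WINDOW].append(price)
    if not buckets:
        return []
    rng = random.Random(seed)  # seeded so the filled values are reproducible
    series = []
    for start in range(min(buckets), max(buckets) + WINDOW, WINDOW):
        if start in buckets:
            price = statistics.fmean(buckets[start])
        else:
            # The first window always has trades, so history is never empty.
            history = [p for _, p in series[-LOOKBACK:]]
            price = rng.gauss(statistics.fmean(history), statistics.pstdev(history))
        series.append((start, price))
    return series
```

Feeding it the (epoch, price) pairs from the raw data yields one evenly spaced price per 20-second window; the seed only makes the filled values repeatable across runs.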
Finally, the data was divided into a total of 9 different datasets (three subsets × three window lengths). The whole dataset was partitioned into three equally sized subsets (50 price variations in each): train1, train2, and test. The train sets were used to train a linear model, while the test set was used to evaluate the model. Each subset had three associated CSV files: *_90.csv, *_180.csv, and *_360.csv. In *_90.csv, for example, each line represented a vector of 90 values covering 30 minutes of bitcoin price data (90 intervals of 20 seconds), followed by the corresponding price variation in the 91st column. Similarly, each *_180.csv vector covered 60 minutes of prices and each *_360.csv vector covered 120 minutes.
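Loading one of these files can be sketched as follows. The file names (for example data/train1_90.csv) are inferred from the *_90.csv pattern and the subset names, and the parser assumes comma-separated rows with no header; `dataset_files` and `parse_rows` are hypothetical helpers.

```python
import csv

SUBSETS = ("train1", "train2", "test")
# Vector length -> minutes covered, at 20 seconds per element.
WINDOWS = {90: 30, 180: 60, 360: 120}


def dataset_files(folder="data"):
    """List the nine assumed file names: three subsets x three window lengths."""
    return [f"{folder}/{s}_{n}.csv" for s in SUBSETS for n in WINDOWS]


def parse_rows(lines, n):
    """Split each row into (price vector of length n, price variation).

    ASSUMPTION: no header row; the last column is the price variation.
    """
    xs, ys = [], []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        if len(values) != n + 1:
            raise ValueError(f"expected {n + 1} columns, got {len(values)}")
        xs.append(values[:n])
        ys.append(values[n])
    return xs, ys
```

For instance, opening data/train1_90.csv with `newline=""` and calling `parse_rows(f, 90)` would return the 90-element vectors and their price variations as two parallel lists.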