FinAI Laboratory
Hong Kong Graduate School of Advanced Studies
This repository contains the dataset and code described in the paper "Benchmark Dataset for Short-Term Market Prediction of Limit Order Book in China Markets". Five baseline models, including linear regression (LR), multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), and CNN-LSTM, are tested on the proposed benchmark dataset.
Note
- All algorithms are implemented with the deep learning framework PyTorch.
- Our PyTorch version is 1.7.0. If you are using an earlier version, please modify the code accordingly.
Limit order books (LOBs) have generated “big financial data” for analysis and prediction by both the academic community and industry practitioners. This paper presents a benchmark LOB dataset of the China stock market, covering a few thousand stocks for the period of June to September 2020. Experiment protocols are designed for model performance evaluation: at the end of every second, forecast the upcoming volume-weighted average price (VWAP) change and volume over 12 horizons ranging from 1 second to 300 seconds. Results based on a linear regression model and state-of-the-art deep learning models are compared. A practical short-term trading strategy framework based on the generated alpha signal is presented.
High-Frequency Trading, Limit Order Book, Artificial Intelligence, Machine Learning, Deep Neural Network, Short-Term Price Prediction, Alpha Signal, Trading Strategies, China Stock Market
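The prediction targets named in the abstract (upcoming VWAP change and volume over a horizon) can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the per-second `prices`/`volumes` inputs, the choice of the current price as the reference for the change, and the handling of windows with no trades are all assumptions made for illustration.

```python
def vwap(prices, volumes):
    """Volume-weighted average price: sum(p * v) / sum(v); None if no volume."""
    total_volume = sum(volumes)
    if total_volume == 0:
        return None
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def forward_vwap_targets(prices, volumes, t, horizon):
    """Return (VWAP change, traded volume) over seconds t+1 .. t+horizon.

    Assumptions (hypothetical, not taken from the paper):
    - prices[i] and volumes[i] are the traded price and volume in second i;
    - the change is measured against prices[t];
    - a window without trades yields (None, 0);
    - a window running past the end of the data yields None.
    """
    window_prices = prices[t + 1 : t + 1 + horizon]
    window_volumes = volumes[t + 1 : t + 1 + horizon]
    if len(window_prices) < horizon:
        return None
    future_vwap = vwap(window_prices, window_volumes)
    if future_vwap is None:
        return (None, 0)
    return (future_vwap - prices[t], sum(window_volumes))


# Toy per-second data (made up for the example).
prices = [10.0, 10.5, 11.0, 10.0]
volumes = [100, 100, 300, 200]
target = forward_vwap_targets(prices, volumes, t=0, horizon=2)
```

In this toy case the 2-second window holds trades at 10.5 (100 shares) and 11.0 (300 shares), so the future VWAP is 10.875 and the target is a change of 0.875 with a volume of 400. Repeating the call for each horizon of interest would give one target pair per horizon.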
The folder structure of the LOB dataset is as follows.
.\LOB_data
.\2020.6
.\2020.7
.\2020.8
.\2020.9
lob_sz_6789_train_val.txt
lob_sz_678_train.txt
lob_sz_9_val.txt
"lob_sz_678_train.txt" is the file list used to train the machine learning models, and "lob_sz_9_val.txt" is the file list used for validation, that is, to measure model accuracy. Each folder under ".\LOB_data" contains monthly LOB features in ".csv" format for many different stocks. These ".csv" files store the LOB features of each stock consecutively, row by row. A detailed explanation of these LOB features can be found here (in English and in Chinese).
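The file names suggest that June to August are used for training and September for validation. As a rough sketch of that kind of month-based split, the snippet below partitions a list of relative paths by their month folder. The function name `split_by_month`, the path layout `<month>/<stock>.csv`, and the example file names are hypothetical; the actual format of the provided file lists is not described here and may differ.

```python
from pathlib import PurePosixPath


def split_by_month(paths, train_months, val_months):
    """Partition relative paths by their first component (the month folder).

    Assumes each path looks like "<month>/<file>.csv", e.g. "2020.6/stock_a.csv".
    Paths whose month is in neither set are ignored.
    """
    train, val = [], []
    for path in paths:
        month = PurePosixPath(path).parts[0]
        if month in train_months:
            train.append(path)
        elif month in val_months:
            val.append(path)
    return train, val


# Hypothetical file names, for illustration only.
paths = [
    "2020.6/stock_a.csv",
    "2020.8/stock_a.csv",
    "2020.9/stock_a.csv",
    "2020.9/stock_b.csv",
]
train, val = split_by_month(
    paths,
    train_months={"2020.6", "2020.7", "2020.8"},
    val_months={"2020.9"},
)
```

Splitting by calendar month rather than at random keeps the validation period strictly after the training period, which avoids leaking future information into training.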
Please refer to the ReadMe.txt in ./lob_modeling to install and run experiments.