LOB

Benchmark Dataset of Limit Order Book in China Markets

Primary language: Python. License: MIT.

FinAI Laboratory

Hong Kong Graduate School of Advanced Studies

contact@gsas.edu.hk

Table of Contents

  1. Introduction
  2. Abstract
  3. Keywords
  4. Models
  5. Data Format
  6. Installation and Usage
  7. Results

Introduction

This repository contains the dataset and code described in the paper "Benchmark Dataset for Short-Term Market Prediction of Limit Order Book in China Markets". Five baseline models, including linear regression (LR), multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), and CNN-LSTM, are tested on the proposed benchmark dataset.

Note

  1. All algorithms are implemented based on the deep learning framework PyTorch.
  2. Our PyTorch version is 1.7.0. If you use an earlier version, please modify the code accordingly.

Abstract

The Limit Order Book (LOB) has generated “big financial data” for analysis and prediction by both the academic community and industry practitioners. This paper presents a benchmark LOB dataset of the China stock market, covering a few thousand stocks for the period of June to September 2020. Experiment protocols are designed for model performance evaluation: at the end of every second, a model forecasts the upcoming volume-weighted average price (VWAP) change and volume over 12 horizons ranging from 1 second to 300 seconds. Results based on a linear regression model and state-of-the-art deep learning models are compared. A practical short-term trading strategy framework based on the generated alpha signal is presented.
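As a rough illustration of the forecasting targets described above, the sketch below computes a forward VWAP change and forward volume for a single horizon. It is not the paper's labeling code. The function name, the per-second `turnover`, `volume`, and `ref_price` inputs, the definition of "change" as a relative difference against a reference price, and the handling of windows without trades are all assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def forward_vwap_targets(
    turnover: Sequence[float],
    volume: Sequence[float],
    ref_price: Sequence[float],
    horizon: int,
) -> List[Optional[Tuple[float, float]]]:
    """For each second t, return (vwap_change, future_volume) over seconds t+1..t+horizon.

    turnover[t]  : traded value during second t (price * shares), hypothetical input
    volume[t]    : shares traded during second t, hypothetical input
    ref_price[t] : reference price at the end of second t (assumed, e.g. a mid price)
    """
    n = len(volume)
    # Prefix sums make every window sum an O(1) lookup.
    cum_turnover = [0.0]
    cum_volume = [0.0]
    for value, shares in zip(turnover, volume):
        cum_turnover.append(cum_turnover[-1] + value)
        cum_volume.append(cum_volume[-1] + shares)

    targets: List[Optional[Tuple[float, float]]] = []
    for t in range(n):
        end = t + horizon
        if end >= n:
            targets.append(None)  # the window runs past the end of the data
            continue
        win_volume = cum_volume[end + 1] - cum_volume[t + 1]
        win_turnover = cum_turnover[end + 1] - cum_turnover[t + 1]
        if win_volume == 0:
            # Assumption: a window without trades gets a zero change and zero volume.
            targets.append((0.0, 0.0))
            continue
        vwap = win_turnover / win_volume
        targets.append((vwap / ref_price[t] - 1.0, win_volume))
    return targets
```

Calling the function once per horizon (the paper uses 12 horizons between 1 and 300 seconds) would give one target pair per second and horizon. Seconds whose window extends beyond the available data are returned as `None` so they can be dropped before training.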

Keywords

High-Frequency Trading, Limit Order Book, Artificial Intelligence, Machine Learning, Deep Neural Network, Short-Term Price Prediction, Alpha Signal, Trading Strategies, China Stock Market

Models

  1. Configuration of the linear regression model: Linear Regression

  2. Configuration of the multilayer perceptron model: Multilayer Perceptron

  3. Configuration of the shallow LSTM model: Long Short Term Memory

  4. Configuration of the CNN model: Convolutional Neural Network

  5. Configuration of the CNN-LSTM model: CNN-LSTM

Data Format

The LOB dataset has the following folder structure:

   .\LOB_data
         .\2020.6
         .\2020.7
         .\2020.8
         .\2020.9
         lob_sz_6789_train_val.txt
         lob_sz_678_train.txt
         lob_sz_9_val.txt

"lob_sz_678_train.txt" is the file list used to train the machine learning models, and "lob_sz_9_val.txt" is the file list used for validation, that is, to test model accuracy. Each folder under ".\LOB_data" contains one month of LOB features in ".csv" format for many different stocks. These ".csv" files store the LOB features of each stock consecutively, one row after another. A detailed explanation of these LOB features can be found here (in English and in Chinese).

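The following is a minimal sketch of how the file lists and monthly ".csv" files described above might be read together. It is not part of the repository code. It assumes that each file list contains one ".csv" path per line, relative to the dataset root, and that paths may use Windows-style backslashes. The helper names `read_file_list` and `iter_lob_rows` are hypothetical, and rows are yielded as raw strings because the column layout is documented separately.

```python
import csv
from pathlib import Path
from typing import Iterator, List


def read_file_list(list_path: Path) -> List[str]:
    """Return the non-empty lines of a file list such as lob_sz_678_train.txt."""
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def iter_lob_rows(root: Path, list_name: str) -> Iterator[List[str]]:
    """Yield raw CSV rows from every file named in the list, in listed order."""
    for entry in read_file_list(root / list_name):
        # Assumption: entries are paths relative to the dataset root,
        # possibly written with backslashes.
        csv_path = root / Path(entry.replace("\\", "/"))
        with open(csv_path, newline="", encoding="utf-8") as handle:
            yield from csv.reader(handle)
```

For example, `iter_lob_rows(Path("LOB_data"), "lob_sz_678_train.txt")` would stream the training rows and the same call with "lob_sz_9_val.txt" would stream the validation rows, provided the list format matches the assumption above.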
Installation and Usage

Please refer to ReadMe.txt in ./lob_modeling for instructions on installing and running the experiments.

Results

  1. Model performance metrics for different horizons, computed on the test folds: Results