LOBCAST is a Python-based open-source framework developed for stock market trend forecasting using Limit Order Book (LOB) data. The framework enables users to test deep learning models for the task of Stock Price Trend Prediction (SPTP). It serves as the official repository for the paper titled LOB-Based Deep Learning Models for Stock Price Trend Prediction: A Benchmark Study [paper].
The paper formalizes the SPTP task and the structure of LOB data. In the following sections, we explain how to download LOB data, run stock trend predictions with LOBCAST using your own DL model, and evaluate and compare models.
This main branch represents a newer version of LOBCAST named mini-LOBCAST. It enables benchmarking models on the standard LOB dataset used in the literature, specifically FI-2010 [dataset]. This version will be expanded to include more datasets with procedures for handling data consistently for benchmarking. These procedures are already available in the branch v0-LOBCAST, which will be integrated soon. We encourage the use of this version, while also recommending a glance at the other branch for additional implemented models and functions.
You can install LOBCAST by cloning the repository and navigating into the directory:
git clone https://github.com/matteoprata/LOBCAST.git
cd LOBCAST
Install all the required dependencies:
pip install -r requirements.txt
To download the FI-2010 Dataset [dataset], follow these instructions:
- Download the dataset into data/datasets by running:
mkdir data/datasets
cd data/datasets
wget --content-disposition "https://download.fairdata.fi:443/download?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE3MTU1MDU5OTksImRhdGFzZXQiOiI3M2ViNDhkNy00ZGJjLTRhMTAtYTUyYS1kYTc0NWI0N2E2NDkiLCJwYWNrYWdlIjoiNzNlYjQ4ZDctNGRiYy00YTEwLWE1MmEtZGE3NDViNDdhNjQ5X2JoeXV4aWZqLnppcCIsImdlbmVyYXRlZF9ieSI6IjlmZGRmZmVlLWY4ZDItNDZkNS1hZmIwLWQyOTM0NzdlZjg2ZiIsInJhbmRvbV9zYWx0IjoiYjg1ZjNhM2YifQ.NOT94HPMUdwpi6lFsmnRhkToP2FAdmbmoEkhlRNBQGM"
- Unzip the downloaded file.
- Rename the extracted folder by running:
mv data/datasets/published data/datasets/FI-2010
- Unzip data/datasets/FI-2010/BenchmarkDatasets/BenchmarkDatasets.zip into data/datasets/FI-2010/BenchmarkDatasets.
Ensure that the following path exists before executing LOBCAST on this dataset:
data/datasets/FI-2010/BenchmarkDatasets/NoAuction/1.NoAuction_Zscore/NoAuction_Zscore_Training/*
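A minimal optional sanity check in Python (using the path listed above) can confirm the files are in place:
# Optional sanity check: confirm the FI-2010 training files are where LOBCAST expects them
import glob

files = glob.glob("data/datasets/FI-2010/BenchmarkDatasets/NoAuction/1.NoAuction_Zscore/NoAuction_Zscore_Training/*")
print(f"{len(files)} training files found")  # should be greater than zero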
Run LOBCAST locally with an MLP model and the FI-2010 dataset, using the default settings in src.settings:
python -m src.run
To customize parameters:
python -m src.run --SEED 42 --PREDICTION_MODEL BINCTABL --OBSERVATION_PERIOD 10 --EPOCHS_UB 20 --IS_WANDB 0
This will execute LOBCAST with seed 42 on the FI-2010 dataset, using the BINCTABL model, with an observation period of 10 events, for up to 20 epochs, logging locally (not on WANDB).
The run.py file allows adjusting the following arguments, which are all attributes of the class src.settings.Settings.
LOBCAST
optional arguments:
-h, --help show this help message and exit
--SEED
--DATASET_NAME
--N_TRENDS
--PREDICTION_MODEL
--PREDICTION_HORIZON_UNIT
--PREDICTION_HORIZON_FUTURE
--PREDICTION_HORIZON_PAST
--OBSERVATION_PERIOD
--IS_SHUFFLE_TRAIN_SET
--EPOCHS_UB
--TRAIN_SET_PORTION
--VALIDATION_EVERY
--IS_TEST_ONLY
--TEST_MODEL_PATH
--DEVICE
--N_GPUs
--N_CPUs
--DIR_EXPERIMENTS
--IS_WANDB
--WANDB_SWEEP_METHOD
--IS_SANITY_CHECK
At the end of the execution, JSON files containing all the statistics of the simulation and a PDF showing the performance of the model will be created in data/experiments.
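To locate these artifacts after a run, you can search for them with a small optional snippet (exact filenames depend on the run configuration):
# Optional: find the JSON statistics and the PDF report produced by the run
import glob

print(glob.glob("data/experiments/**/*.json", recursive=True))
print(glob.glob("data/experiments/**/*.pdf", recursive=True))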
To set up a simulation in terms of randomness, choice of dataset, choice of model, observation window, and whether to log the metrics locally or on WANDB, LOBCAST allows setting all these parameters by accessing the LOBCAST().SETTINGS object. These parameters are set at the beginning of the simulation and are overwritten by the arguments passed from the command-line interface (CLI).
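For instance, a hedged sketch of adjusting a few settings programmatically (the import path is an assumption; the attribute names mirror the CLI arguments listed above):
# Hedged sketch: set simulation parameters through the SETTINGS object
from src.run import LOBCAST  # import path is an assumption

sim = LOBCAST()
sim.SETTINGS.SEED = 42                # attributes mirror the CLI arguments above
sim.SETTINGS.OBSERVATION_PERIOD = 10
sim.SETTINGS.IS_WANDB = 0             # log locally rather than on WANDB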
To find the right learning parameters of a model, hyperparameters can be specified in src.hyperparameters.HPTunable. By default, it contains the batch size, learning rate, and optimizer, but it can be extended by the user to specify other parameters. Keep all the hyperparameters common to all the models in this class; below we show how to add model-specific parameters. You can specify either the exact values a parameter can take, as {'values': [1, 2, 3]}, or a min-max range, as {'min': 1, 'max': 100}.
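As a hedged illustration of the two formats (the parameter names below are hypothetical; the subclassing pattern mirrors the model-specific HP class shown later in this README):
from src.hyperparameters import HPTunable  # path as referenced above; adjust if the module name differs in your checkout

class ExampleHP(HPTunable):
    def __init__(self):
        super().__init__()
        self.dropout = {"values": [0.1, 0.3, 0.5]}      # explicit list of candidate values
        self.weight_decay = {"min": 1e-6, "max": 1e-3}  # continuous min-max range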
The logic of the simulator can be summarized as follows:
- Initialize LOBCAST.
- Parse settings from the CLI.
- Update settings.
- Choose hyperparameter configurations.
- Run the simulation, including data gathering, model selection, and training loop.
- Generate a PDF with evaluation metrics.
- Close the simulation.
The code below shows the simulation logic in src.run:
sim = LOBCAST()
setting_conf = sim.parse_cl_arguments() # parse settings from CLI
sim.update_settings(setting_conf) # updates simulation settings
hparams_config = grid_search_configurations(sim.HP_TUNABLE.__dict__)[0]  # pick the first hyperparameter configuration from the grid search
sim.update_hyper_parameters(hparams_config)  # update the simulation hyperparameters
sim.end_setup()
sim.run()  # run the simulation: data gathering, model selection and training loop
sim.evaluate() # generate a pdf with the evaluation metrics
sim.close()
Running multiple experiments sequentially is facilitated by instantiating an execution plan. As an alternative to src.run, one can run sequential tests from src.run_batch.
ep = ExecutionPlan(setup01.INDEPENDENT_VARIABLES,
setup01.INDEPENDENT_VARIABLES_CONSTRAINTS)
setting_confs = ep.configurations()
An execution plan is defined in terms of two dictionaries, INDEPENDENT_VARIABLES and INDEPENDENT_VARIABLES_CONSTRAINTS. The first dictionary lists the variables to vary in a grid search. The INDEPENDENT_VARIABLES_CONSTRAINTS dictionary defines how a variable should be set when it is not the one being varied, thereby restricting the grid search and eliminating certain configurations. The resulting setting_confs contain the configurations to pass, iteratively, to sim.update_settings(setting_conf).
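As a hedged sketch (reusing only the calls from the single-run snippet above; src.run_batch already implements this logic), the configurations can drive sequential simulations like this:
# Hedged sketch: run one simulation per configuration produced by the execution plan
for setting_conf in setting_confs:
    sim = LOBCAST()
    sim.update_settings(setting_conf)
    hparams_config = grid_search_configurations(sim.HP_TUNABLE.__dict__)[0]  # first grid-search configuration
    sim.update_hyper_parameters(hparams_config)
    sim.end_setup()
    sim.run()
    sim.evaluate()
    sim.close()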
To run an execution plan with the dictionaries defined in src.batch_experiments.setup01, execute:
python -m src.run_batch
Procedures for gathering performances from different models and generating comprehensive plots for benchmarking will be added in a new update.
To integrate a new model into LOBCAST, follow these steps:
- Create model file: Add a .py file in the src.models directory. Define your new model class, inheriting from src.models.lobcast_model.LOBCAST_model:
class MyNewModel(LOBCAST_model):
    def __init__(self, input_dim, output_dim, param1, param2, param3):
        super().__init__(input_dim, output_dim)
        ...  # define the model's layers here
- Define hyperparameters: Optionally define the domains of your model parameters by creating a class that inherits from src.hyper_parameters.HPTunable:
class HP(HPTunable):
    def __init__(self):
        super().__init__()
        self.param1 = {"values": [16, 32, 64]}
        self.param2 = {"values": [.1, .5]}
- Declare LOBCAST module: Instantiate a src.models.lobcast_model.LOBCAST_module to encapsulate the model and its hyperparameters:
mynewmodel_lm = LOBCAST_module(MyNewModel, HP())
- Declare model in the Models enumerator: Add your model to the src.models.models_classes.Models enumerator:
class Models(Enum):
    NEW_MODEL = mynewmodel_lm
Now, you can execute the new model using the command:
python -m src.run --SEED 42 --PREDICTION_MODEL NEW_MODEL --IS_WANDB 0
Any undeclared settings will be assigned default values.
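As an optional quick check before launching a run (using the enumerator path referenced above), you can confirm the registration:
# Optional: confirm the new model is registered in the Models enumerator
from src.models.models_classes import Models

print([m.name for m in Models])  # the list should now include 'NEW_MODEL'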
Optionally, enforce constraints on the model settings using src.settings.Settings.check_parameters_validity. For example:
constraint = (not self.PREDICTION_MODEL == cst.Models.NEW_MODEL or self.OBSERVATION_PERIOD == 10,
f"At the moment, NEW_MODEL only allows OBSERVATION_PERIOD = 10, {self.OBSERVATION_PERIOD} given.")
Make sure to add this constraint to the CONSTRAINTS list in src.settings.Settings.check_parameters_validity so that it is enforced.
Prata, Matteo, et al. "LOB-based deep learning models for stock price trend prediction: a benchmark study." Artificial Intelligence Review 57.5 (2024): 1-45.
The recent advancements in Deep Learning (DL) research have notably influenced the finance sector. We examine the robustness and generalizability of fifteen state-of-the-art DL models focusing on Stock Price Trend Prediction (SPTP) based on Limit Order Book (LOB) data. To carry out this study, we developed LOBCAST, an open-source framework that incorporates data preprocessing, DL model training, evaluation and profit analysis. Our extensive experiments reveal that all models exhibit a significant performance drop when exposed to new data, thereby raising questions about their real-world market applicability. Our work serves as a benchmark, illuminating the potential and the limitations of current approaches and providing insight for innovative solutions.
Link: https://link.springer.com/article/10.1007/s10462-024-10715-4
LOBCAST was developed by Matteo Prata, Giuseppe Masi, Leonardo Berti, Andrea Coletta, Irene Cannistraci, Viviana Arrigoni.