This repository is the official implementation of the paper Deep Generative Symbolic Regression.
- Follow the installation steps in `install.sh`.
- Download the baseline pre-trained models for NeuralSymbolicRegressionThatScales from https://github.com/SymposiumOrganization/NeuralSymbolicRegressionThatScales and put them into the folder `models/nesymres_pre_train`.
- Replicate the experimental results by configuring and running `run_recovery_multi.py`.
- Process the output log file using `process_logs.py` by updating the `LOG_PATH` variable to point to the recently generated log file.
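
For reference, this update is a single assignment near the top of `process_logs.py`. The path below is only a placeholder; substitute the log file that your run actually produced:

```python
# Placeholder path -- replace with the log file generated by your run_recovery_multi.py run.
LOG_PATH = "./log/<your_run_folder>/<your_log_file>.log"
```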
The simplest way to configure the experiments is to modify the following parameters at the top of `run_recovery_multi.py`:
```python
conf.exp.seed_runs = 40
conf.exp.n_cores_task = 1  # 7 if GPU memory is at least 24GB, else tune to be smaller
conf.exp.seed_start = 0
conf.exp.baselines = ["DGSR-PRE-TRAINED", "NGGP", "NESYMRES", "GP"]
# User must specify the benchmark to run:
conf.exp.benchmark = "fn_d_2"  # Possible values: ["fn_d_2", "fn_d_5", "l_cd_12", "fn_d_all"]
```
The final parameter, `conf.exp.benchmark`, is the most important, as it determines which benchmark experiment is run. Select one of the possible values listed in the comment.
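
As an illustration, a minimal smoke-test configuration might restrict the run to a single baseline and a couple of seeds (the values below are examples only, not recommended settings; the parameter names are those shown above):

```python
# Example quick-start settings (illustrative values, not recommendations).
conf.exp.seed_runs = 2                     # a couple of seeds for a fast sanity check
conf.exp.n_cores_task = 1                  # keep at 1 unless GPU memory allows more parallel tasks
conf.exp.seed_start = 0
conf.exp.baselines = ["DGSR-PRE-TRAINED"]  # run only the pre-trained DGSR model
conf.exp.benchmark = "fn_d_2"              # one of: "fn_d_2", "fn_d_5", "l_cd_12", "fn_d_all"
```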
Re-training the models for a new dataset, equation class, number of input variables, or token-set library:
First configure and then run `run_pretrain.py`. This script pre-trains the DGSR model for a specific dataset, using the number of input variables (covariates) defined by the selected test-dataset equation. Training runs indefinitely and continually saves the model; after roughly 4-5 hours the loss plateaus. At that point, manually stop the training process and copy the saved model (`controller.pt`) and model configuration file (`config.json`) from the log directory (`./log/{$RUN}`, where `$RUN` is a folder name that includes the trained dataset and the date as a string) to the pre-trained models directory (`./models/dgsr_pre-train/`). Finally, update the path to load when testing in the `COVARS_TO_PRE_TRAINED_MODEL` dict in `run_recovery_multi.py`.
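
For illustration, a newly copied model can then be wired in with an entry like the sketch below; the key and path are placeholders, and the exact structure of the dict in `run_recovery_multi.py` may differ:

```python
# Hypothetical entry mapping a covariate count to a newly pre-trained DGSR model.
# The key (number of covariates) and the folder name are placeholders -- adapt them to your run.
COVARS_TO_PRE_TRAINED_MODEL = {
    2: "./models/dgsr_pre-train/<your_copied_run_folder>",  # folder holding controller.pt and config.json
    # ... existing entries for other covariate counts ...
}
```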