Welcome to the repository for the RLC 2024 paper "Boosting Soft Q-Learning by Bounding" by Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, and Rahul Kulkarni. Here you can reproduce the paper's experiments and try bounded soft Q-learning on your own environments.
We use Gymnasium for RL environments and wandb for logging and hyperparameter tuning.
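For orientation, the bounding idea can be illustrated in a few lines: clip the bootstrapped soft value in the TD target using known lower and upper bounds. The sketch below is a minimal illustration of that general idea only, not the repository's implementation; the bounds lb and ub are arbitrary placeholders, whereas the paper derives principled bounds.

import numpy as np

def soft_value(q_next, beta):
    # Soft (log-sum-exp) state value: V(s') = (1/beta) * log sum_a exp(beta * Q(s', a)).
    return np.log(np.sum(np.exp(beta * np.asarray(q_next)))) / beta

def bounded_soft_target(reward, q_next, lb, ub, gamma=0.99, beta=5.0):
    # Standard soft Bellman target, with the bootstrapped value clipped to [lb, ub].
    v_next = np.clip(soft_value(q_next, beta), lb, ub)
    return reward + gamma * v_next

# Q-values for two actions in the next state, with placeholder bounds.
print(bounded_soft_target(reward=1.0, q_next=[0.2, 0.5], lb=-10.0, ub=10.0))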
To prepare a Python environment with conda:
- Set up a conda env:
conda create --name qbounds python=3.10
- Activate the conda env:
conda activate qbounds
- Install the Python requirements:
pip install -r requirements.txt
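- Optionally, verify the install (this assumes requirements.txt pins gymnasium and wandb, as noted above):
python -c "import gymnasium, wandb; print(gymnasium.__version__, wandb.__version__)"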
- Run an environment with the best hparams:
python experiments.py --env_id=CartPole-v1
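To try bounded soft Q-learning on your own environment (as mentioned at the top), one option is to register it with Gymnasium so it can be selected by id. The sketch below is a hypothetical toy environment; it assumes experiments.py builds environments via gymnasium.make(env_id), and the module doing the registration must be imported before that call (for example through an entry_point string such as "my_env:TwoStateEnv").

# my_env.py -- hypothetical module defining and registering a toy environment.
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class TwoStateEnv(gym.Env):
    # Toy two-state chain, purely for illustration.
    def __init__(self):
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.state = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.state = 0
        return np.array([self.state], dtype=np.float32), {}

    def step(self, action):
        self.state = int(action)  # the action chooses the next state
        reward = 1.0 if self.state == 1 else 0.0
        return np.array([self.state], dtype=np.float32), reward, False, False, {}

# Register so the environment can be created by id, e.g. --env_id=TwoState-v0.
gym.register(id="TwoState-v0", entry_point=TwoStateEnv, max_episode_steps=100)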
- Plot the results for a specific environment:
python tb_plotter.py --env_id=CartPole-v1
- Run the hparam sweep for an environment:
python experiments.py --env_id=CartPole-v1 --do_sweep=True
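For reference, a programmatic wandb sweep looks roughly like the sketch below; the search space, metric name, and project are hypothetical placeholders, and the repository's actual sweep configuration is defined by experiments.py.

import wandb

# Hypothetical search space; the repository defines its own.
sweep_config = {
    "method": "random",
    "metric": {"name": "episodic_return", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-2},
        "beta": {"values": [1.0, 5.0, 10.0]},
    },
}

def train():
    with wandb.init() as run:
        lr = run.config.learning_rate  # sampled by the sweep controller
        # ... train with lr and report the metric, e.g. wandb.log({"episodic_return": ...})

sweep_id = wandb.sweep(sweep_config, project="qbounds")
wandb.agent(sweep_id, function=train, count=20)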
- Extract the best hparams logged with wandb:
python wandb_best_hparams.py --env_id=CartPole-v1 --entity=your_wandb_entity
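If you prefer to query wandb directly, the public API can sort runs by a logged summary metric; in the sketch below the entity, project, and metric name are placeholders to adjust for your own sweep.

import wandb

api = wandb.Api()
# Placeholders: substitute your entity/project and the summary metric your runs log.
runs = api.runs("your_wandb_entity/qbounds", order="-summary_metrics.episodic_return")
best = runs[0]
print(best.name)
print(dict(best.config))  # hyperparameters of the best run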
- Run the learning-rate (lr) sweep for each of the methods of interest. It uses the 30 mazes contained in the mazes directory:
python tabular_experiments/tabular_lr_sensitivity.py