Do transformers and recurrent neural networks lose plasticity in partially observable reinforcement learning tasks?
Reinforcement Learning project for CMPUT 655: Reinforcement Learning I
conda create --name rl_project python=3.9
conda activate rl_project
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install jupyter
pip install "popgym[baselines]"
pip install tensorflow_probability==0.20.0
pip install mazelib
conda install matplotlib
cp custom_models/gtrxl.py ~/miniconda3/envs/rl_project/lib/python3.9/site-packages
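After these steps, a quick import check (a minimal sketch; the version numbers and CUDA availability printed will vary by machine) can confirm the environment is usable:

```python
# Quick sanity check that the main dependencies import in the new environment.
import torch
import popgym

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("POPGym imported from:", popgym.__file__)
```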
If running the project in a hosted notebook instead of the local conda environment, run the following in a code cell before the rest of the project:
!pip install "popgym[baselines]"
!git clone https://github.com/john-science/mazelib.git
!pip install -r mazelib/requirements.txt
!pip install mazelib/
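Once the cell above finishes, a small check based on mazelib's documented usage (the generator and maze size here are only illustrative) can confirm the source install worked:

```python
# Verify the mazelib source install by generating a small maze.
from mazelib import Maze
from mazelib.generate.Prims import Prims

m = Maze()
m.generator = Prims(5, 5)   # 5x5 cells; the resulting grid is (2*5+1) x (2*5+1)
m.generate()
print(m.grid.shape)
```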
In the repository directory, there are three Jupyter notebooks:
- Random_Agent_of_RL_env: Runs a random-action agent on 100 seeds for each of the required environments and saves its performance (see the sketch after this list).
- pop_gym_env_exploration: Explores parts of the POPGym environments and prints detailed information about them.
- rl_project_experiment_structure: Defines the structure of the experiments we performed (for the final experiments, refer to the final_experiment directory).
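For reference, a random-action baseline averaged over seeds might look roughly like this minimal sketch; the environment ID (and relying on gym.make after importing popgym) is an assumption for illustration, not the exact code in the notebook:

```python
# Minimal sketch of a random-action baseline over multiple seeds.
# The environment ID is an assumption; check popgym's registered names
# or instantiate the environment classes directly for the real runs.
import numpy as np
import gymnasium as gym
import popgym  # assumed to register POPGym environments with gymnasium

returns = []
for seed in range(100):
    env = gym.make("popgym-RepeatPreviousEasy-v0")
    obs, info = env.reset(seed=seed)
    done, episode_return = False, 0.0
    while not done:
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        episode_return += reward
        done = terminated or truncated
    returns.append(episode_return)
    env.close()

print(f"Random agent return: {np.mean(returns):.3f} +/- {np.std(returns):.3f}")
```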
There are also three directories:
- custom_models: Contains a modified GTrXL model used in some of our initial experiments.
- final_experiment: Contains the final results for the GRU-based and FART (fast autoregressive transformer) based agents, along with all our graphs and plots.
- initial_experiment: Contains the results of the several different models run during our initial testing.
For each model in the final_experiment and initial_experiment directories, there is an experiment.py script for model training and utility scripts for creating graphs.
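As a rough illustration of what the graphing utilities produce (mean episodic return across seeds with a one-standard-deviation band), here is a hypothetical sketch; the array layout, labels, and output file name are assumptions, and stand-in data replaces the saved results:

```python
# Hypothetical plotting sketch: seed-averaged learning curve with a std band.
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data; in the real utility scripts these would be the saved
# per-seed return curves (assumed layout: [seed, training step]).
rng = np.random.default_rng(0)
returns = rng.normal(loc=np.linspace(-1.0, 0.5, 200), scale=0.1, size=(100, 200))

mean, std = returns.mean(axis=0), returns.std(axis=0)
steps = np.arange(returns.shape[1])

plt.plot(steps, mean, label="GRU (illustrative)")
plt.fill_between(steps, mean - std, mean + std, alpha=0.3)
plt.xlabel("Training step")
plt.ylabel("Mean episodic return")
plt.legend()
plt.savefig("learning_curve.png")
```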