rl-utils


RL Job Launcher and Plotting Utilities

Installation

Requires Python >= 3.7.

  • pip install rl-exp-utils
  • To install from source: git clone https://github.com/ASzot/rl-utils.git && cd rl-utils && pip install -e .

Experiment Launcher

Utility script to launch jobs on Slurm (via either sbatch or srun), in a new tmux window, with PyTorch distributed (torchrun), or in the current shell.

Examples:

  • Launch job in current window using ~/configs/proj.yaml config: python -m rl_utils.launcher --cfg ~/configs/proj.yaml python imitation_learning/run.py
  • Launch job on user-overcap partition: python -m rl_utils.launcher --partition user-overcap --cfg ~/configs/proj.yaml python imitation_learning/run.py
  • Evaluate the last checkpoint from a group of runs: python -m rl_utils.launcher --cfg ~/configs/proj.yaml --proj-dat eval --slurm small python imitation_learning/eval.py load_checkpoint="&last_model WHERE group=Ef6a88c4f&"
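
These flags can be combined. For instance, a sketch of a launch on the small Slurm profile with a group prefix (the profile name and group ID are illustrative):

python -m rl_utils.launcher --cfg ~/configs/proj.yaml --slurm small --group-id my_exp python imitation_learning/run.py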

Arguments

  • --pt-proc: Run with torchrun, using --pt-proc processes per node.
  • --cd: Sets CUDA_VISIBLE_DEVICES.
  • --sess-name: tmux session name to attach to (by default none).
  • --sess-id: tmux session ID to attach to (by default none).
  • --group-id: Add a group prefix to the runs.
  • --run-single: Run multiple commands sequentially, one at a time.
  • --time-freq X: Run under py-spy profiling at sampling frequency X. The profile is saved to data/profile/scope.speedscope.
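
For example, a sketch of profiling a run with py-spy (the frequency value of 100 is illustrative):

python -m rl_utils.launcher --cfg ~/configs/proj.yaml --time-freq 100 python imitation_learning/run.py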

Slurm arguments:

  • --g: Number of Slurm GPUs to request.
  • --c: Number of Slurm CPUs to request.
  • --comment: Comment to attach to the Slurm job.
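
For example, a sketch combining these with a Slurm profile (all values illustrative):

python -m rl_utils.launcher --cfg ~/configs/proj.yaml --slurm small --g 1 --c 7 --comment "baseline run" python imitation_learning/run.py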

Run Exp Config Schema

Keys in the config file:

  • add_all: str: Suffix that is added to every command.
  • ckpt_cfg_key: The key to get the checkpoint folder from the config.
  • ckpt_append_name: If True, the run name is appended to the checkpoint folder.
  • slurm_ignore_nodes: List[str]: List of Slurm hosts that should be ignored.
  • proj_dat_add_env_vars: Dict[str, str]: Maps a --proj-dat key to environment variables to export; multiple environment variables are separated by spaces. See the example after this list.
  • eval_sys: Configuration for the evaluation system. More information on this below.
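
As an example of proj_dat_add_env_vars, the following sketch exports two environment variables whenever --proj-dat render is passed (the variable names are hypothetical):

proj_dat_add_env_vars:
  render: "HYPOTHETICAL_VAR=1 OTHER_VAR=quiet"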

Variables that are automatically substituted into the commands:

  • $GROUP_ID: A unique generated ID assigned to all runs from the command.
  • $SLURM_ID: The Slurm job name, randomly generated for every run. This is generated whether or not the job runs on Slurm.
  • $DATA_DIR: base_data_dir from the config.
  • $CMD_RANK: The index of the command in the list of commands to run.
  • $PROJECT_NAME: proj_name from config.
  • $WB_ENTITY: wb_entity from config.
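
These variables can be used anywhere in the launched command or in config values such as add_all. A sketch with hypothetical argument names:

add_all: "--log-dir $DATA_DIR/logs/$GROUP_ID --rank $CMD_RANK"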

Example:

base_data_dir: ""
proj_name: ""
wb_entity: ""
ckpt_cfg_key: "CHECKPOINT_FOLDER"
ckpt_append_name: False
# Environment variables exported for every launched run.
add_env_vars:
  - "MY_ENV_VAR=env_var_value"
# Conda environment activated before running commands.
conda_env: "conda_env_name"
slurm_ignore_nodes: ["node_name"]
add_all: "ARG arg_value"
eval_sys:
  ckpt_load_k: "the argument name to pass the evaluation checkpoint directory to"
  ckpt_search_dir: "folder name relative to the base data dir where checkpoints are saved"
  change_vals:
    "arg name": "new arg value"
# Argument suffixes selectable with --proj-dat.
proj_data:
  option: "ARG arg_value"
# Named profiles selectable with --slurm.
slurm:
  profile_name:
    c: 7
    partition: 'partition_name'
    constraint: 'a40'

Auto Evaluation System

Automatically evaluate experiments from the training job's Slurm launch script. Example usage: python -m rl_utils.launcher.eval_sys --runs th_im_single_Ja921cfd5 --proj-dat render. The eval_sys config key in the project config specifies how to modify the launch command for evaluation (for example, loading a checkpoint or switching to an evaluation mode).
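
A concrete eval_sys block, following the schema above, might look like this sketch (the values are hypothetical, except load_checkpoint, which matches the launcher example earlier):

eval_sys:
  ckpt_load_k: "load_checkpoint"
  ckpt_search_dir: "checkpoints"
  change_vals:
    "run_type": "eval"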

Plotting

Auto Line

Run python -m rl_utils.plotting.auto_line --cfg plot_cfgs/my_plot.yaml, where plot_cfgs/my_plot.yaml looks something like:

methods:
  "dense_reward": "Ud05e1467"
  "sparse": "W5c609da1"
  "mirl": "bf69a9e1"
method_spec: "group"
proj_cfg: "/Users/andrewszot/configs/mbirlo.yaml"
plot_key: "dist_to_goal"
save_name: "reward"
use_cached: True
plot_params:
  smooth_factor: 0.7
  legend: True
  rename_map:
    "dense_reward": "Dense Reward"
    "sparse": "Sparse Reward"
    "_step": "Step"
    "dist_to_goal": "Distance To Goal"
    "mirl": "Meta-IRL"

Argument description:

  • method_spec: Key to group methods by. For example: group name.
  • save_name: Output file name, without extension or parent directory.
  • methods: Dict of names mapping to method_spec instances (for example group names).

Auto Bar

Config Schema:

methods:
  st_pop: K94569d43
  im_pop: Rb2cd0028
method_spec: "group"
proj_cfg: "../../configs/hr.yaml"
plot_key: "eval_reward/average_reward"
save_name: "set_table"
use_cached: False
plot_params: {}
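
Assuming Auto Bar follows the same invocation pattern as Auto Line (an assumption, not confirmed here), it would be run as:

python -m rl_utils.plotting.auto_bar --cfg plot_cfgs/my_bar.yaml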

W&B Query

Selectable fields:

  • summary: The metrics for the model at the end of training, along with the run state. Useful for checking a run's final result.
  • Any other key: If the key is not one of the above, the corresponding entry is fetched from the run's summary dict (its final value).
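
These fields plug into the &...& query syntax used by the launcher. For example, a sketch that fetches a final summary metric for a run group (the metric key is borrowed from the Auto Bar example and the group ID from the launcher examples):

"&eval_reward/average_reward WHERE group=Ef6a88c4f&"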

Environments

Toy environments to test algorithms.