/rl_lesson

Primary LanguageJupyter Notebook

I am not that talented myself at MLops (or ML theory, or anything else in ML). But I know one person who has great MLops: Costa Huang, developer of CleanRL, the only RL library in existence that is both friendly to new users and academic reaserch.

Installation

Install CleanRL:

pip install poetry
pip install wandb
git clone https://github.com/vwxyzjn/cleanrl.git
cd cleanrl && poetry install && cd ..

Next, if you are interested in experiment tracking (which you should be), create an account on Wandb https://wandb.ai/

And then log into wandb:

wandb login

Running

Now, we can get started with the RL:

python cleanrl/cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000 

To see your logs with tensorboard, you can run

tensorboard --logdir runs

To set up video capture and log everything with wandb, do:

xvfb-run -s "-screen 0 1024x768x24" python cleanrl/cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000 \
    --wandb-project-name rl_lesson \
    --track \
     --capture-video

Something harder

Lunar lander:

Install environment

pip install gym[box2d]
xvfb-run -s "-screen 0 1024x768x24" python cleanrl/cleanrl/ppo.py \
    --seed 1 \
    --env-id LunarLander-v2 \
    --num-envs 16 \
    --track \
    --total-timesteps 1000000  \
    --capture-video

Note that it doesn't perform very well, even after 1,000,000 steps.