For the continuous domain (MuJoCo), look at this.
```python
experiment = Experiment()  # Add your Comet configs!
```
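For example, a placeholder configuration might look like this (`api_key`, `project_name`, and `workspace` are standard Comet parameters; the values below are hypothetical):

```python
from comet_ml import Experiment

experiment = Experiment(
    api_key="YOUR_COMET_API_KEY",          # placeholder
    project_name="super-mario-bros-ppo",   # hypothetical project name
    workspace="your-workspace",            # placeholder
)
```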
```
usage: main.py [-h] [--world WORLD] [--stage STAGE]
               [--total_iterations TOTAL_ITERATIONS] [--interval INTERVAL]
               [--do_train] [--render] [--train_from_scratch]

Variable parameters based on the configuration of the machine or user's choice

optional arguments:
  -h, --help            show this help message and exit
  --world WORLD         The id number of the Mario world.
  --stage STAGE         The id number of the Mario world's stage.
  --total_iterations TOTAL_ITERATIONS
                        The total number of iterations.
  --interval INTERVAL   The interval specifies how often different parameters
                        should be saved and printed, counted by iterations.
  --do_train            The flag determines whether to train the agent or
                        play with it.
  --render              The flag determines whether to render each agent or
                        not.
  --train_from_scratch  The flag determines whether to train from scratch or
                        continue previous tries.
```
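For reference, a minimal sketch of how `main.py`'s argument parser might be defined, based purely on the help output above. The `get_params` name, defaults, and flag polarities are assumptions (the polarities are inferred from the usage examples below):

```python
# A minimal sketch of the CLI described above. Only the flag names and help
# strings come from the help output; defaults and flag polarities are
# assumptions inferred from the usage examples below.
import argparse


def get_params():  # hypothetical helper name
    parser = argparse.ArgumentParser(
        description="Variable parameters based on the configuration "
                    "of the machine or user's choice")
    parser.add_argument("--world", type=int, default=1,
                        help="The id number of the Mario world.")
    parser.add_argument("--stage", type=int, default=1,
                        help="The id number of the Mario world's stage.")
    parser.add_argument("--total_iterations", type=int, default=int(1e5),
                        help="The total number of iterations.")
    parser.add_argument("--interval", type=int, default=100,
                        help="How often parameters are saved and printed, "
                             "counted in iterations.")
    # store_false: the training example below omits --do_train, so the default
    # is assumed to be "train" and passing the flag switches to testing.
    parser.add_argument("--do_train", action="store_false",
                        help="Whether to train the agent or play with it.")
    parser.add_argument("--render", action="store_true",
                        help="Whether to render each agent or not.")
    # store_false: the resume example below passes --train_from_scratch to
    # continue a previous run, so the default is assumed to be "from scratch".
    parser.add_argument("--train_from_scratch", action="store_false",
                        help="Whether to train from scratch or continue "
                             "previous tries.")
    return parser.parse_args()
```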
- To train the agent on a world and stage of your choice (passing the `do_train` flag switches the mode from training to testing the agent):
```shell
python3 main.py --world=2 --stage=2
```
- If you want to continue training from a previous run, execute the following:
```shell
python3 main.py --world=2 --stage=2 --train_from_scratch
```
- The whole training procedure was done on a Quadro P5000 on paperspace.com, so a machine with a similar configuration should be sufficient.
I forgot to document hyper-parameters for all environments on comet.ml. 😅
W-S (World-Stage) | T (horizon) | n_epochs | batch_size | lr | gamma | lambda | ent_coeff | clip_range | n_workers | grad_clip_norm |
---|---|---|---|---|---|---|---|---|---|---|
1-1 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
1-2 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
2-2 | 128 | 10 | 32 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | No Clipping |
3-1 | 128 | 4 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | No Clipping |
3-2 | 128 | 4 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | No Clipping |
3-3 | 128 | 8 | 64 | 1e-4 | 0.9 | 0.95 | 0.01 | 0.1 | 8 | 1 |
3-4 | 128 | 8 | 64 | 1e-4 | 0.9 | 0.95 | 0.01 | 0.1 | 8 | 1 |
4-1 | 128 | 8 | 64 | 1e-4 | 0.9 | 0.95 | 0.01 | 0.1 | 8 | 1 |
4-2 | 128 | 8 | 64 | 1e-4 | 0.95 | 0.95 | 0.01 | 0.2 | 8 | No Clipping |
4-3 | 128 | 8 | 64 | 2.5e-4 | 0.97 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
5-1 | 128 | 8 | 64 | 2.5e-4 | 0.97 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
5-2 | 128 | 8 | 64 | 2.5e-4 | 0.97 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
5-3 | 128 | 8 | 64 | 2.5e-4 | 0.98 | 0.98 | 0.03 | 0.2 | 8 | 0.5 |
6-1 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
6-2 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
6-3 | 128 | 8 | 64 | 2.5e-4 | 0.98 | 0.98 | 0.03 | 0.2 | 8 | 0.5 |
6-4 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
7-1 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
7-2 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
7-3 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
8-1 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
8-2 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
8-3 | 128 | 8 | 64 | 2.5e-4 | 0.9 | 0.95 | 0.01 | 0.2 | 8 | 0.5 |
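
For convenience, a single row of the table (here 1-1) could be expressed as a plain config dict like the one below; the key names are illustrative and may not match the actual code:

```python
# Hypothetical hyperparameter config mirroring the 1-1 row of the table above;
# key names are illustrative, not necessarily those used in the code.
config_1_1 = {
    "world": 1,
    "stage": 1,
    "T": 128,               # rollout horizon per worker before each update
    "n_epochs": 8,          # PPO epochs per update
    "batch_size": 64,       # mini-batch size
    "lr": 2.5e-4,           # learning rate
    "gamma": 0.9,           # discount factor
    "lambda": 0.95,         # GAE lambda
    "ent_coeff": 0.01,      # entropy bonus coefficient
    "clip_range": 0.2,      # PPO clipping epsilon
    "n_workers": 8,         # parallel environments
    "grad_clip_norm": 0.5,  # global gradient-norm clip (None = no clipping)
}
```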
- For World 4, Stage 2, the game score was not included in the reward, since it distracts Mario from his original goal: reaching the flag (see the sketch below).
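
Purely as an illustration of that choice, here is a hypothetical `gym` wrapper that adds a score-based bonus to the reward and can simply be disabled for World 4, Stage 2; the actual reward shaping used in this repo (and in @roclark's reward) may differ:

```python
import gym


class ScoreBonusWrapper(gym.Wrapper):
    """Hypothetical wrapper: adds the in-game score delta to the reward.

    For World 4, Stage 2 it would be constructed with use_score=False, since
    the score bonus distracts Mario from reaching the flag.
    """

    def __init__(self, env, use_score=True, scale=0.025):
        super().__init__(env)
        self.use_score = use_score
        self.scale = scale       # illustrative scaling factor
        self._last_score = 0

    def reset(self, **kwargs):
        self._last_score = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if self.use_score:
            # gym-super-mario-bros exposes the current score in the info dict.
            reward += self.scale * (info["score"] - self._last_score)
            self._last_score = info["score"]
        return obs, reward, done, info
```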
- @OpenAI for Mario Wrapper.
- @uvipen for Super-mario-bros-PPO-pytorch.
- @roclark for Mario Reward.