- python 3.6.8 (Anaconda)
- pytorch 1.0.0
- gym 0.12.1
- Install the dependencies listed above, then run the following command.
python main.py
- Red: A2C with ICM, Blue: A2C without ICM
- On average across my experiments, A2C with ICM seems to converge slightly faster than A2C without ICM.
- I trained the model in the CartPole environment. However, CartPole is not the best choice for curiosity experiments.
- I modified the overall model architecture.
- A2C instead of A3C (plain advantage actor-critic, without the asynchronous parallel workers).
- Very simple inverse and forward models, because CartPole observations are already feature vectors rather than images.
- A larger scaling factor for intrinsic rewards.
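
To make the last two points concrete, here is a minimal pure-Python sketch of how a scaled intrinsic reward and a one-worker advantage estimate could be computed. It is not the code in `main.py`: the function names, the `eta` and `gamma` values, and the choice to measure forward-model error directly on the 4-dimensional CartPole observation are all assumptions made for illustration. The reward follows the usual ICM form, `eta / 2` times the squared forward-model prediction error.

```python
from typing import List, Sequence


def intrinsic_reward(pred_next: Sequence[float],
                     true_next: Sequence[float],
                     eta: float) -> float:
    """Curiosity bonus: eta / 2 * squared forward-model prediction error.

    `eta` is the intrinsic-reward scaling factor (hypothetical name).
    """
    sq_err = sum((p - t) ** 2 for p, t in zip(pred_next, true_next))
    return 0.5 * eta * sq_err


def advantages(rewards: Sequence[float],
               values: Sequence[float],
               bootstrap_value: float,
               gamma: float = 0.99) -> List[float]:
    """Discounted return minus the critic's value estimate, per step."""
    ret = bootstrap_value
    out = []
    # Walk the rollout backwards so each return reuses the next one.
    for r, v in zip(reversed(rewards), reversed(values)):
        ret = r + gamma * ret
        out.append(ret - v)
    out.reverse()
    return out


# Assumed example: the forward model misses the cart velocity by 0.2.
pred = [0.0, 0.1, 0.0, -0.1]
true = [0.0, 0.3, 0.0, -0.1]
bonus = intrinsic_reward(pred, true, eta=10.0)

# Total reward per step = extrinsic reward (1.0 in CartPole) + bonus.
total = [1.0 + bonus, 1.0 + bonus]
adv = advantages(total, values=[0.5, 0.5], bootstrap_value=0.0, gamma=0.5)
```

A larger `eta` makes the curiosity bonus a bigger share of the total reward, which is the knob the last bullet refers to.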