Curiosity-Driven Exploration - PyTorch implementation with CartPole (simple version)

Dependencies

	python main.py

Red: A2C with ICM, Blue: A2C w/o ICM
In my experiments, A2C with ICM seems to converge slightly faster on average than A2C without ICM.

I trained the model in the CartPole environment. However, CartPole is not the best environment for experimenting with curiosity.
I modified the overall model architecture:
- A2C instead of A3C (plain actor-critic using the advantage, without the parallel training technique).
- Very simple inverse and forward models, because the CartPole observation is already a feature representation, not an image.
- A larger scaling factor for the intrinsic rewards.
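
As a rough illustration of how the scaled intrinsic reward and the advantage fit together, here is a minimal sketch in plain Python (no tensors, so it runs without PyTorch). It is not the code from main.py. The function names, `eta` (the intrinsic reward scaling factor), `gamma` (the discount factor), and the choice to simply add the intrinsic reward to the extrinsic one are all assumptions made for illustration. The intrinsic reward follows the usual ICM form, eta / 2 times the squared error of the forward model's prediction.

```python
def intrinsic_reward(predicted_next, actual_next, eta):
    """Scaled forward-model error: eta / 2 * ||predicted - actual||^2.

    eta is a hypothetical name for the intrinsic reward scaling factor.
    """
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * squared_error


def discounted_returns(rewards, gamma, bootstrap_value=0.0):
    """Discounted return for every step, accumulated backwards from the last step."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage = return minus the critic's value estimate for that step."""
    return [g - v for g, v in zip(returns, values)]


# Toy rollout of three steps (all numbers are made up for illustration).
extrinsic = [1.0, 1.0, 1.0]
predicted = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # forward model outputs
actual = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]     # observed next states
eta = 2.0

# Assumption: total reward is extrinsic plus scaled intrinsic reward.
total = [r + intrinsic_reward(p, a, eta)
         for r, p, a in zip(extrinsic, predicted, actual)]
returns = discounted_returns(total, gamma=0.5)
adv = advantages(returns, values=[1.0, 1.0, 1.0])
```

With a larger `eta`, steps where the forward model predicts poorly contribute more to the return, so the actor is pushed harder toward states it cannot yet predict.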