PPO+ICM

This is an implementation of PPO with the Intrinsic Curiosity Module (ICM) from Pathak et al., ICML 2017.
This document includes test curves, usage of the ICM module, and instructions for running the experiments.

Tests

Pyramid environment (Unity ML-Agents)

Agent Reward Function (independent):

+2 for moving to the golden brick
-0.001 per step

PushBlock environment (Unity ML-Agents)

Agent Reward Function:

+5.0 if the block touches the goal
-0.0025 per step
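To make the scale of these reward functions concrete, the sketch below computes the undiscounted extrinsic return of a single episode. The helper name `episode_return` is hypothetical, and it assumes the step penalty is applied on every step, including the final one; the actual Unity environments may differ in details such as episode length limits.

```python
# Hypothetical helper illustrating the extrinsic reward functions listed above.
# Assumption: the per-step penalty is applied on every step, including the last.

def episode_return(steps: int, success: bool, bonus: float, step_penalty: float) -> float:
    """Undiscounted extrinsic return of one episode."""
    return (bonus if success else 0.0) - step_penalty * steps


PYRAMID = {"bonus": 2.0, "step_penalty": 0.001}
PUSHBLOCK = {"bonus": 5.0, "step_penalty": 0.0025}

if __name__ == "__main__":
    # Pyramid, golden brick reached after 500 steps: 2 - 500 * 0.001 = 1.5
    print(episode_return(500, True, **PYRAMID))
    # PushBlock, block never reaches the goal in 1000 steps: 0 - 1000 * 0.0025 = -2.5
    print(episode_return(1000, False, **PUSHBLOCK))
```

Because the success bonus is the only positive signal in both environments, an agent that rarely reaches the goal sees almost uniformly negative returns. This sparse-reward setting is the one curiosity-driven exploration is meant to address.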

ICM Module Usage

The module is defined in icm.py.
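For reference, Pathak et al. define the module's components as follows, where φ is the learned feature encoder, g the inverse model, and f the forward model. This document does not state which coefficients (η, β, λ) icm.py uses, so treat them as the paper's notation rather than this repository's settings.

```latex
% Inverse model: predict the action from consecutive states (cross-entropy loss L_I for discrete actions)
\hat{a}_t = g\big(s_t, s_{t+1}; \theta_I\big), \qquad L_I\big(\hat{a}_t, a_t\big)

% Forward model: predict the next state's features
\hat{\phi}(s_{t+1}) = f\big(\phi(s_t), a_t; \theta_F\big), \qquad
L_F = \tfrac{1}{2}\,\big\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \big\rVert_2^2

% Intrinsic reward and total reward
r_t^i = \tfrac{\eta}{2}\,\big\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \big\rVert_2^2, \qquad
r_t = r_t^i + r_t^e

% Joint objective over policy, inverse, and forward parameters
\min_{\theta_P, \theta_I, \theta_F}
\Big[ -\lambda\, \mathbb{E}_{\pi(s_t; \theta_P)}\Big[\textstyle\sum_t r_t\Big]
      + (1 - \beta)\, L_I + \beta\, L_F \Big]
```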

# Initialize the module inside the PPO agent
from icm import ICM

class Agent():
    def __init__(self, state_size, action_size):
        self.icm = ICM(state_size, action_size)

# Compute the intrinsic reward when interacting with the environment
intrinsic_reward = agent.icm.compute_intrinsic_reward(states, next_states, actions)

# Train the ICM alongside PPO (inside the agent's update step)
self.icm.train(state_samples, next_state_samples, action_samples)
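To illustrate what a module with this interface might do internally, here is a minimal NumPy sketch following the structure described by Pathak et al. It is not the repository's icm.py and only mirrors the method names used above. The class name `MinimalICM`, the single-layer tanh encoder, the linear inverse and forward models, the hyperparameters (`feature_size`, `lr`, `beta`, `eta`), and the choice to stop forward-loss gradients at the encoder are all assumptions made for illustration.

```python
import numpy as np


class MinimalICM:
    """Hypothetical, simplified ICM for vector states and discrete actions.

    Encoder: phi(s) = tanh(s @ W_enc). The inverse model is trained through the
    encoder; the forward model treats features as constants (stopped gradient).
    """

    def __init__(self, state_size, action_size, feature_size=16,
                 lr=1e-2, beta=0.2, eta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.action_size = action_size
        self.lr, self.beta, self.eta = lr, beta, eta
        self.W_enc = rng.normal(0.0, 0.1, (state_size, feature_size))
        self.W_inv = rng.normal(0.0, 0.1, (2 * feature_size, action_size))
        self.W_fwd = rng.normal(0.0, 0.1, (feature_size + action_size, feature_size))

    def _features(self, states):
        return np.tanh(states @ self.W_enc)

    def _one_hot(self, actions):
        return np.eye(self.action_size)[actions]

    def _forward_error(self, states, next_states, actions):
        # Forward model input is [phi(s), one_hot(a)]; error is phi_hat(s') - phi(s')
        phi, phi_next = self._features(states), self._features(next_states)
        x = np.concatenate([phi, self._one_hot(actions)], axis=1)
        return x, phi, phi_next, x @ self.W_fwd - phi_next

    def compute_intrinsic_reward(self, states, next_states, actions):
        # r_i = eta / 2 * ||phi_hat(s') - phi(s')||^2, one value per transition
        _, _, _, err = self._forward_error(states, next_states, actions)
        return 0.5 * self.eta * np.sum(err ** 2, axis=1)

    def train(self, states, next_states, actions):
        """One gradient step on (1 - beta) * L_I + beta * L_F; returns pre-update losses."""
        n = len(actions)
        x, phi, phi_next, err = self._forward_error(states, next_states, actions)

        # Inverse model: softmax cross-entropy on the action actually taken
        h = np.concatenate([phi, phi_next], axis=1)
        logits = h @ self.W_inv
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        inverse_loss = -np.mean(np.log(probs[np.arange(n), actions] + 1e-12))
        d_logits = (probs - self._one_hot(actions)) / n

        # Backpropagate the inverse loss into W_inv and through tanh into W_enc
        grad_inv = h.T @ d_logits
        d_h = d_logits @ self.W_inv.T
        f = phi.shape[1]
        d_z = d_h[:, :f] * (1.0 - phi ** 2)
        d_z_next = d_h[:, f:] * (1.0 - phi_next ** 2)
        grad_enc = states.T @ d_z + next_states.T @ d_z_next

        # Forward model: mean of 0.5 * squared error, gradient w.r.t. W_fwd only
        forward_loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
        grad_fwd = x.T @ err / n

        self.W_inv -= self.lr * (1.0 - self.beta) * grad_inv
        self.W_enc -= self.lr * (1.0 - self.beta) * grad_enc
        self.W_fwd -= self.lr * self.beta * grad_fwd
        return inverse_loss, forward_loss


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    states = rng.normal(size=(32, 8))
    next_states = states + 0.1 * rng.normal(size=(32, 8))
    actions = rng.integers(0, 4, size=32)
    icm = MinimalICM(state_size=8, action_size=4)
    print(icm.compute_intrinsic_reward(states, next_states, actions)[:4])
    print(icm.train(states, next_states, actions))
```

In a PPO rollout, the per-transition values from `compute_intrinsic_reward` would typically be added to the environment rewards before computing advantages (the paper uses r_t = r_t^i + r_t^e); whether icm.py applies any additional scaling is not stated here.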

Running Experiments

Clone the repository and install the envs package in editable mode:
git clone https://github.com/bonniesjli/icm.git
cd icm/envs
pip install -e .

From the repository root, run the Pyramid experiment:
python -m main_pyramid
From the repository root, run the PushBlock experiment:
python -m main_pushblock