environment: n-chain
- HDQN
- Vanilla DQN
- Bootstrapped_DQN with 10 heads (seed 32) on n-chain
- Install the baseline by following the BootstrappedDQN README file. Watch the TensorFlow version (1.15 required)
- NoisyDQN, BayesBackpropDQN and MNFDQN (covered in class) are also available for comparison
- n-chain with 100 states (the agent starts at index 1, i.e. s2)
- Reaches a score of 10 at episode 5, then forgets it but soon returns to it (slower)
- Vanilla DQN first reaches 10 at episode 200 and forgets it at episode 900
- UCB
- UCB action selection using the mean and standard deviation of the Q-values across the DQN heads
- TDU
- uncertainty estimation using temporal difference uncertainty
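A minimal sketch of an n-chain in the style of the chain MDP from the Bootstrapped DQN paper (Osband et al., 2016), to make the setup above concrete. The class `NChain`, the reward values (0.001 in the leftmost state, 1 in the rightmost) and the n + 9 step horizon are assumptions for illustration, not taken from the `qlearn` code; only the 100 states and the start at index 1 (s2) come from the notes.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class NChain:
    """Hypothetical n-chain sketch: states 0..n-1, action 0 = left, 1 = right."""

    n: int = 100
    start: int = 1               # index 1, i.e. s2 in 1-based notation (from the notes)
    small_reward: float = 0.001  # assumed reward for being in the leftmost state
    big_reward: float = 1.0      # assumed reward for being in the rightmost state
    state: int = field(init=False, default=1)
    t: int = field(init=False, default=0)

    @property
    def horizon(self) -> int:
        # Assumed episode length of n + 9 steps.
        return self.n + 9

    def reset(self) -> int:
        self.state, self.t = self.start, 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Move one state left or right, clipped at both ends of the chain.
        if action == 1:
            self.state = min(self.state + 1, self.n - 1)
        else:
            self.state = max(self.state - 1, 0)
        self.t += 1
        # Reward depends on the state reached after the move.
        if self.state == self.n - 1:
            reward = self.big_reward
        elif self.state == 0:
            reward = self.small_reward
        else:
            reward = 0.0
        return self.state, reward, self.t >= self.horizon


def episode_return(env: NChain, action: int) -> float:
    """Total reward of a policy that always takes the same action."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(action)
        total += reward
    return total
```

Under these assumed conventions the always-right policy collects 12.0 per episode and the always-left policy about 0.109; the real environment's scores depend on its own horizon and reward timing.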
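A rough sketch of the bootstrapping mechanics behind a 10-head Bootstrapped DQN: each stored transition gets a random mask that decides which heads train on it, and each episode follows one randomly chosen head greedily. The mask probability of 0.5 and all function names are hypothetical; the real BootstrappedDQN code may sample masks and heads differently.

```python
import random
from typing import List, Sequence

NUM_HEADS = 10   # "10 heads" from the notes
MASK_PROB = 0.5  # assumed Bernoulli probability for the bootstrap mask


def sample_mask(rng: random.Random, num_heads: int = NUM_HEADS, p: float = MASK_PROB) -> List[int]:
    """Mask stored with each transition: head k trains on it only if mask[k] == 1."""
    return [1 if rng.random() < p else 0 for _ in range(num_heads)]


def select_episode_head(rng: random.Random, num_heads: int = NUM_HEADS) -> int:
    """Pick one head uniformly at the start of an episode and follow it until the episode ends."""
    return rng.randrange(num_heads)


def greedy_action(q_heads: Sequence[Sequence[float]], head: int) -> int:
    """Greedy action with respect to the chosen head's Q-values."""
    q = q_heads[head]
    return max(range(len(q)), key=q.__getitem__)


def masked_loss(td_errors: Sequence[float], mask: Sequence[int]) -> float:
    """Mean squared TD error over the heads whose mask bit is set (0.0 if none are)."""
    active = [d * d for d, m in zip(td_errors, mask) if m]
    return sum(active) / len(active) if active else 0.0
```

Seeding `random.Random(32)` mirrors the seed in the notes but does not reproduce the actual runs.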
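For UCB over the heads, one straightforward reading is to score each action by the mean of its Q-values across heads plus a multiple of their standard deviation, then act greedily on that score. The function name `ucb_action` and the weight `lam` are hypothetical; the population standard deviation is used here as an assumption.

```python
import statistics
from typing import Sequence


def ucb_action(q_heads: Sequence[Sequence[float]], lam: float = 1.0) -> int:
    """Pick argmax_a [mean_k Q_k(s, a) + lam * std_k Q_k(s, a)] over the ensemble heads."""
    num_actions = len(q_heads[0])
    scores = []
    for a in range(num_actions):
        values = [q[a] for q in q_heads]  # Q-value of action a under every head
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)   # disagreement between heads
        scores.append(mean + lam * std)
    # max() returns the first action on ties.
    return max(range(num_actions), key=scores.__getitem__)
```

With heads `[[1.0, 0.0], [1.0, 2.0]]` both actions have mean 1.0, so `lam=0` picks action 0 while `lam=1` prefers action 1, where the heads disagree most.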
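As we read the TDU idea, uncertainty is measured as the spread of TD errors across an ensemble rather than the spread of Q-values. A simplified sketch, assuming each head k gives delta_k = r + gamma * max_a' Q_k(s', a') - Q_k(s, a), the uncertainty is the standard deviation of delta_k over heads, and beta times that value is added to the reward as an exploration bonus. Function names, `gamma` and `beta` are hypothetical, and details of the original method (target networks, which heads receive the bonus) are omitted.

```python
import statistics
from typing import List, Sequence


def td_errors(q_s: Sequence[Sequence[float]], q_next: Sequence[Sequence[float]],
              action: int, reward: float, gamma: float = 0.99, done: bool = False) -> List[float]:
    """One-step TD error of every head for a single transition (s, a, r, s')."""
    errors = []
    for q_row, q_next_row in zip(q_s, q_next):
        bootstrap = 0.0 if done else gamma * max(q_next_row)
        errors.append(reward + bootstrap - q_row[action])
    return errors


def tdu_bonus(errors: Sequence[float], beta: float = 1.0) -> float:
    """Exploration bonus: beta times the standard deviation of TD errors across heads."""
    return beta * statistics.pstdev(errors)


def augmented_reward(reward: float, errors: Sequence[float], beta: float = 1.0) -> float:
    """Extrinsic reward plus the TDU-style bonus."""
    return reward + tdu_bonus(errors, beta)
```

Heads that agree on the TD target give a zero bonus, so exploration is pushed toward transitions where the heads' predictions still disagree.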
python3 -m venv rl
source rl/bin/activate
cd RL_project/BootstrappedDQN
mkdir graphs
mkdir graphs/mean graphs/std
cd ..
python3 -m qlearn.toys.main_nchain --agent BootstrappedDQN --cuda 0 --input-dim 20 --double-q 0 --ucb 1 --max-episodes 2500 --seed 1