This project uses the DQN algorithm from Baidu's PARL library.
All the required functions are in a single Python file, in which:
- Most functions, e.g. Model(), Agent(), ReplayMemory(), and part of main(), are identical to, or slightly modified from, the materials provided by the Baidu RL course.
- The preprocessing (scaling) of the state was inspired by nbuliyang's project.
- The required libraries and their versions are documented in requirements.txt.
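The exact scaling used for the state is not spelled out here. As a rough illustration only, the sketch below assumes a plain min-max scaling of each state component to the range [0, 1]. The function name `scale_state` and the bounds `low` and `high` are hypothetical and are not taken from this project or from nbuliyang's project.

```python
def scale_state(state, low, high):
    """Linearly map each component of `state` from [low, high] to [0, 1].

    `low` and `high` are hypothetical per-dimension bounds; the real
    project may use different bounds or a different scaling rule.
    """
    if not (len(state) == len(low) == len(high)):
        raise ValueError("state, low and high must have the same length")

    scaled = []
    for value, lo, hi in zip(state, low, high):
        if hi <= lo:
            raise ValueError("each upper bound must exceed its lower bound")
        scaled.append((value - lo) / (hi - lo))
    return scaled


# Illustrative two-dimensional state with made-up bounds.
raw_state = [5.0, 0.0]
print(scale_state(raw_state, low=[0.0, -1.0], high=[10.0, 1.0]))
```

Keeping all state components in a similar numeric range is a common way to make the Q-network easier to train, which is presumably the motivation for the preprocessing step.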
The three figures below show the test_reward (the mean over 5 test episodes) and the max_reward (the maximum among the same 5 test episodes):
- at the beginning of training
- around 3000 episodes
- around 4000 episodes
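As a minimal sketch of how the two plotted quantities relate to the individual test episodes, assume the total reward of each test episode is collected in a list. The function name `summarize_test_rewards` and the sample numbers are hypothetical and are not taken from the project's code.

```python
def summarize_test_rewards(episode_rewards):
    """Return (test_reward, max_reward) for one round of test episodes.

    test_reward is the mean of the per-episode rewards,
    max_reward is the largest single-episode reward.
    """
    if not episode_rewards:
        raise ValueError("need at least one test episode")

    test_reward = sum(episode_rewards) / len(episode_rewards)
    max_reward = max(episode_rewards)
    return test_reward, max_reward


# Made-up rewards from 5 test episodes, for illustration only.
example_rewards = [10.0, 20.0, 30.0, 40.0, 50.0]
test_reward, max_reward = summarize_test_rewards(example_rewards)
print(f"test_reward={test_reward:.1f}, max_reward={max_reward:.1f}")
```

Plotting both values side by side shows how far the best test episode is from the average, which gives a rough sense of how consistent the agent is at each stage.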
At the beginning of the experiment:
Around 3000 episodes: