Graphs for Q Learning and Async Q Learning
With Anaconda: conda env create -f environment.yml
python src/q_learning.py
Options:
--episodes INTEGER Number of Successful Runs.
-e, --epsilon FLOAT Probability of choosing the next action randomly (vs.
greedily).
-a, --alpha FLOAT Learning Rate.
-g, --gamma FLOAT Discount Factor.
--help Show this message and exit.
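For reference, the update that the epsilon/alpha/gamma flags control can be sketched as below. This is a minimal tabular sketch, not the repo's actual code; the function names, the string action labels, and the dictionary-backed Q table are all illustrative assumptions.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    # With probability epsilon pick a random action (explore),
    # otherwise pick the action with the highest Q value (exploit).
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(Q, state, action, reward, next_state, actions, alpha, gamma):
    # One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    # alpha is the learning rate (-a), gamma the discount factor (-g).
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example: a single update from a zero-initialized table.
Q = defaultdict(float)
q_update(Q, 0, "right", 1.0, 1, ["left", "right"], 0.5, 0.9)
```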
python src/async_qlearning.py
Options:
-n, --processes INTEGER Number of Processes to run.
-e, --epsilon FLOAT Probability of choosing the next action randomly
(vs. greedily).
-a, --alpha FLOAT Learning Rate.
-g, --gamma FLOAT Discount Factor.
-u, --update INTEGER Number of steps until update.
-s, --steps INTEGER Total number of steps.
--help Show this message and exit.
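The asynchronous variant has each worker accumulate local Q-value deltas and apply them to a shared table every -u/--update steps, up to -s/--steps total steps. The sketch below illustrates that idea only: Part 2 itself uses multiple processes, but this sketch uses threads purely to stay self-contained, and the toy chain environment, table layout, and names are assumptions rather than the repo's code.

```python
import random
import threading

N_STATES, N_ACTIONS = 5, 2  # hypothetical toy chain environment

def worker(shared_q, lock, steps, update, epsilon, alpha, gamma):
    # Each worker accumulates local deltas and flushes them into the
    # shared table every `update` steps (mirrors the -u/--update flag).
    local = [0.0] * (N_STATES * N_ACTIONS)
    state = 0
    for t in range(1, steps + 1):
        # Epsilon-greedy action selection against the shared table.
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            base = state * N_ACTIONS
            action = max(range(N_ACTIONS), key=lambda a: shared_q[base + a])
        # Toy dynamics: action 1 moves right; reward 1 on reaching the end.
        next_state = min(state + action, N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        nxt = next_state * N_ACTIONS
        target = reward + gamma * max(shared_q[nxt + a] for a in range(N_ACTIONS))
        idx = state * N_ACTIONS + action
        local[idx] += alpha * (target - shared_q[idx])
        state = 0 if next_state == N_STATES - 1 else next_state
        if t % update == 0:
            # Flush accumulated deltas under the lock, then reset them.
            with lock:
                for i, d in enumerate(local):
                    shared_q[i] += d
            local = [0.0] * (N_STATES * N_ACTIONS)

# Run several workers against one shared table.
q_table = [0.0] * (N_STATES * N_ACTIONS)
table_lock = threading.Lock()
workers = [threading.Thread(target=worker,
                            args=(q_table, table_lock, 500, 5, 1.0, 0.5, 0.9))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Reads of the shared table are deliberately lock-free here; only the periodic delta flush takes the lock, which is the usual trade-off in asynchronous Q-learning sketches.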
pytest src/test
Part 2 was implemented using multiple processes.
I didn't have time to complete Part 3, so instead I studied Dmitry Bobrenko's multiprocess Python implementation of the Asynchronous DQN and made some minor code improvements while reading through it.