This project focuses on controlling a player agent with Deep Q-Learning. The environment is a square world with yellow and blue bananas scattered across it.
The agent's goal is to collect yellow bananas (reward +1) and to avoid blue bananas (reward -1). The state space is a vector with 37 entries describing the agent's velocity, its direction, and the surrounding environment. The agent can choose between 4 actions: forward, left, right, and backward. The goal is to find a control strategy that maximizes the average total return.
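
To make the setup above more concrete, here is a minimal sketch of two building blocks commonly used in Deep Q-Learning: epsilon-greedy action selection over the 4 actions, and the one-step Q-learning target. This is not necessarily the code used in the notebook. The names `epsilon_greedy` and `td_target`, the order of the entries in `ACTIONS`, and the discount factor `gamma=0.99` are assumptions made for illustration only.

```python
import random

STATE_SIZE = 37  # length of the state vector, as described above
# The 4 actions from the description; the index order here is an assumption.
ACTIONS = ["forward", "left", "right", "backward"]


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    At the end of an episode there is no next state, so the target is just r.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Usage: Q-value estimates for one state, one value per action (made-up numbers).
q_values = [0.1, 0.5, 0.2, 0.0]
action = epsilon_greedy(q_values, epsilon=0.0)  # epsilon=0 means purely greedy
print(ACTIONS[action])
```

In a full agent, the Q-values would come from a neural network that maps the 37-entry state vector to 4 outputs, and `epsilon` would typically be decayed during training so that the agent explores early and exploits later.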
- Clone the project and open the Navigation.ipynb notebook.
- Follow the instructions in Navigation.ipynb to get started with training your own agent!
Note that 3 pretrained models are provided, so you can demonstrate the functionality without running the (time-consuming) training process. You can find the corresponding section in the chapter "4. Validate the functionality".