https://github.com/VivianoRiccardo/OpenAI-custom-static-environment-policy-iteration
https://github.com/openai/gym/blob/master/docs/creating-environments.md
Run from this README's folder:
pip install -e gym-staticinvader
Then create the environment in Python:
import gym
env = gym.make('gym_staticinvader:staticinvader-v0')
6 x 4 grid, where each cell holds one of these codes:
4 := enemy invaders
3 := our spaceship
1 := laser beam
0 := empty square
7 := wall
In the initial state, each of the first 2 rows contains 2 enemies placed in randomly chosen columns. The middle row contains 2 walls at random positions, and our spaceship sits at the center of the last row.
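A minimal sketch of how such a starting grid could be built. It assumes "6 x 4" means 6 rows by 4 columns and that the "middle row" is row `rows // 2`; `initial_grid` is a hypothetical helper for illustration, not the repository's code.

```python
import random
from typing import List, Optional

# Cell codes from the legend above
EMPTY, LASER, SHIP, INVADER, WALL = 0, 1, 3, 4, 7


def initial_grid(rows: int = 6, cols: int = 4, seed: Optional[int] = None) -> List[List[int]]:
    """Hypothetical starting layout (assumes rows >= 4 and cols >= 2)."""
    rng = random.Random(seed)
    grid = [[EMPTY] * cols for _ in range(rows)]
    # First 2 rows: 2 invaders each, in distinct randomly chosen columns
    for r in (0, 1):
        for c in rng.sample(range(cols), 2):
            grid[r][c] = INVADER
    # Middle row (assumed to be rows // 2): 2 walls in distinct random columns
    for c in rng.sample(range(cols), 2):
        grid[rows // 2][c] = WALL
    # Last row: the ship at column cols // 2 (just right of the midpoint when cols is even)
    grid[rows - 1][cols // 2] = SHIP
    return grid


if __name__ == "__main__":
    # Print one random starting grid using the same numeric codes as the legend
    for row in initial_grid(seed=0):
        print(" ".join(str(cell) for cell in row))
```

Passing a `seed` makes the random placement reproducible, which is handy when comparing policies on the same starting layout.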
Only the agent's actions can change the environment! The available actions are:
1 := left
2 := right
3 := shoot
The game is modeled as a Markov decision process. To simplify the computations, the transition function is deterministic: given a state s and an action a, there is exactly one possible next state s', with P(s' | s, a) = 1. However, the code is written to handle a probability distribution over several next states.
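To illustrate this representation, here is a sketch in which `transition(s, a)` returns a list of `(P(s' | s, a), s')` pairs. The state is deliberately reduced to a hypothetical toy (only the ship's column in an assumed 4-column grid), so the movement rule below is an assumption, not the game's actual dynamics.

```python
from typing import Dict, List, Tuple

State = int   # hypothetical toy state: the ship's column, 0..COLS-1
Action = int  # action codes from the list above: 1 left, 2 right, 3 shoot
Distribution = List[Tuple[float, State]]  # (P(s' | s, a), s') pairs

COLS = 4  # assumed grid width for this toy example


def deterministic_step(s: State, a: Action) -> State:
    """Toy rule (assumption): move left/right within the grid; shooting does not move the ship."""
    if a == 1:
        return max(s - 1, 0)
    if a == 2:
        return min(s + 1, COLS - 1)
    return s


def transition(s: State, a: Action) -> Distribution:
    # Deterministic case: a single next state with probability 1
    return [(1.0, deterministic_step(s, a))]


def expected_value(dist: Distribution, values: Dict[State, float]) -> float:
    """Sum over s' of P(s' | s, a) * V(s'); works for any distribution, not only P = 1."""
    total = sum(p for p, _ in dist)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    return sum(p * values[s_next] for p, s_next in dist)
```

A stochastic variant only needs `transition` to return more than one pair, for example `[(0.8, s_right), (0.2, s)]`; `expected_value` works unchanged.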
Let the policy iteration algorithm run (it takes about 5 minutes to complete); afterwards you will see the agent play the game according to the optimal policy it found. When no 4s (invaders) remain on the grid, the agent has achieved its goal!
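For reference, a compact sketch of textbook policy iteration (iterative policy evaluation followed by greedy policy improvement) over a model that returns `(probability, next_state, reward)` triples, so it also covers non-deterministic transitions. The `corridor` model, the discount `gamma` and the reward scheme are hypothetical choices for illustration, not the repository's settings.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Model = Callable[[Hashable, int], List[Tuple[float, Hashable, float]]]


def policy_iteration(states: Sequence[Hashable], actions: Sequence[int], model: Model,
                     gamma: float = 0.9, theta: float = 1e-8):
    policy: Dict[Hashable, int] = {s: actions[0] for s in states}
    values: Dict[Hashable, float] = {s: 0.0 for s in states}

    def q(s: Hashable, a: int) -> float:
        # Expected return of taking a in s, then following the current value estimate
        return sum(p * (r + gamma * values[s2]) for p, s2, r in model(s, a))

    while True:
        # Policy evaluation: sweep V(s) <- Q(s, pi(s)) in place until changes fall below theta
        while True:
            delta = 0.0
            for s in states:
                v = q(s, policy[s])
                delta = max(delta, abs(v - values[s]))
                values[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily; only switch on a strict gain to avoid tie cycling
        stable = True
        for s in states:
            best = max(actions, key=lambda a: q(s, a))
            if q(s, best) > q(s, policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


def corridor(s: int, a: int) -> List[Tuple[float, int, float]]:
    """Hypothetical toy MDP: cells 0..3, cell 3 is an absorbing goal worth +1 on entry.
    Actions reuse the codes above: 1 left, 2 right, 3 shoot (no movement)."""
    if s == 3:
        return [(1.0, s, 0.0)]
    s2 = max(s - 1, 0) if a == 1 else min(s + 1, 3) if a == 2 else s
    return [(1.0, s2, 1.0 if s2 == 3 else 0.0)]


if __name__ == "__main__":
    pi, v = policy_iteration([0, 1, 2, 3], [1, 2, 3], corridor)
    print(pi)
    print(v)
```

In the toy corridor the optimal action in cells 0 to 2 is 2 (right), and the values decay by `gamma` per step away from the goal (1.0, 0.9, 0.81).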