Algorithms in this project were implemented with help from *Reinforcement Learning: An Introduction* by Richard S. Sutton and Andrew G. Barto.
The Lunar Lander environment has two types of action spaces, discrete and continuous (a creation sketch follows the action table below).
### Discrete

| Action | Description |
|---|---|
| Do nothing | Fire no engine |
| Fire left engine | Fire the left orientation engine |
| Fire main engine | Fire the main engine |
| Fire right engine | Fire the right orientation engine |
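
As a minimal sketch (assuming the `gymnasium` package with the Box2D extras and the `LunarLander-v2` environment id, which newer Gymnasium releases rename to `LunarLander-v3`), the same environment can be created with either action space:

```python
import gymnasium as gym

# Discrete variant: 4 actions (do nothing, fire left, fire main, fire right).
env_discrete = gym.make("LunarLander-v2")
print(env_discrete.action_space)    # Discrete(4)

# Continuous variant: a 2-dimensional Box (main engine throttle, lateral throttle).
env_continuous = gym.make("LunarLander-v2", continuous=True)
print(env_continuous.action_space)  # Box(-1.0, 1.0, (2,), float32)
```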
The observation is an 8-dimensional state vector (a sketch of reading it follows the table):

| State variable | Description |
|---|---|
| x-axis coordinate of the agent | Horizontal position (the landing pad is at x = 0) |
| y-axis coordinate of the agent | Vertical position (the landing pad is at y = 0) |
| x-axis linear velocity | Horizontal velocity |
| y-axis linear velocity | Vertical velocity |
| Agent's angle | Tilt of the lander |
| Agent's angular velocity | Rate of change of the angle |
| Right leg touched ground | Boolean contact flag |
| Left leg touched ground | Boolean contact flag |
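
A minimal sketch of reading one observation; the variable names are illustrative, and the order of the two leg-contact flags should be checked against the Gymnasium docs for the installed version:

```python
import gymnasium as gym

env = gym.make("LunarLander-v2")
obs, info = env.reset(seed=0)

# Unpack the 8 state variables from the table above.
x, y, vx, vy, angle, angular_vel, leg_contact_1, leg_contact_2 = obs
print(f"pos=({x:.2f}, {y:.2f})  vel=({vx:.2f}, {vy:.2f})  "
      f"angle={angle:.2f}  angular_vel={angular_vel:.2f}  "
      f"legs=({leg_contact_1:.0f}, {leg_contact_2:.0f})")
env.close()
```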
An episode is considered solved if it scores at least 200 points. The reward is composed as follows (an episode-loop sketch follows the table):
| Points | Condition |
|---|---|
| +/- | Agent's distance to the landing pad |
| +/- | Agent's speed |
| - | Agent's tilt (angle away from horizontal) |
| +10 | Each leg that contacts the ground |
| -0.03 | Each frame a side engine fires |
| -0.3 | Each frame the main engine fires |
| -100 / +100 | Crashing / landing safely |
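
A minimal sketch of accumulating an episode's return and applying the 200-point threshold, with a random policy standing in for a trained agent:

```python
import gymnasium as gym

env = gym.make("LunarLander-v2")
obs, info = env.reset(seed=0)

episode_return = 0.0
done = False
while not done:
    action = env.action_space.sample()  # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated

print(f"return = {episode_return:.1f}, solved = {episode_return >= 200}")
env.close()
```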
An episode terminates if any of the following occurs (a small helper sketch follows this list):

- The lander crashes
- The lander exits the viewport (its x-coordinate is greater than 1)
- The lander is not awake (it has come to rest and no longer moves)
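
For illustration, a small helper that mirrors how these conditions combine; parameter names such as `game_over` and `lander_awake` are assumptions for this sketch, not the environment's public API, and in practice the environment folds all of this into the `terminated` flag returned by `env.step`:

```python
def is_episode_over(game_over: bool, x: float, lander_awake: bool) -> bool:
    """Illustrative mirror of the three terminal conditions listed above."""
    crashed = game_over             # lander body hit the ground
    left_viewport = abs(x) > 1.0    # lander drifted out of view on either side
    at_rest = not lander_awake      # lander has stopped moving
    return crashed or left_viewport or at_rest

print(is_episode_over(game_over=False, x=1.2, lander_awake=True))  # True: left viewport
```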