Lab 1, Lab Group: Z59, AY2022/23 S2
- Budi Syahiddin
- Faiz Rosli
- Chin Yueh Tao
- gym==0.25.2
- gym[classic_control]
- numpy
- matplotlib
- Anaconda + Python 3.10.9
- Windows 10
Make an agent that can balance the pole on the cart. For more information on the environment, check out the OpenAI Gym documentation.
Q-Learning. Why? CartPole is a simple problem: it does not have many states, and there are only 2 possible actions! Also, Q-Learning is very easy to implement from scratch and does not require a powerful computer to train the agent. The update of a Q-table state-action pair can be described using the equation below:
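$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor. As a minimal sketch of this update in code (the table layout and the values of `alpha` and `gamma` here are illustrative assumptions, not the exact ones from our implementation):

```python
import numpy as np

alpha, gamma = 0.1, 0.99  # assumed learning rate and discount factor

def q_update(q_table, state, action, reward, next_state):
    """One Q-Learning update; `state` is a tuple of bin indices."""
    best_next = np.max(q_table[next_state])   # max_a' Q(s', a')
    td_target = reward + gamma * best_next    # r + gamma * max_a' Q(s', a')
    q_table[state][action] += alpha * (td_target - q_table[state][action])
```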
Epsilon-Greedy. Why? Similar to the previous choice, it is simple to implement: with probability epsilon the agent picks a random action (explore), otherwise it picks the action with the highest Q-value (exploit).
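A minimal sketch of epsilon-greedy action selection, assuming a Q-table indexed by a tuple of bin indices (the function name and `rng` setup are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

def epsilon_greedy(q_table, state, epsilon, n_actions=2):
    """Explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore: random action
    return int(np.argmax(q_table[state]))     # exploit: greedy action
```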
Q-Learning works well for environments with discrete states and actions.
However, CartPole's states are continuous! In order to deal with that,
we need to make the states discrete. The idea is to split the range
of each state variable into "bins", i.e. intervals. For example, if the range
of the cart position is [-4.8, 4.8] and we use 10 bins, each bin is an interval
of width 0.96, and a continuous value is replaced by the index of the bin it falls into.
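A minimal sketch of this discretisation, assuming 10 bins per state variable; the clipping bounds for the unbounded velocity terms are assumptions, not the exact values from our implementation:

```python
import numpy as np

N_BINS = 10
# CartPole-v1 observation: [cart position, cart velocity,
# pole angle, pole angular velocity]; the velocities are unbounded,
# so we treat them as lying in an assumed finite range.
bounds = [(-4.8, 4.8), (-4.0, 4.0), (-0.418, 0.418), (-4.0, 4.0)]
bin_edges = [np.linspace(lo, hi, N_BINS - 1) for lo, hi in bounds]

def discretise(observation):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(int(np.digitize(x, edges))
                 for x, edges in zip(observation, bin_edges))

# The Q-table then has one axis per state variable plus one for actions:
# q_table = np.zeros((N_BINS,) * 4 + (2,))
```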
Initially, we tested a static epsilon value. The idea we have is that, at the beginning, we want the agent to explore more so that it can visit many random states. Then, as the number of episodes increases, we slowly start to exploit, because by then the table will already be filled with "good" states and we want the agent to exploit those states. A sketch of such a decay schedule is shown below.
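A minimal sketch of an exponential epsilon decay schedule (the starting value, floor, decay rate, and episode count are illustrative assumptions):

```python
EPS_START, EPS_MIN, EPS_DECAY = 1.0, 0.01, 0.995  # assumed values
N_EPISODES = 1000                                 # assumed training length

epsilon = EPS_START
for episode in range(N_EPISODES):
    # ... run one episode, selecting actions epsilon-greedily ...
    epsilon = max(EPS_MIN, epsilon * EPS_DECAY)   # decay toward the floor
```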