This project is all about creating a bot that is able to solve different configurations of the 3x3 sliding tiles puzzle. Reinforcement learning, specifically Q-learning, was used to create this bot. The bot was trained for about 12 hours and 30 million episodes. Currently, the bot is able to solve most given configurations but fails sometimes(this however, can be fixed with more training).
An engine was created that simulates the game itself. A transition matrix was created wherein for each state, the possible new states were detailed given each action. The engine is found in SlidingPuzzleEngine.py
The environment inherits from OpenAi gym's Environment class. This is further detailed in SlidingPuzzleEnv.py
A score per state was assigned based on the sum of powers of each of the numbers. Reward per move was calculated as:
Here are the training parameters
- learning rate = 0.1
- max epsilon = 1.0
- min epsilon = 0.01
- epsilon decay = 0.0005
- gamma = 0.95 Training took about 12 hours and 30 million episodes.
Only preliminary evaluation has been done since this is a WIP, but the agent can solve most configurations so far.
Much of this work is patterned from this repository