This repository explores a simple approach to applying a Q Learning algorithm to solve the Traveling Salesman Problem (TSP).
Dependencies are managed using poetry.
The algorithm is tested on a few instances in data
these were downloaded here.
Running poetry run python3 main.py
will run QLearning model on TSP instances and save figure in /figures showing progression in tour length over iterations, and a table summarizing result. No hyper params has been done so far, results are in Results_no_hp_params.txt
.
This is a first model to attempt to solve TSP using Q Learning, the model could be refined, hyper parameters tuned and more research put into defining a good RL-based model. Some tests are implemented, run poetry run pytest
to run tests.
For an instance with 48 cities, the next figures shows the evolution of the cost of a greedy route as the Q table is updated.
Figures suggest several areas of improvements, for example the epsilon decay could be adaptive since we see that tour length sometimes reaches optimal value to increase again because of exploration.
Author : Mehdi Bennaceur