AIRO project. Elective in Artificial Intelligence course: Reasoning Agents, Università La Sapienza Roma
Reasoning Agents project: Reinforcement Learning and Restraining Bolts with LTL specifications
ENV: a 5x7 chessboard grid with 5 colors ('green', 'blue', 'purple', 'black', 'grey') and 4 visits required for each color.
RL: learn the chess moves (Knight, King, Rook, Bishop, Queen) with the SARSA learning algorithm (a minimal update sketch is shown after this spec).
RB specification: perform the moves in the specified order (NB: the order refers to the piece, i.e. first the Knight, then the King, and so on) (NB: each move is not random, i.e. it starts from square 1,1, then goes to 1,2, and so on; the Knight moves from the bottom to the top).
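As a reference for the learning part, here is a minimal tabular SARSA sketch on a 5x7 grid. The state encoding, action set, and toy reward used below are illustrative placeholders, not the ones used in game.py.

import numpy as np

# Illustrative tabular SARSA on a 5x7 grid (placeholder dynamics, not game.py's).
N_ROWS, N_COLS = 5, 7
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right (placeholder moves)
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

Q = np.zeros((N_ROWS * N_COLS, len(ACTIONS)))

def state_id(row, col):
    return row * N_COLS + col

def epsilon_greedy(s):
    if np.random.rand() < EPSILON:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q[s]))

def step(row, col, a):
    # Toy environment: +1 reward for reaching the top-right corner, 0 otherwise.
    dr, dc = ACTIONS[a]
    row = min(max(row + dr, 0), N_ROWS - 1)
    col = min(max(col + dc, 0), N_COLS - 1)
    reward = 1.0 if (row, col) == (0, N_COLS - 1) else 0.0
    return row, col, reward, reward > 0

for episode in range(500):
    row, col = N_ROWS - 1, 0          # start in the bottom-left corner
    s = state_id(row, col)
    a = epsilon_greedy(s)
    done = False
    while not done:
        row, col, r, done = step(row, col, a)
        s_next = state_id(row, col)
        a_next = epsilon_greedy(s_next)
        # SARSA update: on-policy, bootstraps on the action actually selected next.
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_next, a_next] * (not done) - Q[s, a])
        s, a = s_next, a_next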
We have a single agent that must learn and perform 5 different chess moves following a particular sequence. The chessboard is characterized by 5 colors, each one corresponding to one particular move; for each move only four target squares are available, and they are not randomly generated. For example, the Knight must move from the bottom to the top, following the classic L shape. The goal of the game is to reach the maximum score of 20 points (5 moves x 4 target squares each).
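A natural way to express this move ordering as a restraining-bolt specification is a nested "eventually" LTLf formula. The fluents below (knight_done, king_done, rook_done, bishop_done, queen_done, each meaning that the four visits for that move are completed) are illustrative names rather than the ones used in the code, and the formula only captures the ordering of the five phases:

$\varphi_{\mathit{seq}} = \Diamond\big(\mathit{knight\_done} \wedge \Diamond(\mathit{king\_done} \wedge \Diamond(\mathit{rook\_done} \wedge \Diamond(\mathit{bishop\_done} \wedge \Diamond\,\mathit{queen\_done})))\big)$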
The video of the experiment is shown below.
Pick And Place Robot (future work)
The environment is complete, but only the RL part is connected: the Linear Temporal Logic part is only sketched, so the Restraining Bolt specifications are not implemented.
RL + RB: the robot is fixed on the yellow square and its end effector moves around the 3x3 grid. It must learn to pick the current item from the green square each time and place it on each red shelf, following the order given by the RB specification (not randomly); a sketch of the intended mechanism is shown below.
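Since the Restraining Bolt part of the pick-and-place robot is not implemented yet, the sketch below only illustrates the general mechanism: a small hand-built automaton that tracks progress along the pick/place sequence and grants an extra reward for each correct transition, to be added to the environment reward at every step. The fluent names, shelf count, and reward value are placeholders, not the actual repo interface.

# Minimal restraining-bolt sketch: a hand-built automaton over abstract fluents.
# Fluents ("picked", "placed_1", ...) and the reward value are placeholders.
RB_SEQUENCE = ["picked", "placed_1", "picked", "placed_2", "picked", "placed_3"]
RB_REWARD = 1.0

class RestrainingBolt:
    """Tracks progress along a fixed fluent sequence and rewards each correct step."""

    def __init__(self, sequence, step_reward):
        self.sequence = sequence
        self.step_reward = step_reward
        self.state = 0                      # index of the next fluent we expect

    def update(self, observed_fluents):
        """Advance the automaton on the fluents observed this step; return the RB reward."""
        if self.state < len(self.sequence) and self.sequence[self.state] in observed_fluents:
            self.state += 1
            return self.step_reward
        return 0.0

    @property
    def satisfied(self):
        return self.state == len(self.sequence)

# Usage: add rb.update(fluents) to the environment reward at every step.
rb = RestrainingBolt(RB_SEQUENCE, RB_REWARD)
bonus = rb.update({"picked"}) + rb.update({"placed_1"})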
In science fiction (as in the Star Wars movies), restraining bolts were small, cylindrical devices that could be affixed to a droid in order to limit its functions and enforce its obedience. When inserted, a restraining bolt restricted the droid from any movement its master did not desire and also forced it to respond to signals produced by a hand-held control unit. Some droids felt sheer horror at the mere mention of restraining bolts.
You can find our final slide presentation about this project in the Documents folder.
The main reference paper of our work is also available here.
$ python game.py Chess4 Sarsa new_trainfile
$ python plotresults.py -datafiles data/new_training
Main reference paper, Università La Sapienza Roma
RL_GAMES: Iocchi, De Giacomo, Patrizi, Università La Sapienza Roma
Non-Markovian rewards expressed in LTL