Restraining-Bolts-for-Reinforcement-Learning-using-Linear-Temporal-Logic

AIRO project for the Elective in Artificial Intelligence course (Reasoning Agents), Università La Sapienza Roma.

Approach (First Example)

Reasoning Agents project: Reinforcement Learning and Restraining Bolts with LTL specifications

ENV: a 5x7 chessboard grid with 5 colors ('green', 'blue', 'purple', 'black', 'grey') and 4 visits for each color.

RL: learn the chess moves (Knight, King, Rook, Bishop, Queen) with the SARSA learning algorithm.
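
As a rough illustration of the SARSA step mentioned above, here is a minimal tabular sketch. It is not taken from the project code: the function names, the state and action encoding, and the hyperparameter values are all assumptions made for the example.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.99   # discount factor (assumed value)
EPSILON = 0.1  # exploration rate (assumed value)


def epsilon_greedy(Q, state, actions, epsilon=EPSILON, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=ALPHA, gamma=GAMMA):
    """On-policy TD update: the target uses the action actually chosen next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Hypothetical single transition on the grid: cell (0, 0) -> cell (0, 1).
Q = defaultdict(float)
sarsa_update(Q, s=(0, 0), a="up", r=1.0, s_next=(0, 1), a_next="up", done=False)
```

The update is on-policy because `a_next` is the action the agent will really take in `s_next`, which is what distinguishes SARSA from Q-learning.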

RB specification: perform the moves in the specified order. NB: the order applies to the pieces, i.e. first the Knight, then the King ... NB: each move is not random, i.e. it starts from 1,1, then goes to 1,2 ..., and the Knight moves from the bottom to the top.
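
The ordering constraint above can be pictured as a small automaton that runs next to the learner. The sketch below is a hand-written stand-in for the automaton an LTL formula would be compiled into, not the project's implementation. The class name `SequenceBolt`, the reward of one point per visit, and the fail-on-wrong-move behaviour are assumptions for illustration.

```python
# Order required by the bolt (from the README); the reward value is assumed.
MOVE_ORDER = ["Knight", "King", "Rook", "Bishop", "Queen"]
VISITS_PER_MOVE = 4


class SequenceBolt:
    """Hypothetical monitor that tracks progress through the ordered moves."""

    def __init__(self, order=MOVE_ORDER, visits=VISITS_PER_MOVE):
        self.order = order
        self.visits = visits
        self.stage = 0       # index of the move currently expected
        self.count = 0       # squares visited so far for that move
        self.failed = False  # set once a move is made out of order

    def state(self):
        """Value to append to the agent's state so the reward stays Markovian."""
        return (self.stage, self.count, self.failed)

    def step(self, move):
        """Observe one completed square visit and return the bolt's reward."""
        if self.failed or self.stage == len(self.order):
            return 0.0
        if move != self.order[self.stage]:
            self.failed = True
            return 0.0
        self.count += 1
        if self.count == self.visits:
            self.stage += 1
            self.count = 0
        return 1.0

    def satisfied(self):
        """True once every move has been completed in the required order."""
        return not self.failed and self.stage == len(self.order)
```

Under these assumptions, a run that visits four squares for each piece in the required order collects 5 x 4 = 20 points. The learner would see `bolt.state()` alongside its grid position, so a reward that depends on the history of moves becomes a function of the current extended state.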

Description of the Chess Game

There is a single agent that must learn and perform 5 different chess moves in a particular sequence. The chessboard has 5 colors, each corresponding to one move; for each move only four squares are available, and they are not randomly generated. For example, the Knight must move from the bottom to the top, following the classic L shape. The goal of the game is to reach the maximum score of 20 points.

The video of the experiment is shown below.

Approach (Second Example) (abstract)

Pick And Place Robot (future work)

The environment is complete, but so far it is only linked to the RL part: the Linear Temporal Logic is only sketched, so the Restraining Bolt specifications are not implemented.

RL + RB: the robot is fixed on the yellow square and its end effector moves around the 3x3 grid. It must learn to pick up the current item from the green square each time and bring it to each red shelf, following the order given by the RB specification (not randomly).

Restraining Bolts

In science fiction (as in the Star Wars movies), restraining bolts are small, cylindrical devices that can be affixed to a droid to limit its functions and enforce its obedience. When inserted, a restraining bolt restricts the droid from any movement its master does not desire, and also forces it to respond to signals produced by a hand-held control unit. Some droids feel sheer horror at the mere mention of restraining bolts.

Team

  • Flavio Lorenzi

  • Nicolò Mantovani

  • Sara Tozzo

  • Giorgia Piernoli

Documentation

You can find our final slide presentation about this project in the Documents folder.

The main reference paper of our work is also available here.

Training the Chess Game (exp 1)

$ python game.py Chess4 Sarsa new_trainfile

Plot the results

$ python plotresults.py -datafiles data/new_training

References

Main reference Paper, Università La Sapienza Roma

RL_GAMES: Iocchi, De Giacomo, Patrizi, Università La Sapienza Roma

Non-Markovian rewards expressed in LTL

Video with best learned policy

SC2 Video