Collaborative Reinforcement Learning on Human-Computer shared task

H-AI_collab_game

Collaborative human-RL agent game

Description

A human-RL agent collaborative game in a graphical environment. This is an extension of this work.

The environment is built in Unity and communicates with the experiment via an HTTP server.

Collaborative learning is achieved through Deep Reinforcement Learning (DRL). The Soft Actor-Critic (SAC) algorithm [2] is used, with modifications for a discrete action space [3].
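
As a rough illustration of what a discrete action space changes, the sketch below computes the soft state value as an exact expectation over the action probabilities, following the formulation in [3], instead of estimating it from a sampled action. It is plain Python for readability; the function name and the example numbers are illustrative assumptions and are not taken from this repository.

```python
import math

def soft_state_value(action_probs, q_values, alpha):
    """Soft value V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)).

    With a discrete action space the expectation over actions is computed
    exactly from the policy's probability vector.
    """
    value = 0.0
    for prob, q in zip(action_probs, q_values):
        if prob > 0.0:  # the term 0 * log(0) is taken as 0
            value += prob * (q - alpha * math.log(prob))
    return value

# Illustrative numbers only: a policy over three discrete actions.
probs = [0.5, 0.25, 0.25]
q_vals = [1.0, 2.0, 0.0]
print(soft_state_value(probs, q_vals, alpha=0.2))
```

The temperature `alpha` weights the entropy bonus: with `alpha=0` the result reduces to the ordinary expected Q-value under the policy.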

Experiment Set Up

The set-up consists of 3 components:

  1. The Maze-Server: A dedicated HTTP server that takes data from the experiment (MazeRL) and passes it to the Unity environment (MazeUnity), and vice versa.
  2. The online version of the MazeRL experiment: Includes the training loops, the RL agent and different configuration files.
  3. The graphical environment MazeUnity: A simulation of the real-world experiment from Shafti et al. (2020) [1].

The pipeline to start an experiment is described below:

  • Start the dedicated Maze-Server

    • Can be started after MazeRL has started.
    • If started before MazeRL, it waits for MazeRL to connect.
    • Receives a configuration file from MazeRL and delivers it to MazeUnity when the latter starts up.
    • Can run on the same machine as MazeRL (recommended for reduced delay) or on a standalone server (Docker instructions in its repo).
  • Start the experiment MazeRL (See Run MazeRL)

  • Open the graphical environment MazeUnity

MazeUnity receives actions (plus other important information) and sends back observations (plus other important information) to MazeRL.

These messages are exchanged via the HTTP server (Maze-Server).

MazeRL and MazeUnity work as HTTP clients.
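
To make the client role concrete, here is a minimal sketch of how a client such as MazeRL could post an action to Maze-Server and read back a JSON reply. It uses only the Python standard library. The server address, the `/agent_action` endpoint, the `action_agent` field name and the use of POST are all assumptions made for illustration; the actual message format is defined by Maze-Server and is not documented here.

```python
import json
import urllib.request

# Hypothetical address; the real locations are set in game/network_config.yaml.
MAZE_SERVER_URL = "http://localhost:8080"

def build_action_request(agent_action, server_url=MAZE_SERVER_URL):
    """Build a POST request carrying the agent's action as JSON.

    The endpoint path and the payload field name are assumptions.
    """
    payload = json.dumps({"action_agent": agent_action}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server_url}/agent_action",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_action(agent_action, timeout=5.0):
    """Send the action and return the decoded JSON reply (e.g. an observation)."""
    request = build_action_request(agent_action)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the message format easy to inspect without a running server.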

Installation

  • Run source install_dependencies/install.sh.
    • A python virtual environment will be created and the necessary libraries will be installed.
    • Furthermore, the directory of the repo will be added to the PYTHONPATH environment variable.

Run MazeRL

  • Run python game/sac_maze3d_train.py game/config/<config_sac> <participant_name> to start a human-agent game.
    • Example:

      python game/sac_maze3d_train.py game/config/config_sac_28K_O-O-a_descending.yaml participant_1
      
    • Notes before training:

      • Set the <participant_name> to the name of the participant.
      • The program will create a /tmp and a /plot folder (if they do not exist) in the results/ folder. The /tmp folder contains CSV files with information about the game. The /plot folder contains figures for the game. See here for more details.
      • The program automatically appends an identification number to the participant name in the name of each folder it creates.
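
The folder naming described in the notes can be sketched as follows. This is not the repository's code: the layout `results/tmp/<name>_<id>` and `results/plot/<name>_<id>`, the underscore separator and the numbering starting at 1 are assumptions chosen only to illustrate appending an identification number to the participant name.

```python
from pathlib import Path

def create_result_dirs(results_root, participant_name):
    """Create one tmp and one plot folder for a new run and return their paths.

    Assumed layout: <results_root>/tmp/<name>_<id> and <results_root>/plot/<name>_<id>,
    where <id> is the smallest positive integer not used yet for this name.
    """
    results_root = Path(results_root)
    run_id = 1
    while (results_root / "tmp" / f"{participant_name}_{run_id}").exists():
        run_id += 1

    created = {}
    for kind in ("tmp", "plot"):
        folder = results_root / kind / f"{participant_name}_{run_id}"
        folder.mkdir(parents=True, exist_ok=True)  # also creates tmp/ and plot/ if missing
        created[kind] = folder
    return created
```

Calling the function twice with the same participant name yields two distinct folder pairs, so an earlier session is never overwritten.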

Configuration

Game Configuration

  • In the game/config folder several YAML files exist for the configuration of the experiment. The main parameters are listed below.
    • game:discrete_input: True if the keyboard input is discrete (False for continuous). Details regarding the discrete and continuous human input modes can be found here.
    • SAC:reward_function: Type of reward function. Details about the predefined reward functions and how to define a new one can be found here.
    • Experiment:mode: Chooses how the experiment terminates: either after a given number of games or after a given number of interactions has been completed.
    • SAC:discrete: Discrete or normal SAC (currently only the discrete SAC is compatible with the game).
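
The parameters above can be pictured as a nested mapping, which is how a YAML file typically looks once loaded in Python. The sketch below mirrors only the four keys listed here and runs a few sanity checks on them. The placeholder values and the `check_config` helper are hypothetical; the valid values for the reward function and the mode are defined by the files in game/config.

```python
# Hypothetical values; only the key names come from the parameter list above.
config = {
    "game": {"discrete_input": True},
    "SAC": {"discrete": True, "reward_function": "REWARD_FUNCTION_NAME"},
    "Experiment": {"mode": "TERMINATION_MODE"},
}

def check_config(cfg):
    """Return a list of human-readable problems found in the configuration."""
    problems = []
    if not isinstance(cfg["game"]["discrete_input"], bool):
        problems.append("game:discrete_input must be True or False")
    if cfg["SAC"]["discrete"] is not True:
        problems.append("SAC:discrete must be True (only discrete SAC is supported)")
    if not cfg["SAC"].get("reward_function"):
        problems.append("SAC:reward_function is missing")
    if not cfg["Experiment"].get("mode"):
        problems.append("Experiment:mode is missing")
    return problems

print(check_config(config))  # an empty list means no problems were found
```

Running a check like this before starting a session can catch an incompatible setting before a participant sits down to play.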

Components Connectivity Configuration

  • game/network_config.yaml contains the location of each component of the set-up (MazeRL, Maze-Server and MazeUnity). This information is sent to Maze-Server when the connection is set up, and Maze-Server then forwards it to MazeUnity.
    • ip_distributor (if applicable): the location of a Maze-Server instance that MazeUnity contacts first in order to learn where to find Maze-Server. It can run either on a remote server or locally (on the same machine as MazeRL). Its purpose is to allow Maze-Server or MazeRL to move to different locations. If it is not needed, MazeUnity can be set up to locate Maze-Server directly.
    • maze_server: the location of Maze-Server.
    • maze_rl: the location of MazeRL.
  • Set all of them to localhost to play the game locally. Otherwise, use a static IP or a dynamic DNS hostname (e.g. from duckdns.org) and open the appropriate ports on your router.

Play

Directions on how to play the game are given in MazeUnity.

Citation

If you use this repository in your publication, please cite the following paper:

Fotios Lygerakis, Maria Dagioglou, and Vangelis Karkaletsis. 2021. Accelerating Human-Agent Collaborative Reinforcement Learning. In The 14th PErvasive Technologies Related to Assistive Environments Conference (PETRA2021), June 29-July 2, 2021, Corfu, Greece. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3453892.3454004

Experiment Result Output Files

Contents of a /tmp folder:

  • <test/train>_scores.csv The total score for each training game.

  • <test/train>_time_scores.csv The time score ([max game duration] - [game_duration]) for each training game.

  • <test/train>_rewards.csv The cumulative reward for each game.

  • <test/train>_game_durations.csv The total duration of each game.

  • <test/train>_game_success_rate.csv The success rate ([games that the goal was reached]/[total games played in the session]) for each session (every <update_interval> games).

  • <test/train>_step_durations.csv The duration of each game step.

  • <test/train>_steps_per_game.csv The total number of steps per game.

  • <test/train>_logs.pkl A pandas dataframe containing tuples of (s_previous, a_agent_real, a_agent_environment, a_human, s, r).

    • a_agent_real: The action predicted by the agent (0, 1 or 2).
    • a_agent_environment: The agent action converted to the environment's convention (0 -> 0, 1 -> 1 and 2 -> -1).
  • config_sac_***.yaml The configuration file used for this experiment. Its purpose is to allow the experiment to be replicated.

  • actor_sac/critic_sac The network weights.

  • <test/train>_distance_travelled.csv The distance travelled by the ball for each game.

  • <test/train>_fps.csv The frames per second at which the game was displayed on the screen.

  • rest_info.csv Internet delay statistics, goal position, total experiment duration, best score achieved, the game that achieved the best score, the best reward achieved, the length of the game trial with the best score, the total number of time steps for the whole experiment, the total number of games played, the fps the game ran at, and the average offline gradient update duration over all sessions.
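
Several of the quantities in the list above follow simple formulas. The sketch below restates three of them in Python: the conversion of the agent's predicted action to the environment action, the time score, and the per-session success rate. The function names are illustrative and are not taken from the repository, and returning 0.0 for an empty session is an assumption.

```python
# Mapping listed above: predicted action 0 -> 0, 1 -> 1, 2 -> -1.
AGENT_TO_ENV_ACTION = {0: 0, 1: 1, 2: -1}

def to_env_action(agent_action):
    """Convert the agent's predicted action (0, 1 or 2) to the environment action."""
    return AGENT_TO_ENV_ACTION[agent_action]

def time_score(max_game_duration, game_duration):
    """Time score: maximum game duration minus the actual game duration."""
    return max_game_duration - game_duration

def success_rate(goal_reached_flags):
    """Fraction of games in a session in which the goal was reached.

    Returns 0.0 for an empty session (an assumption made for this sketch).
    """
    if not goal_reached_flags:
        return 0.0
    reached = sum(1 for flag in goal_reached_flags if flag)
    return reached / len(goal_reached_flags)

# Illustrative session of four games, three of which reached the goal.
print(to_env_action(2), time_score(40.0, 25.5), success_rate([True, False, True, True]))
```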

Contents of a /plot folder: .png figures of the logs saved in the /tmp folder.

References

[1] Shafti, Ali, et al. "Real-world human-robot collaborative reinforcement learning." arXiv preprint arXiv:2003.01156 (2020).

[2] https://github.com/kengz/SLM-Lab

[3] Christodoulou, Petros. "Soft actor-critic for discrete action settings." arXiv preprint arXiv:1910.07207 (2019).

[4] Fotios Lygerakis, Maria Dagioglou, and Vangelis Karkaletsis. 2021. Accelerating Human-Agent Collaborative Reinforcement Learning. In The 14th PErvasive Technologies Related to Assistive Environments Conference (PETRA2021), June 29-July 2, 2021, Corfu, Greece. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3453892.3454004