SAC_GCN

Explainability of Deep RL algorithms using graph networks and layer-wise relevance propagation.

Explainability of Deep Reinforcement Learning Algorithms in Robotic Domains by using Layer-wise Relevance Propagation

Environments

Our modified robotic environments are under the ./CustomGymEnvs directory. Its changed_envs subdirectory contains FetchReach-v2, a variant of FetchReach-v1 with a changed action space: the actions are torques rather than the x, y, and z velocities of the end-effector. The envs subdirectory contains the original environments and the environments with occluded entities. The faulty_envs subdirectory contains the environments with blocked joints, and the graph_envs subdirectory contains the environments that use a graph representation of the robot.

Graph Representation

The RobotGraphModel package parses the robot's XML model and converts it into a graph. Within this package, model_parser.py parses the XML model of the environment. robot_graph.py then parses the model of the robot and identifies its nodes (<body> elements in the XML) and edges (<joint> elements in the XML). Two nested <body> elements are connected to each other through a <joint> that is defined in the inner body. For each environment, we have developed an environment-specific class that inherits from the RobotGraph class in robot_graph.py. Each subclass defines the set of node and edge features for its environment and is used by the OpenAI Gym wrappers under CustomGymEnvs/graph_envs.
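The body-to-node and joint-to-edge mapping described above can be sketched with the standard library alone. This is a minimal illustration, not the code in RobotGraphModel: the toy XML model and the build_graph function are invented for this example, and it only records names rather than node or edge features.

```python
import xml.etree.ElementTree as ET

# Toy MJCF-like model, invented for illustration (not one of the repository's files).
TOY_MODEL = """
<mujoco>
  <worldbody>
    <body name="torso">
      <joint name="rootx" type="slide"/>
      <body name="thigh">
        <joint name="thigh_joint" type="hinge"/>
        <body name="leg">
          <joint name="leg_joint" type="hinge"/>
        </body>
      </body>
    </body>
  </worldbody>
</mujoco>
"""


def build_graph(xml_text):
    """Return (nodes, edges): nodes are body names, edges are (parent, child, joint)."""
    root = ET.fromstring(xml_text)
    nodes, edges = [], []

    def visit(body, parent_name):
        name = body.get("name")
        nodes.append(name)
        if parent_name is not None:
            # Joints declared directly inside the inner body connect it to its parent.
            for joint in body.findall("joint"):
                edges.append((parent_name, name, joint.get("name")))
        for child in body.findall("body"):
            visit(child, name)

    # Top-level bodies have no parent body, so their joints create no edge in this sketch.
    for top in root.find("worldbody").findall("body"):
        visit(top, None)
    return nodes, edges


nodes, edges = build_graph(TOY_MODEL)
print(nodes)
print(edges)
```

In this toy model the rootx joint attaches the torso to the world rather than to another body, so the sketch leaves it out of the edge list. How the repository treats such root joints is not specified here.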

Algorithm

Our algorithm is Soft Actor-Critic (SAC). The version with the graph representation is under ./Graph_SAC, and the original version with fully-connected networks is under ./SAC. For the graph neural network architecture, we use the torchgraph implementation developed for the paper "Explainability Techniques for Graph Convolutional Networks". For the LRP implementation, we use the repository developed for the same paper.

Installation and Usage Guidelines

Setup

The Python version is 3.8.10. Before running the project, install MuJoCo 2.1:

$ wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz
$ tar -xvf mujoco210-linux-x86_64.tar.gz
$ mkdir -p ~/.mujoco
$ mv mujoco210 ~/.mujoco/
$ pip3 install -U 'mujoco-py<2.2,>=2.1'

Download the project into the $HOME/Documents folder. Then add the following lines to your ~/.bashrc file:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
export PYTHONPATH=$PYTHONPATH:$HOME/Documents/SAC_GCN

Then install the requirements of the project:

$ pip3 install -r requirements.txt

Experiments

First Phase

To run the experiments with the graph representation of the robot, use the following command:

$ python $MAIN_FILE --env-name {ENV-NAME} --exp-type graph

where MAIN_FILE is the absolute path to the ./Controller/graph/main.py file. For the complete set of arguments, see the main.py file. ENV-NAME can be one of the following:

  • FetchReach-v2
  • Walker2d-v2
  • HalfCheetah-v2
  • Hopper-v2

After the agent is trained with graph networks, Layer-wise Relevance Propagation (LRP) is applied to highlight the contribution of each part of the robot to the decision making. The data for the experiments are saved under ./Data/{ENV-NAME}/graph.
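To give a feel for what a relevance score is, here is a minimal sketch of the generic LRP epsilon rule on a single dense layer. It is not the implementation used in this project, which relies on the external LRP repository and operates on graph network layers. The function name, the toy weights, and the choice of the epsilon rule are all assumptions made for illustration.

```python
def lrp_epsilon_dense(a, W, b, relevance_out, eps=1e-6):
    """Redistribute output relevance to the inputs of one dense layer z = a @ W + b."""
    n_in, n_out = len(W), len(W[0])
    # Forward pass: pre-activation of every output unit.
    z = [sum(a[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    # Epsilon rule: divide by z, pushed slightly away from zero with the same sign.
    s = [relevance_out[j] / (z[j] + (eps if z[j] >= 0 else -eps)) for j in range(n_out)]
    # Each input receives relevance in proportion to its contribution a_i * W_ij.
    return [a[i] * sum(W[i][j] * s[j] for j in range(n_out)) for i in range(n_in)]


# Toy numbers, invented for illustration.
a = [1.0, 2.0]                  # input features
W = [[1.0, -1.0], [0.5, 2.0]]   # weights, shape (inputs, outputs)
b = [0.0, 0.0]                  # zero bias, so relevance is conserved up to eps
relevance_out = [0.0, 3.0]      # explain only the second output unit

relevance_in = lrp_epsilon_dense(a, W, b, relevance_out)
print(relevance_in, sum(relevance_in))
```

With these numbers the first input receives a negative score of about -1 and the second a positive score of about 4, and the two sum to roughly the relevance of 3 that was injected at the output.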

Once the policy has converged, LRP is applied to the learned policy to calculate the relevance scores given by each action to each entity across time-steps. To run LRP for the ENV-NAME environment, use the following command:

$ python $EVALUATE --env-name {ENV-NAME} --exp-type graph

where EVALUATE is the absolute path to the ./Evaluate/evaluate.py file. The results are stored in ./Data/{ENV-NAME}/graph/edge_relevance.pkl and ./Data/{ENV-NAME}/graph/global_relevance.pkl, which contain the relevance scores given to the edge and global units of the input graph, respectively.
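The two result files can be loaded with Python's pickle module. The sketch below is only a starting point: load_relevance is a hypothetical helper, and the structure of the stored objects is not documented here, so the demo writes placeholder files into a temporary directory and only inspects their types.

```python
import pickle
import tempfile
from pathlib import Path


def load_relevance(env_name, data_root="./Data"):
    """Load edge_relevance.pkl and global_relevance.pkl for one environment."""
    graph_dir = Path(data_root) / env_name / "graph"
    loaded = {}
    for key in ("edge_relevance", "global_relevance"):
        with open(graph_dir / f"{key}.pkl", "rb") as handle:
            loaded[key] = pickle.load(handle)
    return loaded


# Demo with placeholder files so the sketch runs without a trained agent.
# The dictionary below is invented; the real files may hold a different structure.
with tempfile.TemporaryDirectory() as tmp:
    graph_dir = Path(tmp) / "Walker2d-v2" / "graph"
    graph_dir.mkdir(parents=True)
    for key in ("edge_relevance", "global_relevance"):
        with open(graph_dir / f"{key}.pkl", "wb") as handle:
            pickle.dump({"placeholder": [0.0]}, handle)
    scores = load_relevance("Walker2d-v2", data_root=tmp)

for key, value in scores.items():
    print(key, type(value).__name__)
```

On real results, call load_relevance("Walker2d-v2") from the project root after the evaluation step has finished. Only unpickle files that you produced yourself or otherwise trust.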

Second Phase

In this phase, the results of the first phase are evaluated by either of the following experiments:

  1. Occluding an entity's features in the observation space, which validates its relevance score.
  2. Blocking a joint, which validates the importance of each joint in the action space.
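The two interventions can be pictured as simple masks on the observation and action vectors. The sketch below assumes that occlusion zeroes the entity's features and that blocking zeroes the joint's command. Both are assumptions made for illustration, as are the function names and index layout, and the environments under ./CustomGymEnvs may implement them differently.

```python
def occlude(observation, entity_indices):
    """Return a copy of the observation with the entity's features set to zero."""
    hidden = set(entity_indices)
    return [0.0 if i in hidden else value for i, value in enumerate(observation)]


def block_joint(action, joint_index):
    """Return a copy of the action with the blocked joint's command set to zero."""
    return [0.0 if i == joint_index else value for i, value in enumerate(action)]


# Hypothetical layout: indices 2 and 3 hold the features of the occluded entity,
# and index 1 of the action drives the blocked joint.
observation = [0.1, -0.4, 0.7, 0.2, 1.5]
action = [0.3, -0.8, 0.5]

print(occlude(observation, [2, 3]))
print(block_joint(action, 1))
```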

In both cases, the relevance scores are validated by the size of the resulting drop in performance. For more information, please refer to the paper. For all the following commands, $MAIN_FILE is the absolute path to the ./Controller/basic/main.py file. To run experiments in the standard setting, use the following command:

$ python $MAIN_FILE --env-name {ENV-NAME} --exp-type standard

To run experiments for the occlusion case, use the following command:

$ python $MAIN_FILE --env-name {ENV-NAME} --exp-type {ENTITY-NAME}

where ENTITY-NAME is the name of the entity to occlude. The valid ENTITY-NAME values for each environment are listed below:

  • FetchReach-v2
    • goal
    • shoulder_pan_joint
    • shoulder_lift_joint
    • upperarm_roll_joint
    • wrist_flex_joint
    • forearm_roll_joint
    • wrist_roll_joint
    • elbow_flex_joint
  • Walker2d-v2
    • torso
    • foot_joint
    • leg_joint
    • thigh_joint
    • foot_left_joint
    • leg_left_joint
    • thigh_left_joint
  • HalfCheetah-v2
    • torso
    • bfoot
    • bshin
    • bthigh
    • ffoot
    • fshin
    • fthigh
  • Hopper-v2
    • torso
    • foot_joint
    • leg_joint
    • thigh_joint

To run experiments for the blockage case, use the following command:

$ python $MAIN_FILE --env-name {BROKEN-ENV-NAME} --exp-type {JOINT-NAME}

where BROKEN-ENV-NAME is the name of the environment with a broken joint, which is one of the following:

  • FetchReachBroken-v2
  • Walker2dBroken-v2
  • HalfCheetahBroken-v2
  • HopperBroken-v2

and JOINT-NAME is the name of the joint to block. The valid JOINT-NAME values for each environment are listed below:

  • FetchReachBroken-v2
    • shoulder_pan_joint
    • shoulder_lift_joint
    • upperarm_roll_joint
    • wrist_flex_joint
    • forearm_roll_joint
    • wrist_roll_joint
    • elbow_flex_joint
  • Walker2dBroken-v2
    • foot_joint
    • leg_joint
    • thigh_joint
    • foot_left_joint
    • leg_left_joint
    • thigh_left_joint
  • HalfCheetahBroken-v2
    • bfoot
    • bshin
    • bthigh
    • ffoot
    • fshin
    • fthigh
  • HopperBroken-v2
    • foot_joint
    • leg_joint
    • thigh_joint

Note that these experiments use the original SAC algorithm with fully-connected networks under the ./SAC directory. For each environment, the resulting data is stored under the following directories:

  • For the occlusion case: ./Data/{ENV-NAME}/{ENTITY-NAME}
  • For the blockage case: ./Data/{BROKEN-ENV-NAME}/{JOINT-NAME}
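Running every occlusion and blockage experiment by hand is tedious, so a small script can generate the commands. The sketch below only builds and prints them for Hopper. The build_commands helper is hypothetical, and the relative path to main.py is a placeholder for the absolute path described above.

```python
import shlex

# Joint names copied from the lists above (Hopper only, to keep the sketch short).
HOPPER_JOINTS = ["foot_joint", "leg_joint", "thigh_joint"]

# Placeholder: replace with the absolute path to ./Controller/basic/main.py.
MAIN_FILE = "Controller/basic/main.py"


def build_commands(main_file, env_name, exp_types):
    """Return one argument list per experiment, using the documented flags."""
    return [
        ["python", main_file, "--env-name", env_name, "--exp-type", exp_type]
        for exp_type in exp_types
    ]


# Occlusion runs also include "torso", which is an entity but not a joint.
occlusion = build_commands(MAIN_FILE, "Hopper-v2", ["torso"] + HOPPER_JOINTS)
blockage = build_commands(MAIN_FILE, "HopperBroken-v2", HOPPER_JOINTS)

for command in occlusion + blockage:
    # To launch a run, pass the list to subprocess.run(command, check=True).
    print(shlex.join(command))
```

Building argument lists rather than shell strings avoids quoting problems if the path to main.py contains spaces.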

Plots

To plot the results of the experiments, use the following command:

$ python $PLOT --env-name {ENV-NAME}

where $PLOT is the absolute path to the ./Plots/plot.py file. The result is stored in ./Result/{ENV-NAME}.jpg.

For further information about the method and results, please refer to our paper:

@article{taghian2024explainability,
  title={Explainability of deep reinforcement learning algorithms in robotic domains by using Layer-wise Relevance Propagation},
  author={Taghian, Mehran and Miwa, Shotaro and Mitsuka, Yoshihiro and G{\"u}nther, Johannes and Golestan, Shadan and Zaiane, Osmar},
  journal={Engineering Applications of Artificial Intelligence},
  volume={137},
  pages={109131},
  year={2024},
  publisher={Elsevier}
}