
Collective Visual Attention Clone via Neural Interaction Graph Prediction

This repository contains code for our paper "Collective Visual Attention Clone via Neural Interaction Graph Prediction".



The code runs on Ubuntu 20.04 with ROS Noetic.


The robot uses a Jetson Xavier NX with JetPack 5.1.3.

Set Up the Environment

To set up the offline Python environment for model training and offline running:

conda env create -f environment.yaml
conda activate vas

Data Preparation

The data and checkpoints can be downloaded here:
password: pk34
We provide ROS bags recorded in Gazebo simulation and in real-world environments.

Build CPP ROS Nodes

Some nodes are implemented in C++. Go to the vas_ws workspace to build them:

cd vas_ws

Offline Running

Before running offline, make sure that the required Conda environment is set up. Then download the checkpoints and unzip them in the repository root folder:

unzip checkpoints.zip -d ./

Then publish the data and start the all-in-one script:

rosbag play PATH_TO_THE_BAG
bash script/off_start.sh
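
Because rosbag play stays in the foreground until the bag finishes, the two commands above are usually run in separate terminals. As a minimal sketch, a small Python launcher could start both from one place. The function names build_offline_commands, launch_all and wait_all are hypothetical and not part of this repository; the sketch assumes it is run from the repository root with the ROS environment sourced, the vas Conda environment active, and a ROS master available.

```python
import subprocess
from typing import List, Sequence


def build_offline_commands(bag_path: str) -> List[List[str]]:
    """Return the two README commands as argument lists (no shell parsing)."""
    return [
        ["rosbag", "play", bag_path],      # publish the recorded data
        ["bash", "script/off_start.sh"],   # start the all-in-one script
    ]


def launch_all(commands: Sequence[Sequence[str]]) -> List[subprocess.Popen]:
    """Start every command as a child process and return the handles."""
    return [subprocess.Popen(list(cmd)) for cmd in commands]


def wait_all(processes: Sequence[subprocess.Popen]) -> List[int]:
    """Block until every child process exits and return the exit codes."""
    return [proc.wait() for proc in processes]
```

Calling wait_all(launch_all(build_offline_commands("PATH_TO_THE_BAG"))) would start both processes in the same order as the commands above and wait for them to exit.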

If you prefer to run each module separately, refer to the respective modules in the off_start.sh script. The host_id parameter in the image_concatenator node corresponds to the ID of each robot. For data in Gazebo simulations, set host_id=999.
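
The host_id convention above can be sketched as a small helper. This is only an illustration: the names resolve_host_id and host_id_arg are hypothetical, and formatting the value as a ROS 1 private-parameter argument assumes that image_concatenator reads host_id as a private parameter, which should be checked against off_start.sh.

```python
from typing import Optional

# Value the README specifies for data recorded in Gazebo simulation.
GAZEBO_HOST_ID = 999


def resolve_host_id(source: str, robot_id: Optional[int] = None) -> int:
    """Pick host_id: 999 for Gazebo data, otherwise the robot's own ID."""
    if source == "gazebo":
        return GAZEBO_HOST_ID
    if source == "real":
        if robot_id is None:
            raise ValueError("real-world data needs the ID of the robot")
        return robot_id
    raise ValueError(f"unknown data source: {source!r}")


def host_id_arg(host_id: int) -> str:
    """Format host_id as a ROS 1 private-parameter argument (assumption)."""
    return f"_host_id:={host_id}"
```

For example, host_id_arg(resolve_host_id("gazebo")) gives the argument for simulation data, while resolve_host_id("real", 3) returns the ID of robot 3.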

The offline running code demonstrates the overall workflow, including visual detection results and velocity commands. In real-world or Gazebo simulation experiments, the robot will respond to the velocity commands and move to exhibit collective behavior.

Onboard Running

To run the code on a real robot, all network models must be converted to TensorRT in advance. For tutorials on TensorRT and Triton, please refer to NVIDIA's official resources: TensorRT and Triton. These details are not covered in this project.

To execute the code onboard, first start the Triton server. Then modify the specified lines in off_start.sh and run the script.

python3 ros_nodes/ob_detection.py  ->  python3 ros_nodes/onboard_detection.py
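
The substitution above can also be applied programmatically. The following is a minimal sketch, assuming the offline command appears in the script text exactly as written above; the function name switch_to_onboard is hypothetical, and any other lines that need changing for onboard use are not handled.

```python
OFFLINE_CMD = "python3 ros_nodes/ob_detection.py"
ONBOARD_CMD = "python3 ros_nodes/onboard_detection.py"


def switch_to_onboard(script_text: str) -> str:
    """Return the script text with the offline detection node replaced."""
    if OFFLINE_CMD not in script_text:
        # Fail loudly instead of returning an unchanged script.
        raise ValueError("offline detection command not found in script")
    return script_text.replace(OFFLINE_CMD, ONBOARD_CMD)
```

Reading script/off_start.sh, passing its contents through switch_to_onboard, and writing the result to a separate file keeps the original offline script intact.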


Visual Attention

The training and testing code is in the va_model folder.

cd va_model
conda activate vas
python3 train_vatt.py


We train YOLOv5 with the official code. The number of input channels is changed to 1 for computational efficiency. The yolo_test folder contains code adapted from YOLOv5 for offline running, ONNX conversion, and post-processing.
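
To illustrate what a 1-channel input means in practice, here is a minimal sketch of turning a 3-channel frame into a single-channel batch. It is not taken from this repository: the function name to_single_channel, the BGR channel order, the ITU-R BT.601 luma weights, and the scaling to [0, 1] are all assumptions, and the actual pipeline may receive grayscale images directly.

```python
import numpy as np


def to_single_channel(image_bgr: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 BGR image to a 1x1xHxW float32 array in [0, 1]."""
    if image_bgr.ndim != 3 or image_bgr.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    # ITU-R BT.601 luma weights, ordered for the B, G, R channels.
    weights = np.array([0.114, 0.587, 0.299], dtype=np.float32)
    gray = image_bgr.astype(np.float32) @ weights  # shape: HxW
    # Add batch and channel axes: 1x1xHxW.
    return (gray / 255.0)[np.newaxis, np.newaxis, :, :]
```

The first layer of a 1-channel network then processes one third of the input data of an RGB model, which is where the efficiency gain comes from.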

Neural Interaction Graph Prediction

The training and inference code of our GVAE model is in the gvae folder. Use the scripts in run_scripts to train the model. An example training dataset is also provided with the downloads in Data Preparation; unzip it and copy it to the gvae folder.


  • Our GVAE uses graph network operators from PyTorch Geometric.
  • The baselines for the interaction graph prediction comparison are from GST, NRI, IMMA, and dNRI. The graph attention network (GAT) used in the comparison follows the implementation in IMMA. Our model also draws on the above implementations. We thank the authors of these projects for their open-source contributions.
  • Our JSON loading code in C++ is borrowed from JSON for Modern C++.
  • Our YOLOv5 running code is adapted from the official code of YOLOv5.
  • The onboard code utilizes the acceleration and scheduling of TensorRT and Triton.
  • The robot swarm system for simulation and real-world experiments is based on our previous work. More details can be found at Omnibot.


This project is licensed under the MIT License.
If you have any questions, please contact likai [at] westlake [dot] edu [dot] cn