E-VAT: an Asymmetric End-to-End Approach to Visual Active Exploration and Tracking

This repository is the official implementation of E-VAT: an Asymmetric End-to-End Approach to Visual Active Exploration and Tracking.

Description

The development of visual tracking systems is becoming a major goal for the Robotics community. Most of the works dealing with this topic focus exclusively on passive tracking, where the target is confined within the camera’s field of view. Only a minority propose active approaches, but all the methods introduced so far assume that the target is initially in the immediate proximity of the tracker. This represents an undesirable constraint on the applicability of these techniques. Thus, we propose a novel End-to-End Deep Reinforcement Learning based system, capable of both exploring the surrounding environment to find the target and then of tracking it. To do this, we develop a network consisting of two subcomponents: i) the Target-Detection Network, which detects the target in the camera’s field-of-view, and ii) the Exploration and Tracking Network, which employs this information to switch between the exploration policy and the tracking policy with the goal of exploring the environment, finding the target and finally tracking it.

Actor Net

Requirements

To install requirements:

pip install -r requirements.txt

You will also need two others additional libraries:

remind: implementation of a replay buffer.

cd isarlab_libraries\remind   
pip install -e .

isarsocket: sockets used to exchange data between threads.
```
cd isarlab_libraries\isarsocket   
pip install -e .
```

Files

main.py

my_agent.py

model_1CA.py

learner.py

IMPALA.py

utils.py

models

    └── two_tails_1CA.pt

main.py: main file that instantiates the different threads. Hyperparameter values are set in this file.
my_agent.py: implementation of an E-VAT agent (mp.Process).
learner.py: centralized learner (mp.Process).
IMPALA.py: implementation of IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architecture.
utils.py: contains useful functions.
models: folder containing pretrained models.

Usage

After setting the hyperparameters:

python main.py

In order for the framework to work, you need to create your own OpenAI Gym and import it into the utils.py file. The Gym sends data to E-VAT agents and must implement two main methods.

step(action): takes an action from the agent and returns a tuple <state, done, info, reward>.

reset(): reset the environment and returns a tuple <state, info>.

Actions, states, done, info and rewards are defined as follows:

action: action taken by the agent.

type: numpy.ndarray

shape: ()

values: integer ∈ [0, number of actions]

state: E-VAT actor input. First Person View RGB image.

type: numpy.ndarray

shape: (84, 84, 3)

values: real ∈ [0.0, 1.0]

done: signal used to interrupt an episode.

type: boolean

reward: reward as defined in the paper.

type: numpy.ndarray

shape: (1,)

values: real

info: dictionary containing E-VAT critic inputs:

Geo_target: 35 x 35 3D grid (Obstacle Map, Tracker Position and Target Probability Map).

type: numpy.ndarray

shape: (3, 35, 35)

values: real ∈ [0.0, 1.0]

Ego_target: 21 x 11 3D grid representing the egocentric FoV of the tracker.

type: numpy.ndarray

shape: (3, 11, 21)

values: real ∈ [0.0, 1.0]

Tracker_position: coordinates of the tracker within the 35 x 35 grid.

type: numpy.ndarray

shape: (1, 2)

values: integer ∈ [0, 34]

angle: orientation [deg] with respect to the target.

type: numpy.ndarray

shape: ()

values: real ∈ [-180, +180]

distance: distance from the target.

type: numpy.ndarray

shape: ()

values: real

hit: 1 if the target is within the tracker FoV, 0 otherwise.

type: numpy.ndarray

shape: ()

values: integer ∈ [0, 1]

GPS_Yaw: tracker’s orientation [deg] with respect to the global frame.

type: numpy.ndarray

shape: ()

values: real ∈ [-180, +180]

Critic Net

Citing

If you use this framework in a scientific context, please cite the following:

A. Dionigi, A. Devo, L. Guiducci and G. Costante, "E-VAT: An Asymmetric End-to-End Approach to Visual Active Exploration and Tracking" in IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4259-4266, April 2022, DOI: 10.1109/LRA.2022.3150866.

BibTeX details: