saliency-360salient-2017

Scanpath Prediction on 360 degree Images using deep learning


Scan-path Prediction on 360° Images using Saliency Volumes

Marc Assens Kevin McGuinness Xavier Giro-i-Nieto Noel O'Connor

A joint collaboration between:

logo-insight logo-dcu logo-gpi
Insight Centre for Data Analytics Dublin City University (DCU) UPC Image Processing Group

Abstract

We introduce a deep neural network for scan-path prediction trained on 360° images, and a novel temporal-aware representation of saliency information named the saliency volume. The first stage of the network consists of a model trained to generate saliency volumes, whose weights are learned by back-propagation computed from a binary cross-entropy (BCE) loss over downsampled versions of the saliency volumes. Sampling strategies are then used to generate scan-paths from the saliency volumes. Our experiments show the advantages of using saliency volumes and how they can be used for related tasks.
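For intuition only, here is a minimal numpy sketch of a BCE loss computed over downsampled saliency volumes; the shapes and variable names are illustrative assumptions, not the training code of this repository:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all voxels of a saliency volume."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))

# Toy volumes with shape (time, height, width) and values in [0, 1].
pred = np.random.rand(20, 36, 64)    # predicted downsampled saliency volume
target = np.random.rand(20, 36, 64)  # ground-truth downsampled saliency volume
print(bce_loss(pred, target))
```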

Publication

Find the pre-print version of our work on arXiv.

Image of the paper

Please cite with the following Bibtex code:

@InProceedings{Reina_2017_ICCV_Workshops,
author = {Assens Reina, Marc and Giró-i-Nieto, Xavier and McGuinness, Kevin and O'Connor, Noel E.},
title = {SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes},
booktitle = {ICCV Workshop on Egocentric Perception, Interaction and Computing},
month = {Oct},
year = {2017}
}

You may also want to refer to our publication with the more human-friendly Chicago style:

Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes.” ICCV Workshop on Egocentric Perception, Interaction and Computing. 2017.

Slides

<iframe src="//www.slideshare.net/slideshow/embed_code/key/crpL3byLoainOX" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>
SaltiNet: The Temporal Dimension of Visual Attention Models from Xavier Giro-i-Nieto

Models

The scan-path generator presented in our work can be downloaded from the links provided below the figure:

Model Architecture: architecture-fig
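As a quick illustration of how a downloaded model could be used in Keras (assuming the download is a full serialized model; the file name below is a placeholder, not a real link from this repository):

```python
from keras.models import load_model

# "saltinet_scanpath_generator.h5" is a hypothetical file name: replace it with
# the file you actually downloaded. If the download contains weights only, build
# the architecture first and call model.load_weights(...) instead.
model = load_model("saltinet_scanpath_generator.h5")
model.summary()
```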

Saliency volumes

Saliency volumes aim to be a suitable representation of spatial and temporal saliency information for images. They have three axes, representing the width and height of the image and time. They are a meta-representation of saliency information, and other saliency representations can be extracted from them. Saliency maps can be generated by summing all the temporal slices of the volume and normalizing the values so that they add up to one. A similar representation is the temporally weighted saliency map, which is generated by a weighted sum of the temporal slices. Finally, scan-paths can also be extracted by sampling fixation points from the temporal slices.

salvol-fig
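For concreteness, the three operations described above (summing slices into a saliency map, weighting slices temporally, and sampling fixations into a scan-path) can be sketched with numpy; the shapes, names, and the simple sampling strategy are illustrative assumptions rather than the exact procedure used in the paper:

```python
import numpy as np

def saliency_map(volume):
    """Saliency map: sum the temporal slices and normalize to sum to one."""
    m = volume.sum(axis=0)
    return m / m.sum()

def weighted_saliency_map(volume, weights):
    """Temporally weighted saliency map: weighted sum of the temporal slices."""
    m = np.tensordot(weights, volume, axes=1)  # one weight per temporal slice
    return m / m.sum()

def sample_scanpath(volume, fixations_per_slice=1, rng=None):
    """Scan-path: sample fixation points from each temporal slice in order."""
    rng = rng or np.random.default_rng()
    T, H, W = volume.shape
    path = []
    for t in range(T):
        p = volume[t].ravel()
        p = p / p.sum()  # per-slice probability distribution over pixels
        idx = rng.choice(H * W, size=fixations_per_slice, p=p)
        path.extend((t, int(i) // W, int(i) % W) for i in idx)
    return path

# Toy saliency volume with shape (time, height, width).
volume = np.random.rand(20, 36, 64)
smap = saliency_map(volume)
twmap = weighted_saliency_map(volume, np.linspace(1.0, 0.5, 20))
scanpath = sample_scanpath(volume)
```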

Datasets

Training

As explained in our paper, our networks were trained on the training and validation data provided by the 360° Salient Challenge.

Software frameworks: Keras

The model is implemented in Keras, which in turn runs on top of Theano.

pip install -r https://raw.githubusercontent.com/massens/saliency-360salient-2017/master/requirements.txt
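If more than one backend is installed, the multi-backend Keras releases this project targets pick the backend from the KERAS_BACKEND environment variable (or ~/.keras/keras.json). A small optional check, not part of the repository:

```python
import os

# Force the Theano backend before Keras is imported for the first time.
os.environ["KERAS_BACKEND"] = "theano"

import keras  # prints "Using Theano backend." on import
print(keras.backend.backend())  # -> "theano"
```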

Acknowledgements

We would like to especially thank Albert Gil Moreno from our technical support team at the Image Processing Group at the UPC.

AlbertGil-photo
Albert Gil
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan X used in this work. logo-nvidia
The Image Processing Group at the UPC is an SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office. logo-catalonia
This work has been developed in the framework of the projects BigGraph TEC2013-43935-R and Malegra TEC2016-75976-R, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF). logo-spain
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under grant number SFI/12/RC/2289. logo-ireland

Contact

If you have any general question about our work or code that may be of interest to other researchers, please use the public issues section of this GitHub repo. Alternatively, drop us an e-mail at xavier.giro@upc.edu.