This repository provides database, code and results visualization for reproducing all the reported results in the paper:
- Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach. Yuhang Song †, Mai Xu †∗, Jianyi Wang, Minglang Qiao, Liangyu Huo, Zulin Wang.
Published on IEEE Transactions on Pattern Analysis and Machine Intelligence in 2018. By MC2 Lab @ Beihang University.
Specifically, this repository includes guidelines to:
- Download the PVS-HMEM database.
- Setup a environment to run our code.
- Reproduce visualized results from the paper.
If you find our paper, database or code useful for your research, please cite:
@article{xu2018predicting,
title={Predicting head movement in panoramic video: A deep reinforcement learning approach},
author={Xu, Mai and Song, Yuhang and Wang, Jianyi and Qiao, MingLang and Huo, Liangyu and Wang, Zulin},
journal={IEEE transactions on pattern analysis and machine intelligence},
year={2018},
publisher={IEEE}
}
Our PVS-HMEM (Panoramic Video Sequences with Head Movement & Eye Movement database) database contains both Head Movement and Eye Movement data of 58 subjects on 76 panoramic videos.
- Blue dots represent the Head Movement.
- Translucent blue circles represent the FoV.
- Red dots represent the Eye Movement.
Download our PVS-HM database from DropBox. Please feel free to contact us by clicking here so that we can give you access permission to the file. Then extract it with:
tar -xzvf PVS-HM.tar.gz
Note that it contains all MP4 files of our database, along with the HM scanpath data FULLdata_per_video_frame.mat
.
Note that the EM data is not provided here due to the scope of this work, however we release the EM data to facilitate the community in this repo.
For more details of the FULLdata_per_video_frame.mat
file, refer to here (Note that you do not have to read the details of the mat file if you just want to run our code and reproduce the numbers).
If you are not familiar with things in this section, refer to my personal basic setup for some guidelines or simply google it.
- Install go-vncdriver from OpenAI;
- Install FFMPEG by
sudo apt-get install ffmpeg
; - Instell OpenCV 2.4, you can use the script here;
Install Anaconda according to the guidelines on their official site, then install other requirements with command lines:
sudo apt-get install tmux
source ~/.bashrc
# create env
conda create -n dhp_env python=2.7
# active env
source activate dhp_env
# install packages
pip install gym tensorflow universe
conda install matplotlib
pip install opencv-contrib-python
conda install -c conda-forge imageio
# clone project
git clone https://github.com/YuhangSong/DHP-TensorFlow.git
# make remap excuatble
cd DHP-TensorFlow
chmod +x ./remap
Now you should run ./remap
to make sure the remap is excuatble.
It will log information as follows:
./remap [-i input] [-o output] [-f filter] [-m m] [-n n] [-w w] [-h h] [-t tf] [-y] src dst
-i ... Input file type: cube, rect, eqar, merc [rect]
-o ... Output file type: cube, rect, eqar, merc, view [rect]
-f ... Filter type: nearest, linear, bicubic [bicubic]
-m ... Input height list [500]
-b ... Input width [2m]
-n ... Output height [500]
-v ... Output width [2n]
-w ... Viewport width [200]
-h ... Viewport height [200]
-x ... Viewport fov x in degree [90]
-y ... Viewport fov y in degree [90]
-p ... Viewport center position phi (latitude)(degrees) [0]
-l ... Viewport center position tht (longitude)(degrees) [0]
-t ... Tracking data file [none]
-y ... Blend data together (only works with orec, etc ...) [off]
-z ... Number of frames [MAX]
-s ... Number of the start frame [0]
If it is working, check a corresponding issue
- Note that the remap is provided by mattcyu1, we thank a lot for his contribution to the community.
Please make sure you have:
- More than 64 GB of RAM.
- More then 600 GB space on the disk you store PVS-HM database.
This section clarifies procedures to train and test offline-DHP.
Set the database_path
in config.py
to your database folder.
Generate YUV files. Set mode = 'data_processor'
and data_processor_id = 'mp4_to_yuv'
in config.py
and run:
source ~/.bashrc
source activate dhp_env
python train.py
The converted YUV files will take about 600 Gb. The reason we have to use YUV files is that, the remap function that get FoV from a 360 image is a binary file that takes YUV and output YUV. We have developed a Python version of remap, but it turns out to be even slower than just reading and writing YUV files into the disk (for more then 5 times). We are trying to see if remap is important to produce our results.
Note that train.py
is a script that starts multiple process managed by tmux.
Thus, after running train.py
, you can use tmux attach=session -t a3c
to see how each process goes.
More about tmux can be found here.
Warning: After you run train.py, it will start a tmux session, where the program runs. If you are using mode = 'data_processor', please make sure each window exit normally and your task is complete without any error by navigating to each window of the tmux session.
Generate groundtruth heatmaps. Set mode = 'data_processor'
and data_processor_id = 'generate_groundtruth_heatmaps'
in config.py
and run:
source ~/.bashrc
source activate dhp_env
python train.py
Set mode = 'off_line'
, procedure = 'train'
and if_log_results = False
in config.py
, run following:
source ~/.bashrc
source activate dhp_env
python train.py
During the first few episode, you may find the CPU usage is extremely low, this is due to the sub-process is competing on remap function, which exchange data with disk. Later on, the CPU usage will increase.
Note that we trained for number_trained_steps = 1.113 * (10^6)
to produce our results in the paper, we later found that training too much (10 times as many as 1.113 * (10^6)
) may make the agent converge to FCB.
Note that the model is stored and restored automatically.
Thus, as long as you did not change the log_dir
in config.py
, previous trained model will be restored.
Set mode = 'off_line'
, procedure = 'test'
and if_log_results = True
in config.py
, then run following:
source ~/.bashrc
source activate dhp_env
python train.py
The code will generate and store predicted_heatmaps, predicted_scanpath and CC value.
If you are seeing
Starting training at step=<your-previous-global-step>
then the model is restored successfully. If you are seeing
Starting training at step=0
then you have not restored it successfully, refer to a corresponding issue
For results under more evaluation protocol. You may want to generate and store groundtruth_scanpaths with mode = 'data_processor'
and data_processor_id = 'generate_groundtruth_scanpaths'
.
To load our trained model, download our model from DropBox link, extract it to the path ../results/
, and set log_dir = "../results/reproduce_17"
.
As has been said, the model in the log_dir
will be automatically loaded.
The code log multiple curves to help analysis the training process, type:
tensorboard --logdir <PATH>
where <PATH>
is the log_dir
in config.py
.
mode = 'data_processor'
is a efficient way to process data under our TMUX manager, the code is inenv_li.py
.- Some features we used in TensorFlow will be depreciated in a future version, we are using
tf.__version__=1.6.0
to run our code. - Reinforcement Learning based methods are inherently stochastic, and we cannot guarantee producing exact the same numbers as those reported in our DHP paper. But if you do more runs, we are confident to say you can see consistent results.
After you have tested the model (setting if_log_results=True
), you can run
python summary.py
to summary the results. It will show results like:
WaitingForLove|cc|0.7042685046323598
SpaceWar|cc|0.5894272979827889
KingKong|cc|0.540897356844711
SpaceWar2|cc|0.6233880089121158
Guitar|cc|0.6216974696171442
BTSRun|cc|0.5312920575202599
CMLauncher2|cc|0.6347282964835593
Symphony|cc|0.669106020987788
RioOlympics|cc|0.7695397332776495
Dancing|cc|0.6590623821187533
StarryPolar|cc|0.6511584747528362
InsideCar|cc|0.7628155513781555
Sunset|cc|0.7349986526376743
Waterfall|cc|0.6883913118465134
BlueWorld|cc|0.7158131732815242
Avg|0.6597722861515888
which should be able to reproduce the numbers reported in the paper. If you meet any problem reproduce the numbers, please do not hesitate to contact us, you feed back on the environment settings and parameter settings will be well appreciated, since we are trying to provide the community a solid proposal.
Runpython kill.py
to kill the session.
Please don not hesitate to open an issue. We do not encourage you to contact us directly, opening an issue would be the best way to raise up your questions.
Some known issues & fixations are:
Navigate to w-0
in tmux to see if this worker is working properly, because this worker is responsible for restoring model from disk, while other worker just async with it.
Then check <log_dir>/train/checkpoint
, it should look like:
model_checkpoint_path: "model.ckpt-5362890"
all_model_checkpoint_paths: "model.ckpt-5359910"
all_model_checkpoint_paths: "model.ckpt-5360655"
all_model_checkpoint_paths: "model.ckpt-5361444"
all_model_checkpoint_paths: "model.ckpt-5362210"
all_model_checkpoint_paths: "model.ckpt-5362890"
the model_checkpoint_path
points to the latest checkpoint, the all_model_checkpoint_paths
points to all available checkpoints.
Please make what is listed here matches the files lies in <log_dir>/train/
.
Then you will see a likely reason for restore failure is that the recent ckpt file is not stored completely when you killed the program, but it has been listed as available and should-be-restored in the checkpoint file. Thus, you can simply remove corresponding ckpt file, along with modifying codes in checkpoint file.
For example, in about case, change <log_dir>/train/checkpoint
to:
model_checkpoint_path: "model.ckpt-5362210"
all_model_checkpoint_paths: "model.ckpt-5359910"
all_model_checkpoint_paths: "model.ckpt-5360655"
all_model_checkpoint_paths: "model.ckpt-5361444"
all_model_checkpoint_paths: "model.ckpt-5362210"
and delete <log_dir>/train/model.ckpt-5362890
will remove the most recent ckpt at 5362890 and restore the ckpt at 5362210.
It is likely that you are running remap on /media/..
instead of /home/...
, since the remap is only excutable on home.
A quick fix is to copy remap to your home, chmod and test if it is excutable.
After you comfirm that it works in home, change ./remap
in vr_player.py
to [ABSOLUTE_PATH_TO_YOUR_HOME]/remap
.
We propose a reward function that can capture transition of the attention.
Our reward function | Baseline reward function |
---|---|
Specifically, in above example, the woman and the man are passing the basketball between each other, and subjects' attention are switching between them while they passing the basketball. Our reward function is able to capture these transitions of the attentions smoothly, while the baseline reward function makes the agent focus on the man all the time, even when the basketball is not in his hands.
The mat file includes 76 cells, corresponding to the HM data of all 76 videos. Each cell records the longitude and latitude of HM for 58 subjects, with a total of 116 columns. The longitude and latitude are arranged alternately. For example, the first and second column is the latitude and longitude of the first subject, respectively. Note that the sampling rate of the data is twice as the video FPS. The HM data takes the front center as the origin, and the upper & left as the positive direction. Thus, the longitude ranges from -180 to 180, and the latitude ranges from -90 to 90.
Yuhang Song | Mai Xu | Jianyi Wang | Minglang Qiao | Zulin Wang |
---|---|---|---|---|
We would like to give special thanks to following researchers, for their valuable discussion and contribution to this work.
Ziyu Zhu | Haochen Wang | Chen Li | Lai Jiang |
---|---|---|---|
The code is based on the A3C implementation by OpenAI, we thank a lot for their contribution to the community.