ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images [CVPR2023]

✨Paper    ✨Poster    ✨Presentation (YouTube)    ✨Slide    ✨OpenReview    

Demo video: Samples_compression.mp4

Implementation version

PyTorch 1.8.1 & CUDA 10.1.
Please refer to requirements.txt for details.
If your CUDA version is 10.1, you can install the environment directly with:

conda create -n scandmm python=3.7
conda activate scandmm
pip install -r requirements.txt
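
To verify the installation, a quick sanity check (the printed versions should match the ones above; the exact CUDA build string depends on your install):

# Check that the expected PyTorch/CUDA stack is visible.
import torch

print(torch.__version__)          # expect 1.8.1
print(torch.version.cuda)         # expect 10.1
print(torch.cuda.is_available())  # True if a compatible GPU is visible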

Training

  1. To reproduce the training and validation datasets, please refer to data_process.py. Alternatively, use the ready-to-use data (a quick sanity check on the file is sketched after this list).
  2. Execute:
python train.py --seed=1234 --dataset='./Datasets/Sitzmann.pkl' --lr=0.0003 --bs=64 --epochs=500 --save_root='./model/'
  3. Check the training log and checkpoints in Log (created automatically) and ./model, respectively.
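
As a sanity check on the dataset file (whether reproduced or downloaded), the following sketch assumes only that it is a standard pickle; see data_process.py for the authoritative format:

# Load the training data and inspect its top-level structure.
# This makes no assumption about the pickle's internal layout.
import pickle

with open('./Datasets/Sitzmann.pkl', 'rb') as f:
    data = pickle.load(f)

print(type(data))
if isinstance(data, dict):
    print(list(data.keys()))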

Test

  1. Prepare the test images and put them in a folder (e.g., ./demo/input).
  2. Create a folder to store the results (e.g., ./demo/output).
  3. Prepare pre-trained weights (e.g., './model/model_lr-0.0003_bs-64_epoch-435.pkl').
  4. Execute:
python inference.py --model='./model/model_lr-0.0003_bs-64_epoch-435.pkl' --inDir='./demo/input' --outDir='./demo/output' --n_scanpaths=200 --length=20 --if_plot=True
  • Modify n_scanpaths and length to change the number and length of the produced scanpaths. Please refer to inference.py for more details about the produced scanpaths.
  5. Check the results:
Visualized scanpaths for the "Snow" scene (sp_P48_5376x2688.png). The raw scanpaths are saved alongside as a .npy file:

import numpy as np
scanpaths = np.load('P48_5376x2688.npy')
print(scanpaths.shape)  # (200, 20, 2)
# (n_scanpaths, length, (y, x)): (y, x) are normalized coordinates in the range [0, 1] (y/x = 0 indicates the top/left edge).
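
Since the coordinates are normalized, mapping them back to pixel positions is a simple scaling. A minimal sketch (the 5376x2688 resolution is read off the filename; adjust it for your image):

# Convert normalized (y, x) scanpaths to pixel coordinates.
import numpy as np

scanpaths = np.load('P48_5376x2688.npy')  # (200, 20, 2), normalized (y, x)
width, height = 5376, 2688                # equirectangular image size, from the filename
rows = scanpaths[..., 0] * (height - 1)   # y = 0 maps to the top edge
cols = scanpaths[..., 1] * (width - 1)    # x = 0 maps to the left edge
pixels = np.stack([rows, cols], axis=-1)  # pixel-space scanpaths, shape (200, 20, 2)
print(pixels.shape)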
Visualized scanpaths for the "Mu" scene (sp_P8_7500x3750.png). Loading its .npy gives the same layout:

scanpaths = np.load('P8_7500x3750.npy')
print(scanpaths.shape)  # (200, 20, 2)
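
Beyond if_plot, a quick custom overlay can be drawn with matplotlib. This is an illustrative sketch, not the repository's plotting code (which lives in inference.py); matplotlib/Pillow and the exact file paths are assumptions here:

# Overlay a few predicted scanpaths on the equirectangular image.
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Hypothetical paths: adjust to wherever your image and results actually live.
img = np.array(Image.open('./demo/input/P8_7500x3750.png'))
scanpaths = np.load('./demo/output/P8_7500x3750.npy')  # (200, 20, 2), normalized (y, x)

h, w = img.shape[:2]
plt.imshow(img)
for sp in scanpaths[:5]:  # overlay the first five sampled scanpaths
    plt.plot(sp[:, 1] * w, sp[:, 0] * h, marker='o', markersize=3)
plt.axis('off')
plt.savefig('./demo/output/overlay.png', bbox_inches='tight')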

Bibtex

@InProceedings{scandmm2023,
  title     = {ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images},
  author    = {Xiangjie Sui and Yuming Fang and Hanwei Zhu and Shiqi Wang and Zhou Wang},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2023}
}

Acknowledgment

The authors would like to thank Kede Ma for his inspiration, Daniel Martin et al. for publishing the ScanGAN model and visualization functions, and Eli Bingham et al. for the implementation of Pyro. We sincerely appreciate their contributions.