Code/data of the paper "Future Person Localization in First-Person Videos" (CVPR2018)

Primary language: Python. License: MIT.

Future Person Localization in First-Person Videos (CVPR2018)

This repository contains the code and data (caution: no raw images are provided) for the paper "Future Person Localization in First-Person Videos" by Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani and Yoichi Sato.

Prediction examples

Requirements

We confirmed that the code works correctly with the versions below.

Installation

Download data

You can download our dataset from the link below:
(caution: no raw images are provided!)
Download link (processed data)

If you wish to download the data from a terminal, consider using a custom script.

Extract the downloaded tar.gz file in the root directory of the repository.

tar xvf fpl.tar.gz

Pseudo-video

Since we cannot release the raw images, we prepared the sample pseudo-video below.
The video shows the automatically extracted location histories and poses. The number shown in each bounding box corresponds to the person ID in the processed data.
Background colors are the output of a pre-trained dilated CNN trained on the MIT Scene Parsing Benchmark.
Download link (pseudo-video)

Create dataset

Run the dataset generation script to preprocess the raw locations, poses and egomotions.
A single processed file will be generated in datasets/.

# Test (debug) data
python utils/create_dataset.py utils/id_test.txt --traj_length 20 --traj_skip 2 --nb_splits 5 --seed 1701 --traj_skip_test 5
# All data
python utils/create_dataset.py utils/id_list_20.txt --traj_length 20 --traj_skip 2 --nb_splits 5 --seed 1701 --traj_skip_test 5
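
As a rough illustration of what options such as --traj_length, --traj_skip, --nb_splits and --seed could control, the sketch below cuts one per-person track into fixed-length windows and assigns video IDs to folds with a seeded shuffle. This reflects an assumption about the flags' meaning, not the actual logic of utils/create_dataset.py; the function names sliding_windows and assign_folds are hypothetical.

```python
import random


def sliding_windows(track, traj_length=20, traj_skip=2):
    """Cut one track (a list of per-frame records) into fixed-length windows.

    Assumption: traj_skip is the stride between the starts of
    consecutive windows. Tracks shorter than traj_length yield nothing.
    """
    last_start = len(track) - traj_length
    return [track[s:s + traj_length]
            for s in range(0, last_start + 1, traj_skip)]


def assign_folds(video_ids, nb_splits=5, seed=1701):
    """Assign each video ID to one of nb_splits folds, reproducibly."""
    ids = sorted(video_ids)             # fixed order before shuffling
    random.Random(seed).shuffle(ids)    # same seed gives the same shuffle
    return {vid: i % nb_splits for i, vid in enumerate(ids)}


if __name__ == "__main__":
    windows = sliding_windows(list(range(30)))
    print(len(windows), "windows of length", len(windows[0]))
    print(assign_folds(["v%02d" % i for i in range(10)]))
```

Splitting by video ID rather than by individual window keeps overlapping windows from the same video out of both the training and the test fold, which is one common reason to fix the seed and the split count together.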

Prepare training script

Modify the "in_data" arguments in scripts/5fold.json so that they point to the dataset file generated in the previous step.
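
If you prefer to make this change programmatically, a minimal sketch follows. It assumes that "in_data" appears as a JSON object key somewhere in the configuration and that every occurrence should receive the same path; the real layout of scripts/5fold.json may differ, and the helper names set_in_data and update_config are hypothetical.

```python
import json


def set_in_data(node, new_path):
    """Recursively replace the value of every "in_data" key.

    Returns the number of replaced entries so the caller can check
    that something was actually changed.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "in_data":
                node[key] = new_path
                changed += 1
            else:
                changed += set_in_data(value, new_path)
    elif isinstance(node, list):
        for item in node:
            changed += set_in_data(item, new_path)
    return changed


def update_config(config_path, new_path):
    """Load a JSON config, rewrite its "in_data" entries, and save it."""
    with open(config_path) as f:
        config = json.load(f)
    changed = set_in_data(config, new_path)
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return changed
```

Calling update_config("scripts/5fold.json", path) with the path of your generated dataset file returns the number of entries rewritten; a return value of 0 means the key was not found and the file should be edited by hand.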

Running the code

Directory structure

    .
    +---data (feature files)
    +---dataset (processed data)
    +---experiments (logging)
    +---gen_scripts (automatically generated scripts for cross validation)
    +---models
    +---scripts (configuration)
    |   +---5fold.json
    +---utils
        +---run.py (training script)
        +---eval.py (evaluation script)

Training

In our environment (a single TITAN X Pascal w/ CUDA 8, cuDNN 5.1), it took approximately 40 minutes per split.

# Train proposed model and ablation models
python utils/run.py scripts/5fold.json run <gpu id>
# Train proposed model only
python utils/run.py scripts/5fold_proposed_only.json run <gpu id>

Evaluation

python utils/eval.py experiments/5fold_yymmdd_HHMMSS/ 17000 run <gpu id> 10
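
The experiment directory name contains a timestamp, so it changes with every training run. The sketch below shows one way to find the most recent run to pass to the evaluation script. It assumes that runs are stored as directories under experiments/ whose names start with "5fold_", and it selects by modification time rather than by parsing the name; latest_experiment is a hypothetical helper, not part of this repository.

```python
import glob
import os


def latest_experiment(root="experiments", prefix="5fold_"):
    """Return the most recently modified matching directory, or None."""
    pattern = os.path.join(root, prefix + "*")
    candidates = [d for d in glob.glob(pattern) if os.path.isdir(d)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    print(latest_experiment())
```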

Prediction visualization using pseudo-video

We provide visualization code that uses the pseudo-videos.
Download the pseudo-videos below and run the following commands:
Download link (pseudo-videos for visualization)

# Run this code after placing <video_id>.mp4 into data/pseudo_viz/
# Extract images from video
python utils/video2img_all.py data/pseudo_viz/
# Plot images
python utils/plot_prediction.py <experiment>/<fold> --traj_type 0
# Write videos
python utils/write_video.py <experiment>/<fold> --vid GOPRXXXXU20 --frame XXXX --pid XXX

License and Citation

The dataset provided in this repository is to be used only for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:

Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani and Yoichi Sato. Future Person Localization in First-Person Videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

@InProceedings{yagi2018future,
    title={Future Person Localization in First-Person Videos},
    author={Yagi, Takuma and Mangalam, Karttikeya and Yonetani, Ryo and Sato, Yoichi},
    booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2018}
}