STAR: A Benchmark for Situated Reasoning in Real-World Videos

This year's STAR Challenge is beginning: STAR Challenge

[STAR Homepage] Reasoning in the real world is not divorced from situations. A key challenge is to capture the present knowledge from surrounding situations and reason accordingly. STAR is a novel benchmark for Situated Reasoning, which provides challenging question-answering tasks, symbolic situation descriptions and logic-grounded diagnosis via real-world video situations.

Overview

STAR: A Benchmark for Situated Reasoning in Real-World Videos [Paper PDF]
Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B Tenenbaum, Chuang Gan, NeurIPS, 2021.

  • 4 Question Types
  • 60k Situated Questions
  • 23k Situation Video Clips
  • 140k Situation Hypergraphs

Online Evaluation

You are welcome to use the STAR Challenge Leaderboard for online evaluation on the test set.

STAR Visualization

We provide code/QA_Visualization.ipynb (QA Visualization Script) to visualize the questions, options, videos, and situation graphs of the STAR data.

  • Before running the visualization, please download the Supporting Data (including video keyframes from Action Genome and original videos from Charades) and place them in the directories specified in the scripts.

STAR Data Outline

Question, Multiple Choice Answers and Situation Graphs

Question-Answer Templates and Programs

Situation Video Data

Annotations

Supporting Data

Our benchmark is built upon the Charades Dataset and Action Genome. Please download the raw videos from the Charades Dataset as described below.

Data Usage

To download the STAR dataset, please refer to the STAR Homepage or follow the instructions below.

  • Raw Video

    Download raw videos (scaled to 480p) from Charades Videos.

  • Frame Dumping Tool

    To extract the keyframes of each video, please follow the instructions in Action Genome. Please note that the graph annotations offered by STAR use the same frame indices as those generated by this dumping tool.

  • Video Clips

    We provide the video start and end times of each QA in Video Segments, and the keyframe indices in Video Keyframe IDs. You can use them to obtain the same video clips used in STAR. We use ffmpeg to trim the raw videos according to these annotations. Run:

    • ffmpeg -y -ss start_time -to end_time -i input_path -codec copy output_path
  • Question, Multiple Choice Answers and Situation Graphs

    You can download the STAR Video QA data via the following links:

    Questions and Answers: Train | Val | Test | Train/Val/Test Split File

  • Classes

    The classes of actions, verbs, objects, and relationships are included in the classes files.

  • Human Poses

    We extracted the human poses in each keyframe with AlphaPose. You can download the poses we extracted. Poses are indexed by video ID and keyframe ID.

  • Other

    We offer the question, answer and program templates we designed in Question Templates and QA Programs.

    You can use those templates, Object Bounding Boxes, and Human Bounding Boxes to generate new QAs with Situation Graphs.
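The trimming step described under Video Clips can be scripted. The following is a minimal Python sketch, not part of the STAR codebase: it wraps the ffmpeg command shown above and assumes ffmpeg is available on your PATH. The helper names (build_trim_command, trim_clip, clip_jobs), the .mp4 file naming, and the QA entry keys "question_id", "video_id", "start", and "end" are assumptions made for illustration; check them against the files you downloaded.

```python
import subprocess
from pathlib import Path


def build_trim_command(input_path, output_path, start_time, end_time):
    """Build the same ffmpeg call as in the Video Clips item, as an argument list."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start_time),   # clip start, in seconds
        "-to", str(end_time),     # clip end, in seconds
        "-i", str(input_path),
        "-codec", "copy",         # copy streams without re-encoding
        str(output_path),
    ]


def trim_clip(input_path, output_path, start_time, end_time):
    """Run ffmpeg to write one trimmed clip; raises CalledProcessError on failure."""
    cmd = build_trim_command(input_path, output_path, start_time, end_time)
    subprocess.run(cmd, check=True)


def clip_jobs(qa_entries, video_dir, out_dir):
    """Yield (input, output, start, end) for each QA entry.

    The keys used here are assumed, not confirmed by this README.
    """
    for qa in qa_entries:
        src = Path(video_dir) / f"{qa['video_id']}.mp4"
        dst = Path(out_dir) / f"{qa['question_id']}.mp4"
        yield src, dst, qa["start"], qa["end"]
```

To trim every clip, load the QA entries from the downloaded annotation file, then call trim_clip(*job) for each job yielded by clip_jobs. Because the command copies streams instead of re-encoding, cuts may snap to nearby keyframes rather than the exact timestamps.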

STAR Program Execution

In STAR, we introduce a neuro-symbolic framework, Neuro-Symbolic Situated Reasoning (NS-SR), which obtains answers by executing programs on the corresponding situation graphs. Run

  • python run_program.py --qa_dir your_path

to use the STAR program executor.

STAR Generation

We also provide our QA generation code, which you can use to generate new STAR questions from additional situation videos: QA Generation Code

Citation

If you use STAR in your research or wish to refer to the results published in the paper, please use the following BibTeX entry.

@inproceedings{wu2021star_situated_reasoning,
author={Wu, Bo and Yu, Shoubin and Chen, Zhenfang and Tenenbaum, Joshua B and Gan, Chuang},
title = {{STAR}: A Benchmark for Situated Reasoning in Real-World Videos},
booktitle = {Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS)},
year = {2021}
}