
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos

This repository releases the curated ACQUIRED dataset for counterfactual question answering on real-life videos. For more details, please check out our EMNLP 2023 paper.

ACQUIRED consists of 3.9K annotated videos, encompassing a wide range of event types and incorporating both first- and third-person viewpoints, which ensures a focus on real-world diversity.

Each video is annotated with questions that span three distinct dimensions of reasoning (physical, social, and temporal), enabling a comprehensive evaluation of a model's counterfactual reasoning abilities along multiple aspects.

Download The Videos

Please download the zip file from this Google Drive link and unzip it directly in the repository root.

The unzipped folder structure should look like this:

acquired_dataset
├── ego4d
│   ├── 002d2729-df71-438d-8396-5895b349e8fd
│   ├── 01db7c39-a512-4bac-b284-dff8c7360e80
│   └── ... ...
└── oopsqa

Check Out The Data

The main splits of the dataset are provided as .json files under the folder Dataset, which contains train.json, val.json, and test.json, the official splits used in the paper above.

Please follow the instructions in Demo.ipynb to visualize the data samples and inspect the structure of the dataset in more depth.

Generally, each data point in a {split}.json file under the folder Dataset has fields like below (each file is a list of dicts, and each entry looks like):

{
    "video_id": ...,
    "domain": ...,
    "type": "Counterfactual",
    "question": ...,
    "answer1": ...,
    "answer2": ...,
    "correct_answer_key": "answer{1/2}",
    "video_url": "url_of_the_video.mp4",
    "video_path": "path/to/video.mp4"
}

To map each data point to its corresponding video file, refer to the video_path field.

The video_path is the path relative to the root directory of the downloaded folder (i.e., acquired_dataset in this example, if you did not rename the downloaded folder).
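For instance, loading a split and resolving each entry's on-disk video location could look like the sketch below. The helper name `load_split`, the placeholder field values, and the default `acquired_dataset` root are assumptions for illustration, not part of the release:

```python
import json
import os

def load_split(split_json, video_root="acquired_dataset"):
    """Hypothetical helper: load one official split and attach an
    absolute video path per entry. `video_root` assumes the downloaded
    folder kept its default name."""
    with open(split_json) as f:
        entries = json.load(f)  # each split file is a list of dicts
    for entry in entries:
        # video_path is relative to the dataset root
        entry["abs_video_path"] = os.path.join(video_root, entry["video_path"])
    return entries

# Illustrative entry following the schema above (all values are placeholders):
sample = {
    "video_id": "...",
    "domain": "...",
    "type": "Counterfactual",
    "question": "...",
    "answer1": "...",
    "answer2": "...",
    "correct_answer_key": "answer1",
    "video_url": "url_of_the_video.mp4",
    "video_path": "path/to/video.mp4",
}

# The gold answer is recovered by indexing with correct_answer_key:
gold = sample[sample["correct_answer_key"]]
```

Since correct_answer_key names one of the two answer fields, scoring a model prediction reduces to a string comparison against the looked-up gold answer.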

Citation

If you find our curated resource useful, please cite our paper using:

@inproceedings{wu2023acquired,
  title = {ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos},
  author = {Wu*, Te-Lin and Dou*, Zi-Yi and Hu*, Qingyuan and Hou, Yu and Chandra, Nischal Reddy and Freedman, Marjorie and Weischedel, Ralph and Peng, Nanyun},
  booktitle = {The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2023}
}