This repository contains the data and code for the paper "Models See Hallucinations: Evaluating the Factuality in Video Captioning".
FactVC-main/
├── data/
│ ├── activitynet/
│ │ ├── videos/ # sampled ActivityNet videos
│ │ ├── frames/ # extracted video frames
│ │ ├── captions/ # ground-truth and model-generated captions
│ │ ├── vids.txt # video ids
│ │ └── factuality_annotation.json # human factuality annotation
│ ├── youcook2/
│ │ ├── videos/ # sampled YouCook2 videos
│ │ ├── frames/ # extracted video frames
│ │ ├── captions/ # ground-truth and model-generated captions
│ │ ├── vids.txt # video ids
│ │ └── factuality_annotation.json # human factuality annotation
│ └── extract_frames.py
├── metric/
│ ├── clip/
│ ├── emscore/
│ └── factvc_corr.py # code to compute FactVC score and correlation
└── pretrained_models/
└── factvc_video.pth # our pretrained metric model
First, download the sampled ActivityNet and YouCook2 videos and unzip them into the corresponding videos/ folders. Then download the pretrained FactVC metric model and place it in the pretrained_models/ folder.
Then, extract video frames at 1 fps (these frames are used to compute the FactVC metric scores):
cd data/
python extract_frames.py --dataset activitynet
python extract_frames.py --dataset youcook2
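For reference, 1 fps frame extraction is commonly done by invoking ffmpeg with an `fps` filter. The sketch below only builds such a command; the actual extract_frames.py may use a different tool, flags, or output layout, so treat every name and path here as an assumption:

```python
import subprocess  # would be used to run the command, e.g. subprocess.run(cmd)
from pathlib import Path

def ffmpeg_frame_command(video_path, out_dir, fps=1):
    """Build an ffmpeg command that samples frames at the given rate.

    Illustrative only: the repository's extract_frames.py may differ.
    The output filename pattern frame_%04d.jpg is an assumption.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # ensure the frames/ folder exists
    return [
        "ffmpeg", "-i", str(video_path),
        "-vf", f"fps={fps}",               # sample one frame per second by default
        str(out_dir / "frame_%04d.jpg"),   # numbered JPEG frames
    ]
```

A script would then run the returned command once per video id listed in vids.txt.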
Now you can compute the FactVC scores and their correlation with the human factuality annotations:
cd metric/
python factvc_corr.py --dataset activitynet
python factvc_corr.py --dataset youcook2
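Conceptually, the correlation step compares the metric's per-caption scores against human factuality ratings. A plain-Python sketch of one such comparison (Pearson correlation) is shown below; factvc_corr.py may compute additional statistics such as Spearman or Kendall correlation, and all scores here are made up for illustration:

```python
import math

def pearson_corr(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values: metric scores vs. human ratings for five captions
metric_scores = [0.81, 0.42, 0.65, 0.30, 0.90]
human_ratings = [0.90, 0.50, 0.60, 0.20, 0.95]
correlation = pearson_corr(metric_scores, human_ratings)
```

A higher correlation indicates that the metric's ranking of captions agrees more closely with human factuality judgments.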
We acknowledge the EMScore project, on which our work is based.