The repo contains:
- All the original video frames (from 21 accessibility-related videos, divided into 81 segments)
- Ground-truth annotations for all frames of 31 video segments
- A list of accessibility-related objects
- Outputs of two VQA models (GPV-1 and BLIP) and their comparison with the ground truth, where available