
InfoSeek Evaluation Toolkit


InfoSeek: A New VQA Benchmark Focused on Visual Info-Seeking Questions

Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?

Yang Chen, Hexiang Hu, Yi Luan, Haitian Sun, Soravit Changpinyo, Alan Ritter and Ming-Wei Chang.

Release

  • [24/5/30] We released the dataset/image snapshot on Hugging Face: ychenNLP/oven.
  • [23/6/7] We released the InfoSeek dataset and evaluation script.

InfoSeek Dataset

To download the image snapshot, please refer to OVEN.

To download the annotations, run the bash script download_infoseek_jsonl.sh inside the infoseek_data folder (the files are hosted on Google Drive).

Evaluation Script

Run evaluation on InfoSeek validation set:

python run_evaluation.py

# ====Example====
# ===BLIP2 instruct Flan T5 XXL===
# val final score: 8.06
# val unseen question score: 8.89
# val unseen entity score: 7.38
# ===BLIP2 pretrain Flan T5 XXL===
# val final score: 12.51
# val unseen question score: 12.74
# val unseen entity score: 12.28
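
The final score appears to be the harmonic mean of the unseen question and unseen entity scores, which is consistent with the example numbers above. A minimal sketch (the helper name is ours, not part of the toolkit):

# Hypothetical helper: combine the two split scores into the final score.
# Consistent with the example output above, this appears to be the harmonic mean.
def final_score(unseen_question: float, unseen_entity: float) -> float:
    if unseen_question + unseen_entity == 0:
        return 0.0
    return 2 * unseen_question * unseen_entity / (unseen_question + unseen_entity)

# final_score(8.89, 7.38)   -> ~8.06  (BLIP2 instruct Flan T5 XXL)
# final_score(12.74, 12.28) -> ~12.51 (BLIP2 pretrain Flan T5 XXL)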

Starter Code

Run BLIP2 zero-shot inference:

python run_blip2_infoseek.py --split val
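
For a rough idea of what zero-shot inference involves, here is a minimal sketch using the LAVIS library; the checkpoint (pretrain_flant5xxl), image path, and prompt format are assumptions for illustration and may differ from what run_blip2_infoseek.py actually uses.

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

# Load a BLIP2 Flan-T5 checkpoint via LAVIS (assumed variant, for illustration).
device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xxl", is_eval=True, device=device
)

# Preprocess a single image and ask one info-seeking question (placeholder data).
image = Image.open("example.jpg").convert("RGB")
image = vis_processors["eval"](image).unsqueeze(0).to(device)
prompt = "Question: In which country is this building located? Short answer:"

print(model.generate({"image": image, "prompt": prompt}))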

Run BLIP2 fine-tuning:

python run_training_lavis.py
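
run_training_lavis.py handles the full training pipeline; as a rough illustration of the underlying mechanics, the sketch below performs a single optimization step on one toy example through the LAVIS BLIP2 T5 model. The checkpoint, image path, learning rate, and prompt/target strings are assumptions and not the script's actual configuration.

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

# Load a trainable BLIP2 Flan-T5 checkpoint via LAVIS (assumed variant).
device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xl", is_eval=False, device=device
)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

# One toy batch: an image, a prompted question, and a target answer (placeholders).
image = vis_processors["train"](Image.open("example.jpg").convert("RGB"))
samples = {
    "image": image.unsqueeze(0).to(device),
    "text_input": ["Question: In which country is this building located? Short answer:"],
    "text_output": ["France"],
}

loss = model(samples)["loss"]  # LAVIS BLIP2 T5 forward returns a dict with "loss"
loss.backward()
optimizer.step()
optimizer.zero_grad()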

Acknowledgement

If you find InfoSeek useful for your research and applications, please cite using this BibTeX:

@article{chen2023infoseek,
  title={Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?},
  author={Chen, Yang and Hu, Hexiang and Luan, Yi and Sun, Haitian and Changpinyo, Soravit and Ritter, Alan and Chang, Ming-Wei},
  journal={arXiv preprint arXiv:2302.11713},
  year={2023}
}