
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

[Paper] [Conference]

Wenhao Wu1,2, Huanjin Yao2,3, Mengxi Zhang2,4, Yuxin Song2, Wanli Ouyang5, Jingdong Wang2

1The University of Sydney, 2Baidu, 3Tsinghua University, 4Tianjin University, 5The Chinese University of Hong Kong


This work examines an essential, must-know baseline in light of the latest advances in Generative Artificial Intelligence (GenAI): using GPT-4 for visual understanding. We focus on evaluating GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks. To ensure a comprehensive evaluation, we conduct experiments across three modalities (images, videos, and point clouds), spanning a total of 16 popular academic benchmarks.

📣 I also have other cross-modal projects that may interest you ✨.

Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective
Accepted by AAAI 2023 & IJCV 2023 | [Text4Vis Code]
Wenhao Wu, Zhun Sun, Yuxin Song, Jingdong Wang, Wanli Ouyang

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Accepted by CVPR 2023 | [BIKE Code]
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Accepted by CVPR 2023 as 🌟Highlight🌟 | [Cap4Video Code]
Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang

News

  • [Nov 28, 2023] We released our report on arXiv.
  • [Nov 27, 2023] Our prompts have been released. Thanks for your star 😝.

Overview

An overview of the 16 popular benchmark datasets evaluated, comprising images, videos, and point clouds.

Zero-shot visual recognition leveraging GPT-4's linguistic and visual capabilities.

Generated Descriptions from GPT-4

  • We have pre-generated descriptive sentences for all the categories across the datasets, which you can find in the GPT_generated_prompts folder. Enjoy exploring!

  • We also provide an example script, generate_prompt.py, to help you generate descriptions with GPT-4. Happy coding! For detailed information on all datasets used in our project, please refer to the config folder.

  • Execute the following command to generate descriptions with GPT-4 (a minimal sketch of the underlying API call is shown after the command below).

    # To run the script for a specific dataset, update the following line
    # with the name of the dataset you're working with:
    # dataset_name = ["Dataset Name Here"]   # e.g., dtd
    python generate_prompt.py
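
For orientation, here is a minimal sketch of what such a description-generation call can look like with the official openai Python package (v1.x). The model name, prompt wording, class list, and output path are illustrative assumptions, not the exact choices in generate_prompt.py; consult that script and the config folder for the real setup.

    # Hedged sketch: generating per-class descriptions with GPT-4 (assumes openai>=1.0).
    # The prompt template, classnames, and output path below are illustrative only.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    dataset_name = "dtd"                           # e.g., Describable Textures Dataset
    classnames = ["banded", "blotchy", "braided"]  # normally loaded from the config folder

    descriptions = {}
    for classname in classnames:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Describe what a '{classname}' texture looks like in one sentence.",
            }],
        )
        descriptions[classname] = response.choices[0].message.content

    # Save in a simple JSON layout (illustrative; the repo's own files live
    # in the GPT_generated_prompts folder).
    with open(f"{dataset_name}_descriptions.json", "w") as f:
        json.dump(descriptions, f, indent=2)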

GPT-4V(ision) for Visual Recognition

  • We share an example script that demonstrates how to use the GPT-4V API for zero-shot predictions on the DTD dataset. Please refer to the GPT4V_ZS.py file for a step-by-step guide. We hope it helps you get started with ease! A minimal sketch of the core API call appears after the commands below.

    # GPT4V zero-shot recognition script. 
    # dataset_name = ["Dataset Name Here"]   # e.g., dtd
    python GPT4V_ZS.py
    
    # We also provide a script that batches multiple samples into each request
    # (larger batch sizes may lead to instability).
    python GPT4V_ZS_batch.py
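
Conceptually, the per-image GPT-4V call reduces to three steps: encode the image as base64, send it together with the candidate category names, and read the predicted label from the reply. The sketch below illustrates this pattern with the openai v1.x package; the prompt wording, image path, and class list are assumptions for illustration, and GPT4V_ZS.py remains the reference implementation.

    # Hedged sketch: GPT-4V zero-shot prediction for a single image (assumes openai>=1.0).
    # Prompt text, image path, and classnames are illustrative placeholders.
    import base64
    from openai import OpenAI

    client = OpenAI()
    classnames = ["banded", "blotchy", "braided"]  # DTD categories, normally from config

    # Encode the query image as a base64 data URL for the vision API.
    with open("example.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Which texture category best matches this image? "
                         "Answer with exactly one of: " + ", ".join(classnames)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        max_tokens=20,
    )
    print(response.choices[0].message.content)  # e.g., "banded"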

Requirements

For guidance on setting up and running the GPT-4 API, we recommend the official OpenAI Quickstart Guide.
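
Once the quickstart steps are done (install the openai package and export an API key), a short sanity check like the one below, which simply assumes openai>=1.0 and an OPENAI_API_KEY environment variable, can confirm the API is reachable before launching the full scripts.

    # Hedged sanity check for API access (assumes `pip install openai` and
    # an exported OPENAI_API_KEY; not part of the repository's scripts).
    from openai import OpenAI

    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Reply with the single word: ready"}],
        max_tokens=5,
    )
    print(reply.choices[0].message.content)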

📌 BibTeX & Citation

If you use our code in your research or wish to refer to the results, please star 🌟 this repo and use the following BibTeX 📑 entry.

@article{GPT4Vis,
  title={GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?},
  author={Wu, Wenhao and Yao, Huanjin and Zhang, Mengxi and Song, Yuxin and Ouyang, Wanli and Wang, Jingdong},
  journal={arXiv preprint arXiv:2311.15732},
  year={2023}
}

🎗️ Acknowledgement

This evaluation builds on these excellent works:

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision
  • GPT-4
  • Text4Vis: Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective

We extend our sincere gratitude to these contributors.

👫 Contact

For any questions, please feel free to file an issue.