This is a simple personal project for learning and testing, aimed at exploring the capabilities of two models: GPT-4 and Gemini. It also serves as a modest record of my experiences in this area.
The main objectives of the project are:
- To learn and test the models: Gain hands-on experience with GPT-4 and Gemini by calling their APIs and exploring their capabilities.
- To share experiences: Document and share insights and findings from experimenting with these models.
The models used are:
- GPT-4 Vision: OpenAI GPT-4 Vision API
- Gemini: Google AI Gemini
The necessary steps for environment setup and execution are as follows:
For running GPT-4 image description:
- Install Python.
- Install dependency packages.
- Place the image you want described at the expected path.
- Add your API key (a minimal call is sketched after this list).
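Below is a minimal sketch of an image-description call, assuming the `openai` Python SDK (v1+); the model name, file path, and prompt are illustrative placeholders, not values taken from this project.

```python
# Hedged sketch: describe a local image with the GPT-4 Vision API.
# "gpt-4-vision-preview", the file path, and the prompt are placeholders.
import base64
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # add your API key here

# Encode the local image as base64 so it can be sent inline.
with open("path/to/your/image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```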
For testing Gemini:
- Run it directly through Colab: Colab Notebook for Gemini.
- Alternatively, set up the necessary libraries and dependencies locally (e.g., Jupyter Notebook).
- Set your API key and the image you want loaded for description (a minimal local call is sketched after this list).
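Below is a minimal local sketch, assuming the `google-generativeai` package and Pillow are installed; the model name, file path, and prompt are illustrative placeholders.

```python
# Hedged sketch: describe a local image with the Gemini API.
# "gemini-pro-vision", the file path, and the prompt are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # add your API key here

model = genai.GenerativeModel("gemini-pro-vision")
image = Image.open("path/to/your/image.jpg")

# Pass the prompt and the image together as one multimodal request.
response = model.generate_content(["Describe this image.", image])
print(response.text)
```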
Note: If you have any questions about the instructions above, please open an issue in this repository.
At present, Gemini appears to yield results similar to those of GPT-4, although this project has not performed extensive or rigorous testing.
- Both Gemini and GPT-4 are outstanding models for AI applications, particularly multimodal tasks such as image recognition.
- As of now, both APIs are functioning well. However, Gemini seems a bit slower to produce results than GPT-4.
- GPT-4 generally provides more stable outputs. In limited testing, Gemini sometimes exhibited hallucinations.
For queries or suggestions, please contact me at: rongx@vt.edu