Function Call doesn't seem to work with images as part of the prompts
When I send Gemini on Vertex AI a request that declares a function call and includes an image in the prompt, the API returns a 500 error with no description of what the actual problem is.
Environment details
- Programming language: TS
- OS: Linux
- Language runtime version: Node JS v20.x
- Package version: 1.2.0
- Gemini 1.5 Pro Preview 0514
Steps to reproduce
- Send a generateContent request that declares a function call and includes an image as part of the prompt
- Gemini/Vertex AI returns a 500 error
In the documentation, all the function calling examples use text-only prompts, so I may just be doing something that is not supported. However, I also couldn't find anything in the docs saying that images are NOT supported as part of prompts for function calls.
Additionally, I have tested the same prompt without the image in the context, and the function call works as you would expect.
Please NOTE: I can reproduce this with both the Vertex AI library and a plain curl call, so this is most likely a Gemini issue rather than a library issue. Since I don't have a Google support contract, I can't really open a support ticket, so I'm filing it here to bring some visibility to the problem. Apologies if there is a better venue available.
POST API endpoint:
https://us-central1-aiplatform.googleapis.com/v1/projects/<projectcode>/locations/us-central1/publishers/google/models/gemini-1.5-pro-preview-0514:generateContent
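For anyone reproducing this outside the library, here is a minimal sketch of how the endpoint above is put together. The helper name `generateContentUrl` and its parameters are made up for illustration; only the URL layout comes from the endpoint shown above.

```typescript
// Hypothetical helper (not part of any SDK): assembles the generateContent
// URL from the same pieces that appear in the endpoint above.
function generateContentUrl(project: string, location: string, model: string): string {
  const host = `${location}-aiplatform.googleapis.com`;
  const path =
    `/v1/projects/${encodeURIComponent(project)}` +
    `/locations/${location}` +
    `/publishers/google/models/${model}:generateContent`;
  return `https://${host}${path}`;
}

// Example with a placeholder project ID.
const url = generateContentUrl("my-project", "us-central1", "gemini-1.5-pro-preview-0514");
console.log(url);
```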
What is being passed to Vertex AI:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "base64ofimage"
          }
        },
        {
          "text": "categorize the image"
        }
      ]
    }
  ],
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": "categorize",
          "description": "accepts the categorized guess from model and stores it in API cache",
          "parameters": {
            "type": "object",
            "properties": {
              "category": { "type": "string", "description": "the category of the image" }
            }
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}
Response:
{
  "error": {
    "code": 500,
    "message": "Internal error encountered.",
    "status": "INTERNAL"
  }
}
A working JSON payload with only a text prompt:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "categorize an old chandelier"
        }
      ]
    }
  ],
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": "categorize",
          "description": "accepts the categorized guess from model and stores it in API cache",
          "parameters": {
            "type": "object",
            "properties": {
              "category": { "type": "string", "description": "the category of the object, example boats" }
            }
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}
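To make the comparison easier to repeat, here is a rough sketch that builds both payloads from one function, so the only difference between the failing and working requests is the `inlineData` part. The names `buildRequest`, `Part`, and `categorizeTool` are made up for this example, and `safetySettings` is left out for brevity; the field names themselves are copied from the JSON above.

```typescript
// Shapes copied from the JSON payloads above; the type names are illustrative.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface GenerateContentRequest {
  contents: { role: "user"; parts: Part[] }[];
  tools: { functionDeclarations: object[] }[];
  generationConfig: { temperature: number; topP: number };
}

// Same function declaration as in the failing request.
const categorizeTool = {
  name: "categorize",
  description: "accepts the categorized guess from model and stores it in API cache",
  parameters: {
    type: "object",
    properties: {
      category: { type: "string", description: "the category of the image" },
    },
  },
};

// Hypothetical builder: pass imageBase64 to get the image variant,
// omit it to get the text-only variant.
function buildRequest(prompt: string, imageBase64?: string): GenerateContentRequest {
  const parts: Part[] = [];
  if (imageBase64 !== undefined) {
    parts.push({ inlineData: { mimeType: "image/jpeg", data: imageBase64 } });
  }
  parts.push({ text: prompt });
  return {
    contents: [{ role: "user", parts }],
    tools: [{ functionDeclarations: [categorizeTool] }],
    generationConfig: { temperature: 1, topP: 0.95 },
  };
}

const withImage = buildRequest("categorize the image", "base64ofimage");
const textOnly = buildRequest("categorize an old chandelier");
console.log(JSON.stringify(withImage, null, 2));
console.log(JSON.stringify(textOnly, null, 2));
```

Sending both bodies to the same endpoint should show whether the image part alone is what triggers the 500.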
For reference, here is a Reddit post from about a month ago about the same problem: https://www.reddit.com/r/Bard/comments/1cg1nci/error_500_api/
I just had the same issue. Very annoying and limits the use cases for me. This is what I found in the docs:
Is there any kind of timetable for enabling function calling with multimodal prompts?
As a workaround, I found the JSON mode functionality (only for 1.5 Pro, though):
It did work in my short tests with 1.5 Pro and 1.5 Flash, but I don't know if this will be reliable.
Issues with JSON mode: according to the docs, it's only supported by 1.5 Pro, and the response_mime_type parameter gives me a TS error. Still, I have gotten consistent JSON output so far (about 20 API calls).
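Here is a rough sketch of what the JSON mode workaround could look like on the request and response side. It assumes the REST field is spelled `responseMimeType` inside `generationConfig` (the camelCase form of the `response_mime_type` parameter mentioned above) and that the model is prompted to return an object with a `category` string. `withJsonMode` and `parseCategory` are made-up helper names, not SDK functions. Since JSON mode may not be fully reliable, the parser returns `null` instead of throwing on unexpected output.

```typescript
// Illustrative local types; these are not the SDK's own definitions.
interface BaseGenerationConfig {
  temperature?: number;
  topP?: number;
}
type JsonModeConfig = BaseGenerationConfig & { responseMimeType: "application/json" };

// Hypothetical helper: adds the JSON mode field to an existing config.
// Assumption: the API accepts "responseMimeType" in generationConfig.
function withJsonMode(config: BaseGenerationConfig): JsonModeConfig {
  return { ...config, responseMimeType: "application/json" };
}

// Hypothetical helper: extracts the "category" string from the model's text
// output, returning null when the output is not the expected JSON object.
function parseCategory(raw: string): string | null {
  try {
    const parsed: unknown = JSON.parse(raw.trim());
    if (typeof parsed === "object" && parsed !== null && "category" in parsed) {
      const category = (parsed as { category: unknown }).category;
      return typeof category === "string" ? category : null;
    }
  } catch {
    // Output was not valid JSON; fall through and report null.
  }
  return null;
}

const config = withJsonMode({ temperature: 1, topP: 0.95 });
console.log(config);
console.log(parseCategory('{"category": "lighting"}'));
console.log(parseCategory("sorry, I cannot help with that"));
```

If the library's types reject the extra field, building the config as a plain object like this and passing it through a cast is one way to get past the compiler, at the cost of losing type checking for that argument.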