Agents for Amazon Bedrock Runtime not returning source metadata in APIs
tim-finnigan opened this issue · 25 comments
Original issue: boto/boto3#4124 (ref: P131777621)
@tim-finnigan Any ETA on this?
I was going crazy searching for a parameter, flag, or configuration I had missed in order to show the source metadata.
While the invoke_agent documentation shows that the metadata should come within the retrievedReferences object, the trace-events documentation does not list a 'metadata' object under OrchestrationTrace --> Observation: https://docs.aws.amazon.com/bedrock/latest/userguide/trace-events.html
Thank god I found this open issue.
Hope there is an update soon, and thanks for working on this.
I was able to resolve this by upgrading to boto3 1.34.118
This issue is now closed.
Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
Hi @adoyon23
I've tried both 1.34.118 and the latest version, 1.34.119, but I still cannot find the metadata object within retrievedReferences when calling the client.invoke_agent method.
I can confirm that the metadata shows up when using the Knowledge Base playground but not when using invoke_agent.
Did you have to enable or configure something else?
Hi @tim-finnigan - I am still not able to find a resolution for this issue. invoke_agent still doesn't have metadata in the output.
Hi @edu2105 @sriram-aws thanks for following up. After speaking with an engineer on the Bedrock team, I was told that they are aware of this issue and are planning a fix soon. Will keep this open to track for now.
@tim-finnigan - Thanks for the update!
Hi,
I'm experiencing an issue with the bedrock-agent-runtime retrieve_and_generate service. When I invoke this service, the retrievedReferences list (the retrieved metadata) comes back empty in the response. This functionality worked until the middle of last week, but it has since stopped working with any version of boto3.
Please confirm whether this is related to this issue, or should I open a new bug report?
If it is the same issue, do you have an estimated timeline for a fix?
python: 3.11
boto3: 1.34.122
Request:
import boto3

# Assumes default AWS credentials and region are configured.
client = boto3.client('bedrock-agent-runtime')

response = client.retrieve_and_generate(
    input={
        'text': 'some question'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': '1234',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 3
                }
            },
            'generationConfiguration': {
                'inferenceConfig': {
                    'textInferenceConfig': {
                        'temperature': 0.1,
                    }
                }
            }
        },
    }
)
Response:
{
    "ResponseMetadata": {
        "HTTPHeaders": {
            "connection": "keep-alive",
            "content-length": "1030",
            "content-type": "application/json",
            "date": "Sun, 09 Jun 2024 13:18:00 GMT",
            "x-amzn-requestid": "1234"
        },
        "HTTPStatusCode": 200,
        "RequestId": "1234",
        "RetryAttempts": 0
    },
    "citations": [
        {
            "generatedResponsePart": {
                "textResponsePart": {
                    "span": {
                        "end": 411,
                        "start": 0
                    },
                    "text": "generated response"
                }
            },
            "retrievedReferences": []
        }
    ],
    "output": {
        "text": "generated response"
    },
    "sessionId": "1234"
}
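A quick way to spot this symptom in code is to count the references attached to each citation. The following is only a minimal sketch: the helper name count_retrieved_references is made up for illustration, and the sample dictionary is a trimmed copy of the response above rather than a live API result.

```python
def count_retrieved_references(response: dict) -> list[int]:
    """Return how many retrievedReferences each citation carries."""
    return [
        len(citation.get("retrievedReferences", []))
        for citation in response.get("citations", [])
    ]


# Trimmed copy of the response shown above (not a live API call).
sample_response = {
    "citations": [
        {
            "generatedResponsePart": {
                "textResponsePart": {
                    "span": {"end": 411, "start": 0},
                    "text": "generated response",
                }
            },
            "retrievedReferences": [],
        }
    ],
    "output": {"text": "generated response"},
}

counts = count_retrieved_references(sample_response)
print(counts)  # [0]
if not any(counts):
    print("No citation carries any retrieved references.")
```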
@motigors that appears to be a related issue. No timeline on addressing this but the Bedrock team informed me that they are working on it.
Thanks for the update
@tim-finnigan I'm having a similar issue, but with the bedrock-agent-runtime retrieve method. It seems like it's related.
I'm glad to see this is being worked on! If there are any updates, it'd be great to hear. If my issue is unrelated, I can open a new issue if there isn't a relevant one already.
Here's some info about what I was doing in case it helps. I haven't been able to use a RetrievalFilter in the vectorSearchConfiguration either. But I have confirmed through testing in the Bedrock console that the metadata exists and works correctly for filtering.
Python 3.12
boto3: 1.34.42
response = bedrock_agent_runtime_client.retrieve(
    knowledgeBaseId="ABC1234567",
    # nextToken="",
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 6,
        }
    },
    retrievalQuery={
        "text": "some information"
    }
)
Response:
{
    'ResponseMetadata': {
        'HTTPHeaders': {
            'connection': 'keep-alive',
            'content-length': '16283',
            'content-type': 'application/json',
            'date': 'Thu, 20 Jun 2024 19:38:59 GMT',
            'x-amzn-requestid': 'totally-real-id'
        },
        'HTTPStatusCode': 200,
        'RequestId': 'totally-real-id',
        'RetryAttempts': 0
    },
    'retrievalResults': [
        {'content': {'text': 'relevant stuff and things'},
         'location': {'s3Location': {'uri': 's3://bucket/subdirectory/document_1.pdf'},
                      'type': 'S3'},
         'score': 0.7049113},
        {'content': {'text': 'relevant stuff and things'},
         'location': {'s3Location': {'uri': 's3://bucket/subdirectory/document_2.pdf'},
                      'type': 'S3'},
         'score': 0.704336},
        {'content': {'text': 'relevant stuff and things'},
         'location': {'s3Location': {'uri': 's3://bucket/subdirectory/document_3.pdf'},
                      'type': 'S3'},
         'score': 0.700758},
        {'content': {'text': 'relevant stuff and things'},
         'location': {'s3Location': {'uri': 's3://bucket/subdirectory/document_4.pdf'},
                      'type': 'S3'},
         'score': 0.70058656},
        {'content': {'text': 'relevant stuff and things'},
         'location': {'s3Location': {'uri': 's3://bucket/subdirectory/document_5.pdf'},
                      'type': 'S3'},
         'score': 0.7002422},
        {'content': {'text': 'relevant stuff and things'},
         'location': {'s3Location': {'uri': 's3://bucket/subdirectory/document_6.pdf'},
                      'type': 'S3'},
         'score': 0.6993249}
    ]
}
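For reference, a metadata filter on the retrieve call above might be shaped roughly as follows. This is a sketch, not a confirmed fix: build_retrieve_kwargs is a hypothetical helper, the attribute key "department" and its value are invented, and whether the filter is accepted may depend on the installed boto3 version.

```python
def build_retrieve_kwargs(knowledge_base_id, query, number_of_results=6,
                          metadata_filter=None):
    """Assemble keyword arguments for bedrock-agent-runtime retrieve()."""
    vector_config = {"numberOfResults": number_of_results}
    if metadata_filter is not None:
        # The filter sits next to numberOfResults in vectorSearchConfiguration.
        vector_config["filter"] = metadata_filter
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalConfiguration": {"vectorSearchConfiguration": vector_config},
        "retrievalQuery": {"text": query},
    }


# "department" / "research" are hypothetical metadata attribute values.
kwargs = build_retrieve_kwargs(
    "ABC1234567",
    "some information",
    metadata_filter={"equals": {"key": "department", "value": "research"}},
)
print(kwargs["retrievalConfiguration"])

# With a real client this would be passed as:
# response = bedrock_agent_runtime_client.retrieve(**kwargs)
```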
I've run into this bug as well. Any updates?
We were informed by the Bedrock team that this should be fixed now. Please try updating to the latest version of your SDK and let us know if you are still running into any issues.
I'm still having the issue with the retrieve function not returning any metadata.
I'm using a Python 3.12 Lambda function with boto3 version 1.34.42.
When I check the retrieval results, I'm expecting to get something like this (from the documentation).
{
    "nextToken": "string",
    "retrievalResults": [
        {
            "content": {
                "text": "string"
            },
            "location": {
                "confluenceLocation": {
                    "url": "string"
                },
                "s3Location": {
                    "uri": "string"
                },
                "salesforceLocation": {
                    "url": "string"
                },
                "sharePointLocation": {
                    "url": "string"
                },
                "type": "string",
                "webLocation": {
                    "url": "string"
                }
            },
            "metadata": {
                "string" : JSON value
            },
            "score": number
        }
    ]
}
But the only keys in the retrieval results I'm getting are "content", "location", and "score".
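A small check along these lines can report which of the documented keys are absent from each result. It is a sketch only: missing_result_keys is a hypothetical helper, the expected key set is read off the documentation snippet above, and the sample data is made up to mirror the responses in this thread.

```python
# Keys taken from the documented response shape quoted above.
EXPECTED_KEYS = {"content", "location", "metadata", "score"}


def missing_result_keys(response: dict) -> list[list[str]]:
    """For each retrieval result, list the expected keys that are absent."""
    return [
        sorted(EXPECTED_KEYS - set(result))
        for result in response.get("retrievalResults", [])
    ]


# Made-up sample mirroring the responses reported in this thread.
sample = {
    "retrievalResults": [
        {
            "content": {"text": "relevant stuff and things"},
            "location": {
                "s3Location": {"uri": "s3://bucket/subdirectory/document_1.pdf"},
                "type": "S3",
            },
            "score": 0.7049113,
        }
    ]
}

print(missing_result_keys(sample))  # [['metadata']]
```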
The other day I created a new knowledge base using Pinecone for the vector database, hoping that might give a different response, but it was the same. When I check the vector database in Pinecone, though, I can see that all of the metadata from the metadata files I created has been included.
Let me know if there's any more information I can provide that might help.
I'll see if I can manually upgrade the boto3 version in the lambda function. It's currently running the latest execution environment supported version of the SDK as far as I know. From what I've seen, there's not an official way to upgrade the package within Lambda. If you know of a way, please let me know.
Here is documentation on bundling Python dependencies like Boto3 in Lambda: https://docs.aws.amazon.com/lambda/latest/dg/python-package.html. Also this Knowledge Center post: https://repost.aws/knowledge-center/lambda-python-runtime-errors.
@tim-finnigan It worked! That was my first time creating a deployment package with dependencies as a layer, but that Knowledge Center post saved me. I'm correctly getting all of the metadata attributes in the retrieve response now.
Is there somewhere that tracks when the supported version of the SDK for Lambda functions will be updated? For now, I can just leave this layer in place until it catches up with the latest version of the SDK. Thanks for your help!
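Until the runtime catches up, one option is to log which SDK version the function actually loads and compare it with the version you need. The sketch below makes assumptions: the helpers are hypothetical, and the 1.34.118 threshold is simply the version another commenter reported as working earlier in this thread, not a confirmed minimum for the retrieve fix.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '1.34.42' into (1, 34, 42)."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare numerically, so that 1.34.42 sorts below 1.34.118."""
    return parse_version(installed) >= parse_version(minimum)


# Assumption: threshold taken from an earlier comment, not from AWS.
MINIMUM_VERSION = "1.34.118"

print(is_at_least("1.34.42", MINIMUM_VERSION))   # False
print(is_at_least("1.34.119", MINIMUM_VERSION))  # True

# Inside a Lambda handler the installed version is available as:
# import boto3
# print(boto3.__version__, is_at_least(boto3.__version__, MINIMUM_VERSION))
```

Comparing tuples of integers avoids the string-comparison trap where "1.34.42" would sort above "1.34.118".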
Hi @tim-finnigan thanks for the update!
I have updated boto3 to the latest version and reran my process, but unfortunately, when calling the invoke_agent method I'm still not able to find the metadata object within retrievedReferences.
Maybe it is working for other methods and not for invoke_agent for the moment?
This is the log showing that I'm using the latest boto3 version
And this is part of the log where you can see that it only gives back the content and location objects:
[...] "trace":{ "orchestrationTrace":{ "observation":{ "knowledgeBaseLookupOutput":{ "retrievedReferences":[ { "content":{ "text":"SCRUBBED" }, "location":{ "s3Location":{ "uri":"s3://xxxxx/gdrive/xxxx.pdf" }, "type":"S3" } } ] } } } } } [...]
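To check this programmatically instead of reading logs, the trace events can be walked for knowledge base lookups. This is a sketch under assumptions: both helper functions are hypothetical, the sample events are modeled on the log excerpt above, and the code accepts the trace body either directly under "trace" (as in the log) or nested one level deeper, which is how I understand boto3 delivers it.

```python
def collect_retrieved_references(events):
    """Gather retrievedReferences from knowledge base lookups in trace events."""
    references = []
    for event in events:
        trace_part = event.get("trace", {})
        # Assumption: boto3 wraps the trace body in an inner "trace" key, while
        # the log excerpt above shows it unwrapped, so accept either shape.
        trace = trace_part.get("trace", trace_part)
        lookup = (
            trace.get("orchestrationTrace", {})
            .get("observation", {})
            .get("knowledgeBaseLookupOutput", {})
        )
        references.extend(lookup.get("retrievedReferences", []))
    return references


def without_metadata(references):
    """Return the references that carry no 'metadata' key."""
    return [ref for ref in references if "metadata" not in ref]


# Sample events modeled on the log excerpt above (not a live stream).
sample_events = [
    {"chunk": {"bytes": b"partial answer"}},
    {"trace": {"orchestrationTrace": {"observation": {"knowledgeBaseLookupOutput": {
        "retrievedReferences": [{
            "content": {"text": "SCRUBBED"},
            "location": {"s3Location": {"uri": "s3://xxxxx/gdrive/xxxx.pdf"},
                         "type": "S3"},
        }]
    }}}}},
]

refs = collect_retrieved_references(sample_events)
print(len(refs), len(without_metadata(refs)))  # 1 1

# With a real client, traces must be requested explicitly:
# response = client.invoke_agent(agentId=..., agentAliasId=..., sessionId=...,
#                                inputText=..., enableTrace=True)
# refs = collect_retrieved_references(response["completion"])
```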
Friends, the fact that the metadata is being returned now is fabulous. However, my Bedrock knowledge base S3 bucket is populated with documents that have our custom metadata, such as x-amz-meta-doi or x-amz-meta-ncbi_search_term. These, however, are not being returned by the TypeScript SDK. Is there a reason for this? What can I do about it?
Hi @malikalimoekhamedov, from my understanding, only the metadata attributes included in the .metadata.json document metadata files uploaded to your S3 bucket are returned in the response.
Certain conditions must be met; they are described here: https://docs.aws.amazon.com/bedrock/latest/userguide/s3-data-source-connector.html#configuration-s3-connector
For the invoke_agent() method, I was not able to find them even using the latest boto3 version. If you are using another method, maybe you will be able to get your metadata in the response.
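As an illustration of the .metadata.json files mentioned above, a sidecar file for one document could be generated like this. Treat it as a sketch based on my reading of the linked S3 connector documentation: metadata_sidecar is a hypothetical helper, the attribute names and values are invented, and the naming rule (document key plus ".metadata.json") and the "metadataAttributes" wrapper should be checked against that page.

```python
import json


def metadata_sidecar(document_key: str, attributes: dict) -> tuple[str, str]:
    """Return the S3 key and JSON body of a document's metadata file."""
    # Assumption: the sidecar sits next to the document and shares its full
    # name, with ".metadata.json" appended.
    sidecar_key = f"{document_key}.metadata.json"
    body = json.dumps({"metadataAttributes": attributes}, indent=2)
    return sidecar_key, body


# Invented attribute values, for illustration only.
key, body = metadata_sidecar(
    "subdirectory/document_1.pdf",
    {"doi": "10.1000/example", "ncbi_search_term": "example term"},
)
print(key)
print(body)

# Uploading is then an ordinary S3 put, for example:
# s3_client.put_object(Bucket="bucket", Key=key, Body=body)
```

After uploading the sidecar files, the data source would need to be synced again before the attributes can appear in responses.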
Since the original issue was addressed here (and that was confirmed by the Bedrock team) I'm going to close this as resolved. The issue involving retrievedReferences is being tracked in #777. Please try using https://repost.aws/ to ask questions involving service APIs.