Azure/azure-search-vector-samples

Azure Vector Search not working with OpenAI new embedding model "text-embedding-3-large"

Hi Team, I found that the Azure Search vector store does not work with OpenAI's new embedding model "text-embedding-3-large", whose embedding vectors have a length of 3072.

Python code:
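from langchain_community.vectorstores.azuresearch import (
    AzureSearch,
    AzureSearchVectorStoreRetriever,
)
from langchain_openai import OpenAIEmbeddings  # assumed import; omitted from the original snippet

# AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_KEY, and index_name are defined elsewhere.
model = "text-embedding-3-large"
embeddings = OpenAIEmbeddings(deployment=model, model=model, dimensions=3072)

# Creating the store also creates the index, which is where the error is raised.
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=AZURE_SEARCH_ENDPOINT,
    azure_search_key=AZURE_SEARCH_KEY,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)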

Error:
File ~\anaconda3\Lib\site-packages\azure\search\documents\indexes\_generated\operations\_indexes_operations.py:403, in IndexesOperations.create(self, index, request_options, **kwargs)
401 map_error(status_code=response.status_code, response=response, error_map=error_map)
402 error = self._deserialize.failsafe_deserialize(_models.SearchError, pipeline_response)
--> 403 raise HttpResponseError(response=response, model=error)
405 deserialized = self._deserialize("SearchIndex", pipeline_response)
407 if cls:

HttpResponseError: (InvalidRequestParameter) The request is invalid. Details: definition : The vector field 'content_vector' must have the property 'dimensions' set to a positive, non-zero integer between 2 and 2048.
Code: InvalidRequestParameter
Message: The request is invalid. Details: definition : The vector field 'content_vector' must have the property 'dimensions' set to a positive, non-zero integer between 2 and 2048.
Exception Details: (InvalidField) The vector field 'content_vector' must have the property 'dimensions' set to a positive, non-zero integer between 2 and 2048. Parameters: definition
Code: InvalidField
Message: The vector field 'content_vector' must have the property 'dimensions' set to a positive, non-zero integer between 2 and 2048. Parameters: definition

Comment: I checked that LangChain supports embedding vectors of length 3072. Can you please check this on the Azure side?
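
The mismatch can be checked before any index is created by probing the embedding function and comparing the vector length with the range quoted in the error. This is a minimal sketch, assuming the vector field's `dimensions` follows the length of the embedding function's output. `check_dimensions` and `fake_embed` are hypothetical names used for illustration and are not part of LangChain or the Azure SDK.

```python
from typing import Callable, List


def check_dimensions(
    embed: Callable[[str], List[float]],
    min_dims: int = 2,
    max_dims: int = 2048,  # range quoted in the error message above
) -> int:
    """Embed a probe string and verify the vector length is within range."""
    n = len(embed("probe"))
    if not (min_dims <= n <= max_dims):
        raise ValueError(
            f"embedding has {n} dimensions; the index accepts {min_dims} to {max_dims}"
        )
    return n


def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for embeddings.embed_query: a 3072-element vector.
    return [0.0] * 3072


try:
    check_dimensions(fake_embed)
except ValueError as exc:
    print(exc)
```

In real use, `embeddings.embed_query` would be passed in place of `fake_embed`.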

Yes, unfortunately we don't support vectors of this length yet. You would need to use the dimensions property to reduce the vector length to 2048 or fewer.
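
A minimal sketch of that workaround, assuming the `langchain_openai` package and the 2048 ceiling quoted in the error message. The helpers `pick_dimensions` and `build_embeddings` are hypothetical and not part of any library. The first one only clamps the requested size to the service limit, and the second passes the result as `dimensions`.

```python
AZURE_SEARCH_MIN_DIMS = 2
AZURE_SEARCH_MAX_DIMS = 2048  # ceiling quoted in the error at the time of this thread


def pick_dimensions(requested: int, max_dims: int = AZURE_SEARCH_MAX_DIMS) -> int:
    """Return a dimension count that fits within the index limit."""
    if requested < AZURE_SEARCH_MIN_DIMS:
        raise ValueError(f"dimensions must be at least {AZURE_SEARCH_MIN_DIMS}")
    return min(requested, max_dims)


def build_embeddings(model: str = "text-embedding-3-large", requested: int = 3072):
    """Create an embeddings client whose output fits the index limit."""
    # Imported lazily so pick_dimensions works without langchain_openai installed.
    from langchain_openai import OpenAIEmbeddings

    return OpenAIEmbeddings(model=model, dimensions=pick_dimensions(requested))
```

The object returned by `build_embeddings()` can then be used as in the original snippet, with `embedding_function=embeddings.embed_query`. An index created this way stores 2048-dimensional vectors, so queries have to use the same `dimensions` value.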

We are rolling out a fix that raises the maximum dimension limit to 3072. The rollout should be complete by the end of February. Thanks.

I'm waiting for the fix. Please update this thread when the rollout is complete.

Hi, it should be available globally by now! @AVIN8233

Thanks Farzad, it is working now.

It is fixed now.

We are using the Microsoft GPT-RAG repository, and Cognitive Search doesn't allow us to index vectors with more than 2048 dimensions. Also, when we try to create the vectorizer while setting the contentVector field, it only allows us to choose the ada-002 model. We have text-embedding-3-large deployed, but it doesn't appear as an option. @farzad528

Hi @jucastag, can you link the repo you are referring to? I'll try to see how I can help, but at a minimum you should file an issue there.

Thank you @farzad528, here's the repo: https://github.com/Azure/GPT-RAG. I will file an issue there as well. I posted here because I came across this thread while looking for a solution. Thank you again.