neo4j/neo4j-graphrag-python

Uncaught Neo4j Exception

Closed this issue · 9 comments

I always get this uncaught exception when running VectorRetriever().search().

This happens whether or not the vector index was created beforehand. Wrapping the call in the following try-except doesn't catch it gracefully either:

    try:
        driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
        embedder = OpenAIEmbeddings(model="text-embedding-3-large")
        retriever = VectorRetriever(driver, INDEX_NAME, embedder)
        response = retriever.search(query_text=query, top_k=5)
    except Exception as e:
        logging.ERROR(f"Error: {e}")
        # Never caught

Console output:

DEBUG:neo4j.io:[#C339]  _: <CONNECTION> server state: READY > TX_READY_OR_TX_STREAMING
DEBUG:neo4j.io:[#C339]  S: SUCCESS {'t_first': 167, 'fields': ['node', 'score'], 'qid': 0}
DEBUG:neo4j.io:[#C339]  S: FAILURE {'code': 'Neo.ClientError.Procedure.ProcedureCallFailed', 'message': 'Failed to invoke procedure `db.index.vector.queryNodes`: Caused by: java.lang.IllegalArgumentException: Index query vector has 3072 dimensions, but indexed vectors have 384.'}
DEBUG:neo4j.io:[#C339]  C: RESET
DEBUG:neo4j.io:[#C339]  _: <CONNECTION> client state: TX_READY_OR_TX_STREAMING > READY
DEBUG:neo4j.io:[#C339]  S: SUCCESS {}
DEBUG:neo4j.io:[#C339]  _: <CONNECTION> server state: FAILED > READY
DEBUG:neo4j.pool:[#C339]  _: <POOL> released bolt-155989
ERROR:root:Neo4j Uncaught Exception: 'int' object is not callable

Hi @jalakoo, thanks for raising this issue.
Reading the error message, it looks like an index already exists with 384 dimensions, whereas the query vector has 3072.
If you go to your Neo4j Browser (or workspace if you're using Aura) and run SHOW INDEXES, do you see an index in the list?
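
If you'd rather check from Python than from Browser, here is a minimal sketch. It assumes a Neo4j 5.x server, the 5.x Python driver (`driver.execute_query` returning records, summary and keys), and that `SHOW INDEXES` reports each vector index's size under `options.indexConfig`. The helpers `vector_index_dimensions` and `check_query_dimensions` are hypothetical, not part of neo4j-genai. The idea is to compare the index's configured dimensions with the embedder's output length (text-embedding-3-large returns 3072 dimensions by default) before querying.

```python
from typing import Any, Dict, List


def vector_index_dimensions(driver: Any) -> Dict[str, int]:
    """Map each vector index name to its configured number of dimensions.

    Assumes a Neo4j 5.x server and the 5.x Python driver, whose
    execute_query() returns a (records, summary, keys) triple.
    """
    records, _, _ = driver.execute_query(
        "SHOW INDEXES YIELD name, type, options "
        "WHERE type = 'VECTOR' RETURN name, options"
    )
    return {
        record["name"]: int(record["options"]["indexConfig"]["vector.dimensions"])
        for record in records
    }


def check_query_dimensions(
    index_dims: Dict[str, int], index_name: str, query_vector: List[float]
) -> None:
    """Fail early with a readable message if the index is missing or the
    query vector's length differs from the index's configured dimensions."""
    if index_name not in index_dims:
        raise LookupError(
            f"No vector index named {index_name!r}; found {sorted(index_dims)}"
        )
    expected = index_dims[index_name]
    if len(query_vector) != expected:
        raise ValueError(
            f"Query vector has {len(query_vector)} dimensions, "
            f"but index {index_name!r} expects {expected}"
        )
```

For example, `check_query_dimensions(vector_index_dimensions(driver), INDEX_NAME, embedder.embed_query(query))` could run just before `retriever.search(...)`; with an index like the one in the error above, it should report the 3072-vs-384 mismatch before the procedure call fails.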

Hey @oskarhane,

For background, I'm using a copy of the Movies dataset from the sandbox databases. I used INDEX_NAME = "vector" but don't see it in the list of indexes in Browser.

[Screenshot (2024-06-17): index list in Neo4j Browser]

A similar error occurs when I try to create the vector index beforehand.

This code:

    INDEX_NAME = "vector"
    try:
        driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
        embedder = OpenAIEmbeddings(model="text-embedding-3-large")

        create_vector_index(
            driver,
            INDEX_NAME,
            label="Movie",
            property="plot",
            dimensions=1536,
            similarity_fn="euclidean",
        )
    except Exception as e:
        logging.ERROR(f"Caught Error: {e}")

Produces the following uncatchable error:

DEBUG:neo4j.io:[#DEC2]  _: <CONNECTION> client state: READY > TX_READY_OR_TX_STREAMING
DEBUG:neo4j.io:[#DEC2]  C: RUN 'CREATE VECTOR INDEX $name FOR (n:Movie) ON n.plot OPTIONS { indexConfig: { `vector.dimensions`: toInteger($dimensions), `vector.similarity_function`: $similarity_fn } }' {'name': 'vector', 'dimensions': 1536, 'similarity_fn': 'euclidean'} {}
DEBUG:neo4j.io:[#DEC2]  C: PULL {'n': 1000}
DEBUG:neo4j.io:[#DEC2]  S: SUCCESS {}
DEBUG:neo4j.io:[#DEC2]  _: <CONNECTION> server state: READY > TX_READY_OR_TX_STREAMING
DEBUG:neo4j.io:[#DEC2]  S: FAILURE {'code': 'Neo.ClientError.Statement.SyntaxError', 'message': 'Invalid input \'$\': expected "FOR", "IF" or an identifier (line 1, column 21 (offset: 20))\n"CREATE VECTOR INDEX $name FOR (n:Movie) ON n.plot OPTIONS { indexConfig: { `vector.dimensions`: toInteger($dimensions), `vector.similarity_function`: $similarity_fn } }"\n                     ^'}
DEBUG:neo4j.io:[#DEC2]  C: RESET
DEBUG:neo4j.io:[#DEC2]  _: <CONNECTION> client state: TX_READY_OR_TX_STREAMING > READY
DEBUG:neo4j.io:[#DEC2]  S: IGNORED
DEBUG:neo4j.io:[#DEC2]  S: SUCCESS {}
DEBUG:neo4j.io:[#DEC2]  _: <CONNECTION> server state: FAILED > READY
DEBUG:neo4j.pool:[#DEC2]  _: <POOL> released bolt-21
ERROR:root:Neo4j Uncaught Exception: 'int' object is not callable

Hi @jalakoo, could you tell me more about your Python environment? I'm having trouble reproducing your errors. I've created a sandbox using the Movies dataset, and the following works for me to create the index:

    import logging

    from langchain_openai import OpenAIEmbeddings
    from neo4j import GraphDatabase
    from neo4j_genai.indexes import create_vector_index

    OPEN_AI_API_KEY = "<API_KEY>"
    NEO4J_URI = "<URI>"
    NEO4J_USERNAME = "<USERNAME>"
    NEO4J_PASSWORD = "<PASSWORD>"
    INDEX_NAME = "vector"

    try:
        driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
        # dimensions=1536 makes the embedder's output size match the index below
        embedder = OpenAIEmbeddings(model="text-embedding-3-large", api_key=OPEN_AI_API_KEY, dimensions=1536)

        create_vector_index(
            driver,
            INDEX_NAME,
            label="Movie",
            property="plot",
            dimensions=1536,
            similarity_fn="euclidean",
        )
    except Exception as e:
        logging.error(f"Caught Error: {e}")

Similarly, once the vectors have been added to the Movie nodes, the following works for me to query the index:

    import logging

    from langchain_openai import OpenAIEmbeddings
    from neo4j import GraphDatabase
    from neo4j_genai import VectorRetriever

    OPEN_AI_API_KEY = "<API_KEY>"
    NEO4J_URI = "<URI>"
    NEO4J_USERNAME = "<USERNAME>"
    NEO4J_PASSWORD = "<PASSWORD>"
    INDEX_NAME = "vector"

    try:
        query = "Test query"
        driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
        embedder = OpenAIEmbeddings(model="text-embedding-3-large", api_key=OPEN_AI_API_KEY, dimensions=1536)
        retriever = VectorRetriever(driver, INDEX_NAME, embedder)
        response = retriever.search(query_text=query, top_k=5)
        print(response)
    except Exception as e:
        logging.error(f"Error: {e}")

How are you providing your OpenAI API key to the embeddings model?

@alexthomas93 I added the OpenAI key through a .env file. I didn't notice a difference when passing it to the embedder explicitly as you did above.

I tried both Poetry and Pipenv, running on Python 3.11 and 3.12. I've just pushed everything to a public repo at https://github.com/jalakoo/neo4j-genai-starterkit for reference.

The only real difference I can see is that I have this function wrapped in a FastAPI server. I'll try running it directly in a new project without FastAPI.

Thanks for sharing, @jalakoo. How are you creating and inserting the embeddings themselves? I can't see this being done anywhere in your code. If you'd like to do it with Cypher, you can run the following after you've created the index:

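    from neo4j import GraphDatabase

    # Assumes NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD and OPEN_AI_API_KEY
    # are defined as in the snippets above.

    # Embeds each movie's tagline and stores the vector in the `plot`
    # property, which is the property the vector index was created on.
    embedding_query = """
    MATCH (movie:Movie) WHERE movie.tagline IS NOT NULL
    WITH movie, genai.vector.encode(
        movie.tagline,
        "OpenAI",
        {
            token: $openAiApiKey,
            endpoint: $openAiEndpoint
        }) AS vector
    CALL db.create.setNodeVectorProperty(movie, "plot", vector)
    """
    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
    driver.execute_query(
        embedding_query,
        {
            "openAiApiKey": OPEN_AI_API_KEY,
            "openAiEndpoint": "https://api.openai.com/v1/embeddings",
        },
    )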
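
Alternatively, the embeddings can be computed client-side with the same embedder the retriever uses at query time, so stored vectors and query vectors are guaranteed to have the same size; a mismatch like the 3072-vs-384 one earlier in this thread typically means the stored vectors came from a different model than the query embedder. A minimal sketch, assuming a LangChain-style embedder exposing `embed_documents(texts)`, the Neo4j 5.x driver, and `elementId()` support on the server. The function `embed_movie_taglines` and its batch size are hypothetical; it mirrors the Cypher above (tagline in, vector written to `plot`).

```python
from typing import Any, List

# Hypothetical queries mirroring the Cypher approach above.
FETCH_QUERY = (
    "MATCH (movie:Movie) WHERE movie.tagline IS NOT NULL "
    "RETURN elementId(movie) AS id, movie.tagline AS text"
)
WRITE_QUERY = (
    "UNWIND $rows AS row "
    "MATCH (movie:Movie) WHERE elementId(movie) = row.id "
    "CALL db.create.setNodeVectorProperty(movie, 'plot', row.vector)"
)


def embed_movie_taglines(driver: Any, embedder: Any, batch_size: int = 100) -> int:
    """Embed every Movie tagline with `embedder`, store it in `plot`,
    and return the number of nodes updated."""
    records, _, _ = driver.execute_query(FETCH_QUERY)
    rows = [{"id": r["id"], "text": r["text"]} for r in records]
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors: List[List[float]] = embedder.embed_documents(
            [row["text"] for row in batch]
        )
        if len(vectors) != len(batch):
            raise ValueError("Embedder returned a different number of vectors")
        # Second positional argument of execute_query is the parameter map.
        driver.execute_query(
            WRITE_QUERY,
            {"rows": [{"id": row["id"], "vector": vec} for row, vec in zip(batch, vectors)]},
        )
    return len(rows)
```

Batching keeps each write transaction small; the same `embedder` object can then be passed to `VectorRetriever`.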

Thanks @alexthomas93. For the Movies dataset, I was using a separate LangChain project to build the embeddings; for the Recommendations dataset, I used this dump file, which already includes generated embeddings.

I also discussed my findings in Slack. My observation so far is that both the create_vector_index and search functions run as expected the first time, but there might be an issue with their error handling. Specifically:

  • create_vector_index cannot be used after a vector index has already been created, even with a new index_name. An error here is okay, but I was expecting it to be caught by a try-except.
  • The search function runs fine if the index exists, but if it doesn't, it raises this uncaught 'int' object is not callable error. Again, I would have expected this error to be catchable in a try-except block.

Additionally:

  • create_vector_index runs without error when targeting a non-existent property, giving the impression that the index was created correctly, but any subsequent .search call returns no results. A message warning the developer that the index targets a property that doesn't exist would be useful, or at least a note in the docs that this is possible. My expectation, at least, was that I wouldn't be able to create an index for a node-property combination that either a) isn't appropriate (not an embedding) or b) doesn't exist.
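
Until such a warning exists, one sanity check is to count how many nodes actually carry the indexed property before relying on search results. A minimal sketch, assuming the Neo4j 5.x driver; `quote_identifier`, `count_indexed_nodes` and `warn_if_unpopulated` are hypothetical helpers. Labels and property keys generally can't be supplied as ordinary query parameters, so they are backtick-quoted into the query string. This only checks that the property is present, not that its values are vectors of the right size.

```python
import logging
from typing import Any


def quote_identifier(name: str) -> str:
    """Backtick-quote a Cypher label or property key, escaping backticks."""
    return "`" + name.replace("`", "``") + "`"


def count_indexed_nodes(driver: Any, label: str, prop: str) -> int:
    """Return how many `label` nodes have a non-null `prop`."""
    query = (
        f"MATCH (n:{quote_identifier(label)}) "
        f"WHERE n.{quote_identifier(prop)} IS NOT NULL "
        "RETURN count(n) AS c"
    )
    records, _, _ = driver.execute_query(query)
    return records[0]["c"]


def warn_if_unpopulated(driver: Any, label: str, prop: str) -> int:
    """Log a warning (note: logging.warning, the function) if no node
    has the property yet, and return the count either way."""
    count = count_indexed_nodes(driver, label, prop)
    if count == 0:
        logging.warning(
            "No %s nodes have a %s property yet; vector search will return no results.",
            label,
            prop,
        )
    return count
```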

Hi @jalakoo. I've done a bit more digging, and I think there are several things going on here. In both error cases (create_vector_index and search), I believe the errors are being raised correctly and caught by your except clauses. The 'int' object is not callable errors you're seeing come from using logging.ERROR in your except blocks: logging.ERROR is a log-level constant (the integer 40), not a function, whereas logging.error is the function that logs a message. Calling the integer raises a new TypeError inside the except block.

As for create_vector_index, you don't need to give it the name of the property whose text you'll embed. Its property argument names the property that will hold the vectors themselves, and it's safe to create the index before that property exists on any nodes. I think we should update the docs, though, as I don't believe we make this clear.
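
The root cause can be reproduced without Neo4j at all. In Python's standard `logging` module, `logging.ERROR` is the integer level constant, while `logging.error` is the function that emits a record, so calling the constant inside an `except` block raises a fresh `TypeError` that replaces the error being handled. A minimal sketch; the function names and the `RuntimeError` standing in for the Neo4j error are illustrative only:

```python
import logging


def buggy_handler() -> None:
    try:
        raise RuntimeError("vector dimensions mismatch")
    except Exception as e:
        # Bug: logging.ERROR is an int, so this call itself raises TypeError.
        logging.ERROR(f"Error: {e}")


def fixed_handler() -> str:
    try:
        raise RuntimeError("vector dimensions mismatch")
    except Exception as e:
        # logging.error is the function; the original error is logged and handled.
        logging.error(f"Error: {e}")
        return f"handled: {e}"


print(logging.ERROR)  # 40
try:
    buggy_handler()
except TypeError as err:
    print(err)  # 'int' object is not callable
    # The original RuntimeError is still attached as the implicit context.
    print(repr(err.__context__))
print(fixed_handler())  # handled: vector dimensions mismatch
```

The original exception isn't lost: it is attached to the `TypeError` as `__context__`, which is why a full traceback shows "During handling of the above exception, another exception occurred".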

Thanks @alexthomas93, repeating here: the issue was in my logging implementation. Changing logging.ERROR() to logging.error() revealed that the try-except is indeed working.