RedisGraph/redisgraph-py

fails to commit large batch inserts

gomesian opened this issue · 2 comments

Hi Team,

I was trying to understand the limits of batch inserts with this library.
Somewhere between 6-8k nodes (3-4k edges), the commit fails with:

[screenshot of the error message]

I don't mind breaking the ~10k node/edge updates required every hour into smaller batches, but I need help understanding what is breaking here, so I can configure a safe batch size based on varying property sizes per insert.

I'm also wondering about the use of connection_pool, and/or whether I should try unix_socket_path since the script runs on the server itself; maybe one of these would allow larger batch updates. I don't see documentation on this. A rough sketch of what I mean follows.
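For reference, connecting over a Unix socket with redis-py would look something like this (the socket path here is a guess on my part; the real one comes from the unixsocket directive in redis.conf):

import redis

# Hypothetical socket path - check the `unixsocket` directive in redis.conf.
r = redis.Redis(unix_socket_path='/var/run/redis/redis.sock', db=0)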

Note: I tried the redisgraph-bulk-loader.py method, but it doesn't suit me either, as I need to constantly update and prune the existing graph.

Any help / pointers appreciated!

A quick and dirty example:

import redis
from redisgraph import Node, Edge, Graph

# HOST and PORT are defined elsewhere in my script.
r = redis.Redis(host=HOST, port=PORT, db=0, socket_timeout=3000,
                connection_pool=None, charset='utf-8', errors='strict',
                unix_socket_path=None)

redis_graph = Graph('large', r)

for x in range(4000):

    name1 = 'src-' + str(x)
    name2 = 'dst-' + str(x)

    # Two nodes and one connecting edge per iteration: 8k nodes, 4k edges total.
    node1 = Node(label='person', properties={'name': name1, 'age': 33, 'gender': 'male', 'status': 'single'})
    redis_graph.add_node(node1)

    node2 = Node(label='person', properties={'name': name2, 'age': 33, 'gender': 'male', 'status': 'single'})
    redis_graph.add_node(node2)

    edge = Edge(node1, 'visited', node2, properties={'purpose': 'pleasure'})
    redis_graph.add_edge(edge)

# Everything accumulated above is sent as a single query - this is where it fails.
redis_graph.commit()

Hi @gomesian,

This error is caused by a buffer size limit in RedisGraph's parser utility. A workaround can be found here - RedisGraph/RedisGraph#1486 (comment). Alternatively, you can create entities in a series of smaller batched queries by periodically calling redis_graph.flush() in your create loop.
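For example, flushing every 1,000 iterations would look roughly like this (the batch size is illustrative; tune it to your property sizes):

BATCH_SIZE = 1000  # illustrative value - tune to your payload sizes

for x in range(4000):
    node1 = Node(label='person', properties={'name': 'src-' + str(x)})
    node2 = Node(label='person', properties={'name': 'dst-' + str(x)})
    redis_graph.add_node(node1)
    redis_graph.add_node(node2)
    redis_graph.add_edge(Edge(node1, 'visited', node2, properties={'purpose': 'pleasure'}))

    # flush() commits the pending entities and clears the local node/edge
    # buffers, keeping each CREATE query under the parser's buffer limit.
    if (x + 1) % BATCH_SIZE == 0:
        redis_graph.flush()

redis_graph.flush()  # commit any remaining entities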

Thanks, batching with flush() makes sense then.