neo4j-contrib/neo4j_doc_manager

bulk upsert fails if oplog.timestamp is deleted

CIB opened this issue · 2 comments

CIB commented

The official documentation says that deleting the oplog.timestamp file should be possible. However, if I start Neo4j Doc Manager once, stop it, delete the timestamp file, and start it again, I get the following error:

mongo-connector -m $MONGODB -t $NEO4JDB -d $NEO4JDOCMANAGER


2016-07-08 07:30:53,976 [CRITICAL] mongo_connector.oplog_manager:625 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/mongo_connector/util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/mongo_connector/doc_managers/neo4j_doc_manager.py", line 89, in bulk_upsert
    tx.commit()
  File "/usr/local/lib/python3.4/site-packages/py2neo/cypher/core.py", line 333, in commit
    return self.post(self.__commit or self.__begin_commit)
  File "/usr/local/lib/python3.4/site-packages/py2neo/cypher/core.py", line 288, in post
    raise self.error_class.hydrate(error)
py2neo.cypher.error.schema.ConstraintViolation: Node 0 already exists with label Test and property "_id"=[577ccc5e39a414a3d7d17171]

Thanks for pointing this out, @CIB. Neo4j Doc Manager creates a uniqueness constraint on the _id property (the ObjectId value of each document), so this error is thrown because bulk_upsert tries to create nodes that already exist. bulk_upsert currently uses CREATE Cypher statements, but I suppose we could try changing those to MERGE and SET statements to avoid these constraint violations. I will run some performance tests to see whether that makes sense. In the meantime, you can delete the data in Neo4j before restarting the doc manager to avoid this error.

Hello, I'm facing the same problem. Any news on this?
Thanks.