bulk upsert fails if oplog.timestamp is deleted
CIB opened this issue · 2 comments
The official documentation says that deleting the timestamp file should be possible. But if I start the Neo4j doc manager once, stop it, delete oplog.timestamp, and start it again, I get the following error:
```
mongo-connector -m $MONGODB -t $NEO4JDB -d $NEO4JDOCMANAGER
2016-07-08 07:30:53,976 [CRITICAL] mongo_connector.oplog_manager:625 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/mongo_connector/util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/mongo_connector/doc_managers/neo4j_doc_manager.py", line 89, in bulk_upsert
    tx.commit()
  File "/usr/local/lib/python3.4/site-packages/py2neo/cypher/core.py", line 333, in commit
    return self.post(self.__commit or self.__begin_commit)
  File "/usr/local/lib/python3.4/site-packages/py2neo/cypher/core.py", line 288, in post
    raise self.error_class.hydrate(error)
py2neo.cypher.error.schema.ConstraintViolation: Node 0 already exists with label Test and property "_id"=[577ccc5e39a414a3d7d17171]
```
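For context, this error comes from the uniqueness constraint on `_id` described in the reply below. Here is a minimal sketch that reproduces the same `ConstraintViolation`, assuming py2neo 2.x (as in the traceback), Neo4j at its default local URI, and a fresh database; the `Test` label and `_id` value are taken from the error message:

```python
# Minimal reproduction sketch -- not the doc manager's actual code.
# Assumes py2neo 2.x and a fresh local Neo4j at the default URI.
from py2neo import Graph

graph = Graph("http://localhost:7474/db/data/")

# Neo4j Doc Manager creates a uniqueness constraint on _id per label (see
# the reply below); once it exists, a second CREATE with the same _id fails.
graph.schema.create_uniqueness_constraint("Test", "_id")

create = "CREATE (n:Test {_id: {doc_id}})"
graph.cypher.execute(create, {"doc_id": "577ccc5e39a414a3d7d17171"})  # succeeds
graph.cypher.execute(create, {"doc_id": "577ccc5e39a414a3d7d17171"})  # raises
# -> py2neo.cypher.error.schema.ConstraintViolation, the same error that
#    bulk_upsert hits when it re-dumps documents whose nodes already exist.
```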
Thanks for pointing this out @CIB. Neo4j Doc Manager creates a uniqueness constraint on the `_id` property (the value of the ObjectID for each document), so this error is thrown because the bulk upsert is trying to create nodes that already exist. Currently `bulk_upsert` uses `CREATE` Cypher statements, but I suppose we could try changing those to `MERGE` and `SET` statements to avoid these constraint violation errors (see the sketch below). I will try some performance tests with this to see if it makes sense. In the meantime, you could delete the data in Neo4j before restarting the doc manager to avoid this error.
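A minimal sketch of what that `MERGE` + `SET` change might look like, assuming py2neo 2.x's transaction API (`graph.cypher.begin()` / `tx.append()`, as in the traceback); the `Test` label and the `docs` list here are hypothetical placeholders:

```python
# Sketch of the suggested MERGE + SET upsert, assuming py2neo 2.x.
from py2neo import Graph

graph = Graph("http://localhost:7474/db/data/")

# MERGE matches an existing node by the constrained _id (or creates one if
# absent); SET n += then merges the remaining properties onto the node.
# Note: SET n += {map} requires Neo4j 2.1 or later.
upsert = "MERGE (n:Test {_id: {doc_id}}) SET n += {props}"

docs = [
    {"_id": "577ccc5e39a414a3d7d17171", "name": "example"},
]

tx = graph.cypher.begin()
for doc in docs:
    props = {k: v for k, v in doc.items() if k != "_id"}
    tx.append(upsert, {"doc_id": doc["_id"], "props": props})
tx.commit()  # idempotent: no ConstraintViolation if the nodes already exist
```

Because `MERGE` matches on the constrained property, re-running the collection dump becomes idempotent instead of violating the constraint; the open question mentioned above is only whether `MERGE` is fast enough for a full collection dump.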
Hello, I'm facing the same problem. Any news on this?
Thanks.