MongoEngine/mongoengine

Exception raised: 2 or more items returned, instead of 1

alexge233 opened this issue · 3 comments

Hi, I don't think this is a bug; rather, I'm stuck trying to figure this out.
I have a piece of code which interacts with my Document class below:

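from mongoengine import (BooleanField, DateTimeField, Document, FloatField,
                         IntField, ListField, ReferenceField, StringField)

class NamedEntity(Document):
    word         = StringField(required=True, primary_key=True)  # primary key, stored as _id
    score        = FloatField(required=True)
    equity       = ListField(ReferenceField(Equity))  # Equity is another Document, defined elsewhere
    last_checked = DateTimeField()
    frequency    = IntField(required=True)
    validated    = BooleanField(default=False)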

The code itself doesn't do anything fancy:

try:
    entity = kwargs['args']['keyword']  # keyword here is the NER
    ner = db.NamedEntity.objects.get(word=entity)
    ner.last_checked = dt.now()
    ner.validated = True
    ner.save()
    return True
except DoesNotExist as e:
    self.logger.error(f"Named Entity {entity} should exist in Database by now!")
    return False
except Exception as e:
    self.logger.error(f"Failed to Query/Save NER because {e}")
    return False

Every so often, seemingly at random, the second handler (the generic except Exception) fires. I have a trace of the arguments that triggered it. I've checked the database, and there are no duplicates. I even wrote a unit test that iterates over the arguments that previously raised the exception; I ran it, and nothing happened.

My stack clearly describes the error, but I don't really get why it is raised.

The only scenario I can imagine where this might happen is if fetching this particular object, setting its attributes, and calling save is not atomic, and parallel threads happen to run that sequence at the exact same time.

Any help is greatly appreciated!

Please provide the stack trace and the exact error you get, so that you have a chance of getting additional help. word seems to be your primary key, so getting duplicates is unlikely. It's also unclear whether you get the error in .get() or when .save() is called.

If the error is that you are getting duplicates, retrieve the objects and log them in their raw format: print(list(db.NamedEntity.objects(word=entity).as_pymongo()))
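A minimal sketch of how that logging could be wired into the handler, so the raw documents are captured at the moment the error occurs. The helper name get_or_log_duplicates and the doc_cls parameter are hypothetical; the sketch assumes doc_cls is a MongoEngine Document class, which exposes its own MultipleObjectsReturned exception alongside DoesNotExist.

```python
import logging

logger = logging.getLogger(__name__)


def get_or_log_duplicates(doc_cls, **query):
    """Return the single matching document, or None after logging the raw matches.

    Hypothetical helper: doc_cls is assumed to be a MongoEngine Document class.
    DoesNotExist is deliberately not caught here and propagates to the caller.
    """
    try:
        return doc_cls.objects.get(**query)
    except doc_cls.MultipleObjectsReturned:
        # as_pymongo() yields plain dicts instead of Document instances,
        # so the log shows exactly what the server returned.
        raw = list(doc_cls.objects(**query).as_pymongo())
        logger.error("Expected 1 document for %r, got %d: %r", query, len(raw), raw)
        return None
```

In the code above this would be called as get_or_log_duplicates(db.NamedEntity, word=entity). Catching the specific exception, rather than a bare Exception, also settles whether the failure comes from .get() or from .save().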

@bagerard Sadly that's all there is. I'm using Sentry to capture errors, so the exception shown above is all I have at the moment.
What's worse is that I can't reproduce it. I'll add the print statement you suggested inside the exception handler and see what it returns.
Are save and get atomic? (I presume they are).

I suspect it isn't save, but get that actually triggers it, at least that's what the error seems to imply.
The environment is multithreaded, using concurrent.futures.ThreadPoolExecutor, and my suspicion is that there is overlap in the code execution.
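To make the suspected overlap concrete, here is a toy sketch of why a separate read followed by a write is not one atomic step. It does not touch MongoDB and does not reproduce the reported exception: an in-memory dict stands in for the stored document, and every name in it (run_two_workers, get_then_save, atomic_update) is hypothetical. A threading.Barrier forces the unlucky interleaving so the outcome is deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_two_workers(worker):
    """Run `worker` twice in parallel against a fresh in-memory record."""
    record = {"frequency": 0}  # stand-in for a stored document
    barrier = threading.Barrier(2)
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(worker, record, barrier, lock) for _ in range(2)]
        for future in futures:
            future.result()  # re-raise any exception from the worker
    return record["frequency"]


def get_then_save(record, barrier, lock):
    value = record["frequency"]      # "get": read the current state
    barrier.wait()                   # both threads have now read the same value
    record["frequency"] = value + 1  # "save": write back a result based on a stale read


def atomic_update(record, barrier, lock):
    barrier.wait()                   # the two threads still start together
    with lock:                       # read and write happen as one indivisible step
        record["frequency"] += 1


print(run_two_workers(get_then_save))  # 1: one of the two increments is lost
print(run_two_workers(atomic_update))  # 2: both increments survive
```

The lock here only plays the role that a server-side atomic update plays in the database; a process-local lock would not protect a document shared between several consumers.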

I was going through this recently, as I now changed the code, and found the following from the documentation:

Documents may be updated atomically by using the update_one(), update() and modify() methods on a QuerySet or modify() and save() (with save_condition argument) on a Document. There are several different “modifiers” that you may use with these methods:

This makes me think that my save was indeed not atomic. The code is called in a multithreaded environment spun off a RabbitMQ consumer.

I've since changed it to use:

db.NamedEntity.objects(word=entity).update_one(
    set__last_checked=dt.now(),
    inc__frequency=1,
    set__validated=True,
)

It's been running like this for weeks now, and I have not seen this error again.
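One thing the update_one() version drops is the "should exist by now" check that the DoesNotExist handler used to provide. A minimal sketch of how it could be kept, assuming that update_one() returns the number of documents it matched (0 when nothing matched the query). The function name mark_validated and the doc_cls parameter are hypothetical.

```python
import logging
from datetime import datetime as dt

logger = logging.getLogger(__name__)


def mark_validated(doc_cls, entity):
    """Atomically flag `entity` as validated; return False if no document matched.

    Hypothetical helper: doc_cls is assumed to be the NamedEntity Document class.
    """
    updated = doc_cls.objects(word=entity).update_one(
        set__last_checked=dt.now(),
        inc__frequency=1,
        set__validated=True,
    )
    if not updated:
        # Nothing matched, which replaces the old DoesNotExist branch.
        logger.error("Named Entity %s should exist in Database by now!", entity)
        return False
    return True
```

It would be called as mark_validated(db.NamedEntity, entity), and it returns the same True/False values as the original try/except block.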
I'm closing this, as it was my misunderstanding of what is and isn't atomic that led to the problem.