RamiAwar/Mongomantic

Preserve extra fields in stored documents

hdantas opened this issue · 8 comments

Checklist

  • I've searched the project's issues.

❓ Question

Is it possible to update individual fields instead of overwriting the whole document?

With mongoengine you can do something like

# UserME is a mongoengine Document class
user = UserME.objects.get(id=1)
user.first_name = "John"
user.last_name = "Smith"
user.save()

and the ORM will be smart enough to understand you only need to update these two fields.

Skimming through the code it seems the way to do this with mongomantic would be

# UserMM is a mongomantic MongoDBModel class
user = UserMM.objects.get(id=1)
user.first_name = "John"
user.last_name = "Smith"
user = UserRepository.save(user)

The main concern I have is that it looks like the repository is creating a new document and overwriting the existing one. This can potentially be dangerous as any fields not in the model would disappear, and if you have multiple concurrent write operations to the DB only the last one would "stick".

📎 Additional context

I've been working with a pattern surprisingly similar to what you're building and came across the issue I mentioned above in my own code. So I started researching whether others have found a solution, and that's when I came across this repo. My thinking is we need some kind of stack to store the changes until the save() call, but I haven't found a decent way to do this with pydantic.

Hey @hdantas , first of all thanks for being Mongomantic's first user 😄

Thinking about this quickly, I think that you're right. I'll write some tests for this later today to confirm.

If this is the case, we'd have to implement something a little smarter that would update a document if it exists. I usually use mongoengine for inspiration when writing such functions. For example, here is their save method: Mongoengine Save.

Will look into this further if you don't beat me to it!

@RamiAwar thanks for getting back to me. I'm not actually using Mongomantic just yet, I was trying to see if others had solved this problem.

> If this is the case, we'd have to implement something a little smarter that would update a document if it exists. I usually use mongoengine for inspiration when writing such functions, here is their save for example Mongoengine Save.

I'm playing with the simplest solution which is to modify __setattr__ in what we call the domain model (you call it MongoDBModel) to store in a private field all the changes to the object. Then the repository save method can inspect that list, persist those fields through mongoengine, and then reset the list. This is a bit brittle because you might miss changes to mutable objects (similar to this example from pydantic).
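To make the idea concrete, here is a minimal sketch of the `__setattr__` approach. It uses a plain Python class rather than a pydantic model, and the names `TrackedModel` and `pop_changes` are hypothetical, not part of Mongomantic or mongoengine. It also shows the brittleness mentioned above: an in-place mutation of a list is not recorded.

```python
class TrackedModel:
    """Plain-Python stand-in for a domain model that records assigned fields."""

    def __init__(self, **fields):
        # Bypass our own __setattr__ so initial values are not recorded as changes.
        object.__setattr__(self, "_changed", set())
        for name, value in fields.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        # Record the field name, then perform the normal assignment.
        self._changed.add(name)
        object.__setattr__(self, name, value)

    def pop_changes(self):
        """Return {field: value} for assigned fields and reset the tracker."""
        changes = {name: getattr(self, name) for name in self._changed}
        self._changed.clear()
        return changes


user = TrackedModel(first_name="Jane", last_name="Doe", tags=["a"])
user.first_name = "John"
user.tags.append("b")  # in-place mutation: __setattr__ is never called
changes = user.pop_changes()  # contains first_name only; the tags change is missed
```

A repository save method could pass the result of `pop_changes()` to a field-level update instead of writing the whole document.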

The more robust alternative is to follow the mongoengine approach as you suggested, but then you have to figure out which fields have changed using something like this, which is much more complex.
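One way to find changed fields without intercepting assignments is to keep a deep copy of the document as loaded and compare it at save time. This is only a sketch of that idea under simple assumptions (flat comparison of top-level keys, removed fields ignored). It is not how mongoengine implements change tracking, and `snapshot` and `changed_fields` are hypothetical helper names.

```python
import copy


def snapshot(doc: dict) -> dict:
    """Deep-copy the loaded document so later in-place mutations do not affect it."""
    return copy.deepcopy(doc)


def changed_fields(original: dict, current: dict) -> dict:
    """Return the top-level fields that are new or whose value differs."""
    return {
        key: value
        for key, value in current.items()
        if key not in original or original[key] != value
    }


loaded = {"first_name": "Jane", "tags": ["a"]}
before = snapshot(loaded)

loaded["first_name"] = "John"
loaded["tags"].append("b")  # in-place mutation is caught by the comparison

delta = changed_fields(before, loaded)
```

The cost is an extra copy of every loaded document plus a comparison on each save, which is part of why this route is more complex.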

I'm still trying to figure out which one I'm going to follow. Once I have found something I'm happy with I might port it to this repo.

Will write some tests for this and get back to you!

@hdantas Let me try to rephrase the problem and ask a question:

user = UserRepository.get(...)  # Lose data here if the document structure doesn't match the pydantic model?
user.first_name = "something else"
UserRepository.save(user)

  • You're saying that if we have extra data stored in MongoDB, this approach would just drop that data?

@RamiAwar yes that's basically it.

@hdantas Ah okay. Well I designed mongomantic to be as simple as possible assuming that it would be the only interface to the DB. Hence, I wouldn't care about handling extra fields.

I believe it's not too difficult to extend this: on each save, read the current document (if it exists) along with its extra fields, update that dict with the model's values, then save the result. Pydantic supports extra fields out of the box.
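A rough sketch of that read, update, save flow, using plain dicts in place of a real collection. `merge_for_save` is a hypothetical helper, not an existing Mongomantic function, and `legacy_flag` stands in for any field that the model does not declare.

```python
from typing import Optional


def merge_for_save(stored: Optional[dict], model_fields: dict) -> dict:
    """Start from the stored document (if any) and overlay the model's fields."""
    merged = dict(stored or {})
    merged.update(model_fields)
    return merged


# Document as it exists in the database, including a field the model lacks.
stored = {"_id": 1, "first_name": "Jane", "last_name": "Doe", "legacy_flag": True}

# Fields exported from the pydantic model at save time.
model_fields = {"_id": 1, "first_name": "John", "last_name": "Smith"}

merged = merge_for_save(stored, model_fields)  # legacy_flag survives the save
```

Note that this is still a read-modify-write cycle, so it preserves extra fields but does not by itself protect against concurrent writers.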

Fair point. There's also the scenario where you have multiple concurrent API requests that affect the same Mongo document. If you're always overwriting the whole document then you're basically guaranteed to have data issues, while if you just modified the intended fields you'd reduce the risk of conflicts.

Continuing with the "John Smith" example from above, let's say you get two concurrent patch requests:

  1. Modify the first name to "Aby"
  2. Modify the last name to "Carson"

With the current implementation, one change is quite likely to win over the other: you will end up with either "Aby Smith" or "John Carson", depending on which request executes last. If each request modified only its intended field, you would end up with "Aby Carson".

Obviously, if you get two concurrent patch requests that both modify the last name, then whichever runs last wins, and that's an issue with the client, not the server. I understand this might be an edge case you don't think is worth addressing, but it's the issue I'm having, and it's what made me look online for an answer.
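The "Aby" / "Carson" scenario above can be simulated with a dict standing in for the collection. This is only an illustration of the interleaving where both requests read before either writes. The function names are made up for the example, and in MongoDB the field-level variant would correspond to an update using the `$set` operator rather than a full document replacement.

```python
def full_overwrite() -> dict:
    """Both requests read the document, then each writes back a whole copy."""
    db = {1: {"first_name": "John", "last_name": "Smith"}}
    request_a = dict(db[1])  # both reads happen before either write
    request_b = dict(db[1])
    request_a["first_name"] = "Aby"
    request_b["last_name"] = "Carson"
    db[1] = request_a  # first write lands
    db[1] = request_b  # second write replaces it, losing "Aby"
    return db[1]


def field_level() -> dict:
    """Each request writes only the field it intends to change."""
    db = {1: {"first_name": "John", "last_name": "Smith"}}
    db[1].update({"first_name": "Aby"})
    db[1].update({"last_name": "Carson"})
    return db[1]


overwritten = full_overwrite()  # first name reverts to "John"
patched = field_level()         # both changes are kept
```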