slacy/minimongo

Better support for nested Model objects

superbobry opened this issue · 7 comments

Right now, nested data must be of a native Python type, not of another Model.

Yes. There are two options:

  1. Have embedded Models turned into their own collections and stored internally as DBRefs.
  2. Actually embed them inline in the parent document.

I'm thinking that #1 is the best answer, but I'm wondering if there's some use case for #2, or if the behavior should be controllable.

Hm, I don't see a way of implementing this without introducing schema declarations (which we are trying to avoid) -- can you provide an example?

For case #2, one option would be to have minimongo annotate saved objects with the db and collection (or even just the minimongo module & class name). I think this might be the most flexible approach, but I'm not really that big of a fan of it.

For case #1, it's a bit more straightforward:

Assume the root object we're saving is in a variable called 'root'. In root.save(), you would iterate through all the fields recursively. If you find any field where isinstance(field, Model) is True, then you'd replace that field with the value of field.dbref(), and then call field.save(). When you're done with them all, you call root.save(). There are some weird transactionality issues, obviously. :)

On read, it's as I described in the other issue -- if you see a field where isinstance(field, DBRef) is True, then you translate that into a different class that automatically loads the embedded document just before the field is referenced.
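In code, the save-side walk might look something like this. To be clear, Model, dbref(), and save() here are stand-ins I'm sketching for illustration, not minimongo's actual API:

```python
# Sketch of the depth-first save described above. `Model`, `dbref()`, and
# `save()` are illustrative stand-ins, not minimongo's real classes.

class Model(dict):
    collection = 'models'  # hypothetical collection name

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.saved = False

    def dbref(self):
        # Stand-in for a real bson.DBRef(self.collection, self['_id']).
        return {'$ref': self.collection, '$id': self.get('_id')}

    def save(self):
        # Persist embedded Models first, then replace each with its DBRef
        # so the parent document stores only references.
        for key, value in list(self.items()):
            if isinstance(value, Model):
                value.save()
                self[key] = value.dbref()
        self.saved = True  # stand-in for the actual database write
```

After root.save(), every embedded Model has been saved in its own right and the root document holds only DBRef-shaped values.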

oO commented

Hmm. I actually think that scenario #2 is more of a declarative problem for minimongo than anything else. Assuming that ModelB is stored inside ModelA's document, it should have some kind of Meta declaration that marks it as a SubModel of ModelA:

class ModelA(Model):
    class Meta:
        collection = "model_a"

class ModelB(SubModel):
    class Meta:
        model = ModelA
        field = "submodels"

With the following mongo data structure:

{
    _id: ObjectId("..."),
    name: "foo",
    submodels: [
        { _id: ObjectId("..."), name: "a" },
        { _id: ObjectId("..."), name: "b" },
        { _id: ObjectId("..."), name: "c" },
        { _id: ObjectId("..."), name: "d" }
    ]
}

I'm assuming here that embedded objects are always embedded as per the schema design, like in the BlogEntry+BlogComments example that mongo keeps talking about.

In that example, minimongo would know how to transform the loaded data into objects, because we've declared SubModels specifically.
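A rough sketch of what that load-time transformation could look like (I've inverted the registration onto the parent's Meta for brevity; submodel_fields and hydrate are invented names, not minimongo's API):

```python
# Sketch of declaration-driven hydration on load. ModelA/ModelB mirror
# the example above; `submodel_fields` and `hydrate` are hypothetical.

class Model(dict):
    pass

class SubModel(dict):
    pass

class ModelB(SubModel):
    class Meta:
        field = 'submodels'

class ModelA(Model):
    class Meta:
        collection = 'model_a'
        # Which fields hold which declared SubModel class.
        submodel_fields = {'submodels': ModelB}

def hydrate(cls, raw):
    """Turn a raw document into cls, wrapping declared subdocument lists."""
    obj = cls(raw)
    for field, sub_cls in cls.Meta.submodel_fields.items():
        if field in obj:
            obj[field] = [sub_cls(d) for d in obj[field]]
    return obj
```

Because the SubModel relationship is declared up front, the loader knows which raw dicts to promote to ModelB without guessing from the data itself.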

I actually have a structure just like that in my app, so I'm going to need to find a solution for something similar as I'm integrating minimongo into the project.

I'm not sure why you would use the structure outlined above rather than:

{
    _id: ObjectId("..."),
    name: "foo",
    a: { "$ref": "...", "$id": ObjectId("...") },  // a DBRef
    b: { "$ref": "...", "$id": ObjectId("...") },
    c: { "$ref": "...", "$id": ObjectId("...") },
    d: { "$ref": "...", "$id": ObjectId("...") }
}

Of course, if we're talking about an array of comments, then yeah, the structure you outlined would be sufficient, but I'm not sure what "name" really signifies in that case.

Another piece of food for thought: we could easily create a mapping from (db, collection) -> Type such that, given any DBRef, we would know the possible types for it. I don't want to rule out having two types in the same collection, so there needs to be a way around that, but I think that restriction can be up to the application designer.
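Something like the following registry, maybe (all names invented; it also assumes the simple one-class-per-collection case):

```python
# Sketch of a (db, collection) -> class registry for turning DBRefs back
# into model classes. Names are illustrative, not minimongo's API, and
# this assumes one model class per collection.

MODEL_REGISTRY = {}

def register(db, collection):
    """Class decorator recording (db, collection) -> model class."""
    def decorator(cls):
        MODEL_REGISTRY[(db, collection)] = cls
        return cls
    return decorator

@register('blog', 'users')
class User(dict):
    pass

@register('blog', 'posts')
class Post(dict):
    pass

def resolve(default_db, dbref):
    """Look up the model class for a DBRef-shaped dict.

    A real bson.DBRef may carry its own database ($db); otherwise fall
    back to the database the referring document came from.
    """
    return MODEL_REGISTRY[(dbref.get('$db', default_db), dbref['$ref'])]
```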

Have you looked at the code that does field transformations? The other thought is that you could just transform the field at the point you save & load it. The transformation would be from first-class objects into DBRefs and back again. Ideally, there would be a "lazy DBRef" object for the read case.
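The lazy read side could be a small proxy that defers the fetch until first access, something like (LazyRef and the fetch callable are hypothetical names, not anything in minimongo):

```python
# Sketch of a "lazy DBRef" proxy: the referenced document is only fetched
# on first access. `fetch` stands in for a real database lookup.

class LazyRef:
    def __init__(self, dbref, fetch):
        self._dbref = dbref   # e.g. {'$ref': 'users', '$id': 42}
        self._fetch = fetch   # callable: dbref -> document dict
        self._doc = None

    def _load(self):
        # Fetch once, then serve from the cached document.
        if self._doc is None:
            self._doc = self._fetch(self._dbref)
        return self._doc

    def __getitem__(self, key):
        return self._load()[key]
```

The proxy costs nothing until the field is actually dereferenced, which matters when documents carry references that most code paths never follow.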

oO commented

I guess it's the main difference between embedded subdocuments, which more often than not are accessed and searched based on attributes of their parent document, and related documents. The structure I outlined is the embedded-subdocument pattern, and the one you propose is the related-document pattern.

Also, for a schemaless design, I would expect the user to be explicit about what they want. The two snippets below mean completely different things:

foo.bar = bar
foo.save()

foo.bar = bar.dbref()
foo.save()

Telling the difference between the two is something you would otherwise need a schema for. I would expect that if I had written the first form, I would actually get the whole bar document stored inside the bar property of the foo document, and not a reference.

As for your other question, name just represented the other, unique properties of the master and embedded subdocuments. Also, unless a, b, c, d are known slots for different relations, I would still end up with an array of DBRefs instead.

As far as the DBRef to Document dereferencing, I can see a few approaches.

  1. Embed the document classname into the DBRef itself. This is probably the most flexible and easiest approach to implement, but it means modifying the data to support the ORM, which I tend to frown upon.
  2. Register how multiple Document classes are differentiated (which is what SQLAlchemy and others do): basically telling the system that documents in the people collection with type == 'manager' map to the Manager class and type == 'employee' map to the Employee class. This could of course be a much more complicated rule.
  3. Register the field containing the DBRef, which is what you've done with the Field Mapping feature. This doesn't work for the case where you might have an array of DBRefs, where each DBRef may represent a different document type or come from a different database/collection.
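For approach #2, the registered rule could be as simple as a discriminator field per collection, in the spirit of SQLAlchemy's polymorphic identity (class and field names below are invented for illustration):

```python
# Sketch of approach #2: a per-collection discriminator rule mapping a
# field value to a model class. All names here are hypothetical.

class Manager(dict):
    pass

class Employee(dict):
    pass

# collection -> (discriminator field, value -> class)
POLYMORPHIC = {
    'people': ('type', {'manager': Manager, 'employee': Employee}),
}

def classify(collection, doc):
    """Pick the model class for a raw document via its discriminator."""
    field, mapping = POLYMORPHIC[collection]
    return mapping[doc[field]]
```

A "much more complicated rule" would just replace the flat value -> class dict with an arbitrary predicate over the document.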

A quick comment: sometimes case #2 - actually embedding the object within the parent - is necessary. If you want atomic updates of both parent and child, they have to live in the same document.
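To illustrate why: with the embedded layout, one $set-style update can touch a parent field and a child field together, and MongoDB applies it to the single document as a unit. Here's a toy pure-Python model of that server-side apply (apply_set and the field paths are invented, not real pymongo or server code):

```python
# Toy model of a $set-style update hitting parent and embedded child in
# one operation. `apply_set` is a stand-in for MongoDB's server-side
# behavior; the document shape and paths are made up.

def apply_set(doc, changes):
    """Apply a flat {'a.b.0.c': value} $set-style spec to one document."""
    for path, value in changes.items():
        parts = path.split('.')
        target = doc
        for part in parts[:-1]:
            # Numeric path components index into arrays.
            target = target[int(part)] if part.isdigit() else target[part]
        last = parts[-1]
        if last.isdigit():
            target[int(last)] = value
        else:
            target[last] = value
    return doc

post = {'title': 'old', 'comments': [{'flagged': False}]}

# Parent field and embedded child field change in the same update spec.
apply_set(post, {'title': 'new', 'comments.0.flagged': True})
```

With DBRefs, the same change would be two writes to two documents, with no atomicity between them.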