SierraSoftworks/Iridium

Important Idea/Feature: Better Nested Object Support

Opened this issue · 11 comments

Hi Benjamin,

I have been doing some thinking lately about how Iridium requires you to reassign a nested object in order to get it saved. I have come to the conclusion that the forced reassignment is not desirable. I think Iridium should treat nested objects as a normal thing. Right now it feels like nested objects are treated as, for lack of a better term, 'second-class citizens'.

Here is what I'm talking about (snippet taken from Transform section of main docs):

export class InstanceType extends Iridium.Instance<any, InstanceType> {
    // Converts a GeoJSON object into a simple {lat, lng} object and back.
    @Iridium.Transform(
        data => ({ lat: data.coordinates[1], lng: data.coordinates[0] }),
        data => ({ type: "Point", coordinates: [data.lng, data.lat] })
    )
    position: {
        lat: number;
        lng: number;
    };
}

db.Model.findOne().then(instance => {
    console.log(inspect(instance.position)); // { lat: 1, lng: 2 }
    console.log(inspect(instance.document.position)); // { type: "Point", coordinates: [2, 1] }

    let pos = instance.position;
    pos.lat = 3;
    console.log(inspect(pos)); // { lat: 3, lng: 2 }
    console.log(inspect(instance.position)); // { lat: 1, lng: 2 }
    console.log(inspect(instance.document.position)); // { type: "Point", coordinates: [2, 1] }

    instance.position = pos
    console.log(inspect(instance.position)); // { lat: 3, lng: 2 }
    console.log(inspect(instance.document.position)); // { type: "Point", coordinates: [2, 3] }
});

Notice how

    let pos = instance.position;
    pos.lat = 3;
    console.log(inspect(pos)); // { lat: 3, lng: 2 }
    console.log(inspect(instance.position)); // { lat: 1, lng: 2 }

the instance.position didn't change when you assigned pos.lat = 3?

I believe that modifying pos.lat should be reflected the next time I do instance.save().
Admittedly this is a slightly odd example, because it does let pos = instance.position; and then modifies pos.lat.
But the same thing happens if I modify instance.position.lat directly (which is the usage I see as more typical): if I modify instance.position.lat and then call instance.save(), my modification still won't be saved.

My philosophy is that if I retrieve an object from the database and modify any part of it, then it should be saved back to the database.

A good example is if I have a User collection which has a license field which is an object with various fields. I want to declare my license field as a License class that I define. So when I get a User from the database I want it to do a transform from the database and do what is necessary to create the new License(...) and make the license field of my User an actual License object. And then I want to do user.license.states.push('Nebraska'); // License has a states property which is an array of strings followed by user.save() and I expect 'Nebraska' to be saved to the database.
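To make that intent concrete, here is a minimal sketch of the desired round-trip, with no Iridium involved; the License class, its toDbDocument() method, and the field names are all hypothetical:

```typescript
// Hypothetical sketch of the desired behaviour (not Iridium's actual API):
// a License class that knows how to (de)serialize itself.
interface LicenseDoc { states: string[] }

class License {
  states: string[]
  constructor(doc: LicenseDoc) {
    this.states = doc.states.slice()
  }
  // The method Iridium would look for when saving (name is a proposal).
  toDbDocument(): LicenseDoc {
    return { states: this.states.slice() }
  }
}

// Desired flow: fetch, mutate the nested object, save.
const license = new License({ states: ["Iowa"] })
license.states.push("Nebraska")
const saved = license.toDbDocument() // { states: ["Iowa", "Nebraska"] }
```

The point is that mutating the nested object in place is all the caller should have to do; serialization back to the DB shape happens at save time.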

So I really think Iridium should be updated to support this.
Now I saw that the documentation says:

It is important to note that property transforms are lazily evaluated on field access, rather than when the document is retrieved from the database. This is done for performance reasons, but has the side effect that complex objects which are the targets of property transforms must be re-assigned to the field if you wish to trigger the toDB transform function.

But I think transforming everything once when the document is retrieved from the database, and then once more when the instance is saved back, isn't a performance hit. Right now, any time a property that has a fromDb transform is accessed, the transform is executed; and any time a property that has a toDb transform is assigned, the transform is executed. So it seems like transforms are executed more often today than they would be under what I am proposing.

Even if I am wrong, that is, even if there is a huge performance hit when you don't defer the transform until something wants it, there is still a way to defer execution. It is possible to implement it so that the transform is executed only once, on the first access to the property, to store the result, and to have any future accesses to that property return the stored result. Then when it comes time to save back to the database, the toDb transform is run once on the stored object and its result is used as the document to save. However, this is more complicated to implement than my other idea of transforming once when the document comes from the DB and executing the toDb transform once when it is about to be saved.

I think Iridium should handle nested objects. One idea is to have the Iridium API look for a toIridium(), toDocument(), or toDbDocument() method on nested objects and, if present, call that method when saving to the database and use its result as the _modified document to save. That allows classes to be defined that implement the method; my License class could implement it and Iridium would then know how to save it. Of course you will probably still want to support the toDb transform, so I would expect the precedence to go:

  1. If toDbDocument() exists on the object use it.
  2. else, if toDb transform exists for that property, then use it.
  3. else the property must be a primitive so no transforming is needed, and leave it.
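A minimal sketch of that precedence (the function and method names here are proposals for illustration, not Iridium's actual API):

```typescript
// Sketch of the proposed serialization precedence: a toDbDocument() method on
// the value wins, then a registered per-field toDb transform, then pass-through.
type Transform = (value: any) => any

function serializeField(value: any, toDb?: Transform): any {
  if (value && typeof value.toDbDocument === "function") {
    return value.toDbDocument() // 1. method defined on the object itself
  }
  if (toDb) {
    return toDb(value)          // 2. per-field toDb transform
  }
  return value                  // 3. assume primitive: leave it as-is
}

const withMethod = { toDbDocument: () => ({ kind: "custom" }) }
serializeField(withMethod) // { kind: "custom" }
serializeField(
  { lat: 1, lng: 2 },
  v => ({ type: "Point", coordinates: [v.lng, v.lat] })
) // { type: "Point", coordinates: [2, 1] }
serializeField(42) // 42
```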

So right now I am curious to know what your thoughts are on this matter. Again I think this is a very important feature that Iridium lacks. Then we can go from there once I hear your thoughts.

Hi @CatGuardian,

Thanks for the really well documented feature request, it's definitely something that has stood out as a sore point in using Iridium ever since the initial transform framework was added and something I've been wanting to solve for a while (but alas, one never has as much time as they think they do).

Let me start by laying out some of the edge cases that creep in when dealing with transforms and then how the current transform framework is implemented, after which I'll touch on your design proposal and we can discuss what feature priorities look like for a redesign of this aspect of Iridium.

Requirements

  1. Must work with all un-annotated types by default (using transforms should be optional and if you don't want to use them, you shouldn't need to write any extra code just because they exist).
  2. Must allow users to define their own transform functions for arbitrary fields ({ x: number; y: number } should allow you to have different transforms for x and y).
  3. Should be controlled via a decorator (this keeps its behaviour in-line with the rest of Iridium's "add-on functionality API").
  4. Should be controllable via a static field on the model class for use cases which do not enable decorators (becoming more and more rare by the day, but it keeps the core API pattern standardized).
  5. Should avoid transforming fromDB for fields which are not used (lazy evaluation).
  6. Should be intuitive to use in conjunction with the instance.save() diff algorithm (making changes to an object should update that object in the DB in a predictable way).
  7. Should avoid mutating map objects to add virtual properties, or creating new classes to minimize the risk of V8 de-optimizing code paths that use those objects.

Edge Cases/Gotchas

(And a sprinkling of info about the current implementation)

  1. The need to support different transforms for the same data types means you need to keep a separate mapping of transforms to fields (otherwise you could just use toDB() on objects and be done with it - as you suggest).
  2. If you use toDB() extensively then you need to wrap objects in something that adds the toDB() method to them. For example: your fromDB() transform might need to data.map(item => new CustomItem(item)). This is pretty costly from a performance perspective especially if the majority of those items will remain untouched and doubly-so if this is done pre-emptively. Essentially you're implementing a subset of the Instance class for each property (and there's a bit of wizardry that goes on to try and make that class perform as quickly as possible by "pre-compiling" it).
  3. The diff algorithm works on raw DB objects (because trying to apply the transforms to a changeset is significantly more complex to get right than just diffing the two transformed docs). This means that before we compute the diff for instance.save() we need to have both the original and newly modified documents at hand. That's not a major issue if you're working with "out-of-DB" representations natively - you just need to transform everything back toDB before you diff it.
  4. Transforms should behave in a predictable manner when you're not using an Instance (i.e. when using model.create() and model.insert()). This gets even more interesting when you start talking about using upsert (since you're then talking about trying to transform a changeset object correctly).

Current Implementation

The current transform implementation has two distinct components: $document and [field: string]: .... The $document transform is always eagerly evaluated and allows you to change the top-level structure of the document (adding/removing/renaming fields etc), while the [field: string] transforms are applied lazily. During create()/insert() they are applied before the document is sent to the DB, while when you're working with an Instance they are instead handled by the get and set methods for the instance field.

This is what you're seeing with complex properties, where changing a nested element doesn't trigger the transform. The result is a disconnect between the transformed object you received (which you made the changes to) and the .document, which still holds the pre-transform object (unmodified).

It can also get worse in situations where the transform does a shallow-copy but does deep-transforms. Let's take the following example:

interface Doc {
  items: {
    name: string
    upper?: boolean
  }[]
}

function toDB(data: Doc) {
  return {
    items: data.items.map(item => ({
      name: item.upper ? item.name.toUpperCase() : item.name.toLowerCase()
    }))
  }
}

function fromDB(data: Doc) {
  // Note: this is a shallow copy - the item objects are the same references.
  return { items: data.items.map(item => item) }
}

In this scenario, the following code would result in a TitleCase value being inserted - even though the transform seems to imply that could not happen.

instance.document // { items: [{ name: "lowercase" }] }

const item = instance.items[0]
item.upper = true
item.name = "TitleCase"

instance.document // { items: [{ name: "TitleCase", upper: true }] }
// EXPECTED:
instance.document // { items: [{ name: "TITLECASE" }] }

So it's obvious this doesn't work particularly intuitively. The solution I've applied is to always treat the values of instance fields as immutable (replace the whole thing or don't change it) - however that obviously isn't enforced, which causes the issues you're seeing.

instance.document // { items: [{ name: "lowercase" }] }
instance.items = [{ name: "TitleCase", upper: true }, ...instance.items.slice(1)]
instance.document // { items: [{ name: "TITLECASE" }] }

Your Design Proposal

Okay, so now that I've done a brain dump of the current state of things and some of the issues that need to be solved by the transforms framework, let's talk about your proposal. First of all, thanks for putting in so much effort - it looks great and it's clear you've thought through a number of use cases and problems that one can encounter.

To start off, I agree that Iridium needs to do a better job of handling complex fields. They're a really common use case with MongoDB and not handling them nicely is akin to saying "Well, just use Redis". Ideally we'd want to avoid introducing any other edge-cases and gotchas if possible while still solving that problem, but depending on what those gotchas are, they may well be worth it.

Your suggestion of a toDBDocument() method, falling back to the toDB() property transform and finally to no transformation at all, actually lines up almost exactly with the way the current system works (albeit with a different API for configuring it). The $document transform in the current system is always applied and, if you wanted to, could do exactly what the toDBDocument() method you're proposing would do - you could handle the full transform of your document there and, since it's eagerly evaluated, it would bypass all the lazy field transforms and solve the complex update problem for you.

Unfortunately, it's also pretty monolithic and your $document transform function can quickly turn into an unmaintainable monster - which is where the [field: string] transforms come in - they're there to help you out for the simpler stuff like converting a base64 string into a Buffer, or an ObjectID into a string. Unfortunately for us, they're also really nice and easy to use, so it makes sense that you'd apply them to complex objects and...wait, why isn't this working now?

Solving that problem I think is the crux of this issue. From an API perspective we'd like to be able to treat complex field modifications the same way we would treat simple field modifications (I make a change to it, it gets transformed and saved).

The current blocker on that working as expected is the fact that we only evaluate the field transforms when the field is assigned to, or read from. That works great if the field will always be assigned to when the value is mutated, but that's not the case here. Behind the scenes, we need the field assignment to occur so that this line gets executed and the underlying object gets mutated - however that's an implementation detail and one we can change (read: fix) to ensure it behaves as expected.

To achieve that, we could look at leveraging your suggestion of memoization to keep track of a "modified fromDB object" until it is re-assigned and then, on read/save apply that memoized object (through the toDB transform) to update the underlying _modified document. That would ensure that changes made to the document would be propagated correctly and has the added bonus of not really changing the external API at all.

const items = instance.items // `items` is memoized in `instance._memoize["items"]`
items[0].upper = true
items[0].name = "TitleCase"

const newItems = instance.items // returns the memoized `items`

instance.document // When this is read we re-apply all the `instance._memoize` entries before returning it, ensuring it is up to date with the changes

instance.save() // When this is called we re-apply the `instance._memoize` entries as well

This approach also has the nice benefit of introducing memoization as a performance improvement for reading those complex properties multiple times, however it would bring with it an added memory cost - so it might be necessary to provide an option to disable it for users where that is of greater concern.
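A rough, standalone sketch of that memoization idea (illustrative only, not Iridium's internal code; it assumes a single fromDB/toDB pair per field):

```typescript
// Sketch: a field wrapper that memoizes the fromDB result so in-place
// mutations stick, and re-applies toDB when the document is read or saved.
class MemoizedField<Raw, App> {
  private memo: App | undefined
  constructor(
    private raw: Raw,
    private fromDB: (r: Raw) => App,
    private toDB: (v: App) => Raw
  ) {}
  // Reads always return the same memoized object, so mutations to it persist.
  get(): App {
    if (this.memo === undefined) this.memo = this.fromDB(this.raw)
    return this.memo
  }
  // On .document access or .save(), push the memoized value back through toDB.
  commit(): Raw {
    if (this.memo !== undefined) this.raw = this.toDB(this.memo)
    return this.raw
  }
}

const field = new MemoizedField(
  { type: "Point", coordinates: [2, 1] },
  d => ({ lat: d.coordinates[1], lng: d.coordinates[0] }),
  v => ({ type: "Point", coordinates: [v.lng, v.lat] })
)
field.get().lat = 3 // mutate the memoized object in place
field.commit()      // { type: "Point", coordinates: [2, 3] }
```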

What do you think? Can you see any issues that would prevent this from working the way you're expecting, or why it might not behave correctly under certain use cases?

Hi Benjamin,

Wow that was a huge response. First thanks for the time it took to put all that together.

I think your 'memoize' solution would work.

I hadn't considered what other variables would need to be kept updated as the nested object changes. For example, I didn't consider that instance.document is a wrapper for _modified, and so instance.document needs to reflect what would be saved to the database.

In your solution then, when a property with a fromDb transform is read (instance.propertyWithFromDbTransform) it will look like this:

// pseudo code
if accessed property is in _memoize
    return _memoize[property]
else
    _memoize[property] = apply fromDb transform to property of original document
    return _memoize[property]

And when a property with a toDb transform is assigned, it would just assign the value to _memoize[property], since the value being set will already be in the fromDb form. That is, it will already be in the format that we the programmers want (not the format stored in the actual database).

So now is a good time to mention how I want to use the interface that defines the document. Take the below for example.

enum Colour {
    Red = 'Red',
    Blue = 'Blue',
}

// Notice how 'colour' is an instance of the class Colour
class Car {
    make: string;
    model: string;
    colour: Colour;

    public someUsefulMethod() {
        // Does useful stuff.
    }
}

// Notice how 'cars' is an array of Car objects.
interface HouseDocument {
    _id?: string;
    name: string;

    cars?: Car[];
}

@Index({ name: 1 })
@Collection('houses')
class House extends Instance<HouseDocument, House> implements HouseDocument {
    @ObjectID _id: string;
    @Property(/^.+$/)
    name: string;

    // I am assuming @Property is written in terms of the actual data that will be saved to the database, i.e. the result of a 'toDb' transform.
    @Property([{
        make: String,
        model: String,
        colour: {            // Notice how this is how I want to store 'colour' in the database. My transform will take care of going between 'enum Colour' and '{ r: Number, g: Number, b: Number }'
            r: Number,
            g: Number,
            b: Number
        }
    }])
   // TODO: @Transform to make the layout from the database into an array of `new Car()` and turn it from an actual Car object array into the data that is saved to the database.
    cars: Car[];
}

Notice how the interface HouseDocument isn't a definition of the structure of the document in the database, but rather a definition of what I expect fromDb to produce? So houseInstance.cars[0].colour would be of enum type Colour, not an object of shape { r: Number, g: Number, b: Number }, while houseInstance.document.cars[0].colour would be such an object. This is a really confusing concept, because you declared interface HouseDocument but it's really more of an interface HouseData (where HouseData is the stuff that the fromDb transform returns). From your instance.items example I think we are on the same page about what the HouseDocument interface is really defining, because your Doc declared an extra property upper while, judging from the transforms, only name is actually saved to the database.

That was a slight tangent but I wanted to make sure we are using the interface HouseDocument the same way.

Back to your memoize idea.
Then if we access instance.document OR initiate a save it would do this

// pseudo code for producing the actual format that the database needs, which is
// also what the diff algorithm compares against the original database document

const modifiedDocument = {}
for each property that is supposed to be saved to the database
{
    let dbValue;
    if (_memoize[nameOfPropertyToBeSaved])
        dbValue = result of applying the 'toDb' transform for nameOfPropertyToBeSaved to _memoize[nameOfPropertyToBeSaved]
    else
        dbValue = _modified[nameOfPropertyToBeSaved]
        // This works because if there is no transform, any modification to the
        // property is made directly on '_modified'; and if the property was
        // read or assigned through the instance, it would already be in '_memoize'.

    modifiedDocument[nameOfPropertyToBeSaved] = dbValue
}

return modifiedDocument;
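That pseudocode could be sketched as runnable code like this (the helper and field names are illustrative, not Iridium internals):

```typescript
// Sketch: build the to-save document, preferring memoized (application-side)
// values run back through their toDb transforms over raw _modified values.
type Doc = Record<string, any>

function buildModifiedDocument(
  modified: Doc,                           // the _modified document
  memoize: Doc,                            // the _memoize table
  toDb: Record<string, (v: any) => any>    // per-field toDb transforms
): Doc {
  const out: Doc = {}
  for (const key of Object.keys(modified)) {
    if (key in memoize) {
      // Field was read/assigned through the instance: serialize the memo.
      out[key] = toDb[key] ? toDb[key](memoize[key]) : memoize[key]
    } else {
      // Never touched via a transform: _modified already holds the DB value.
      out[key] = modified[key]
    }
  }
  return out
}

const result = buildModifiedDocument(
  { name: "home", position: { type: "Point", coordinates: [2, 1] } },
  { position: { lat: 3, lng: 2 } },
  { position: v => ({ type: "Point", coordinates: [v.lng, v.lat] }) }
)
// result: { name: "home", position: { type: "Point", coordinates: [2, 3] } }
```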

I think I am liking this idea you proposed.

Please respond with any comments you have.
And please let me know where we go from here.

Also I have a question.

How does Iridium know which properties of my class it should save to the database?

For example:

interface HouseDocument {
    _id?: string;
    name: string;
}

@Index({ name: 1 })
@Collection('houses')
class House extends Instance<HouseDocument, House> implements HouseDocument {
    @ObjectID _id: string;

    @Property(/^.+$/)
    name: string;

    otherProperty: number; // Will this property get saved to the database if I update it?
}

I think in order for a property to be saved to the db, it has to be defined in the model's schema right? So since I didn't use @Property on the otherProperty, then otherProperty won't be saved to the database?

Hi @CatGuardian, you seem to have exactly understood the idea - all your pseudocode is spot on and the use cases you're describing align with what I'm expecting.

The only slight change I'd likely make is that after updating _modified with the new values from _memoize, I'd probably clear _memoize. The reason for this is that there are use cases where someone may wish to mutate an object when it is saved to the DB as part of the transform (a last modified timestamp, for example) and we wouldn't want the accessor to return the wrong value after that modification. This would just mean that after accessing .document or calling .save (or anything that changes document) you would need to re-build the memoization table (at a small performance cost), but I think it's worth it for the added predictability.

As an aside, your idea of using the class Car should work without any issues, even if you want to use a toJSON() or toDB() method defined on the class. What you could do to make that as easy as possible is create a new decorator like this:

function ClassTransform<T extends { toDB(): any }>(cls: new (value: any) => T) {
  return Iridium.Transform((value: any) => new cls(value), (value: T) => value.toDB())
}

function ClassListTransform<T extends { toDB(): any }>(cls: new (value: any) => T) {
  return Iridium.Transform((value: any[]) => value.map(v => new cls(v)), (value: T[]) => value.map(v => v.toDB()))
}

This would then let you define your model this way:

@Index({ name: 1 })
@Collection('houses')
class House extends Instance<HouseDocument, House> implements HouseDocument {
    @ObjectID _id: string;
    @Property(/^.+$/)
    name: string;
   
    @Property(CarSchema)
    @ClassListTransform(Car)
    cars: Car[];
}

As for how Iridium knows which properties to save - anything on document will be stored. The way that the instance's virtual class is built means that (right now, and for performance reasons) only properties defined on your schema will be propagated from your class to document, however you could happily write directly to document before calling save and it will work as well (though you would need to access those properties via document as well).

As for the next steps, I think this is likely a pretty small implementation so I'll try and set aside time to look at it this week - however I'm afraid (as I'm sure you've noticed) I've been absolutely swamped with other stuff recently and the amount of time I've had to dedicate to Open Source has been almost nil. I'll try and keep you updated if there are any delays and, as always, if you'd like to take a bash at implementing this then please go right ahead - I'll do everything I can to assist.

Hi @CatGuardian, I've just pushed out v8.0.0-alpha.13 which includes an implementation of the approach we discussed (as well as a bunch of other refactorings and improvements which have been long overdue). Can you please look at the release notes and try it out to let me know whether this addresses your requirements?

Thanks

Hi Benjamin,

That is awesome that you went ahead and did this. Thank you a bunch!

Here are the things I want to do but I don't have time to do this weekend:

  • Review your changes and see what it looks like
  • Try this new feature myself. Unfortunately I am a think-ahead kind of person and didn't have any immediate need for this feature. Sorry, I hope I wasn't misleading; I didn't mean to make this request sound urgent. My intention with the word 'Important' in the title was to indicate that it's something important to have, but not necessarily urgent. So I will try this out soon, even if it's just to test that it works the way I hoped.

Also as a side note, I would be more than happy to do pull request reviews on things before they are merged, if you want someone to look at them beforehand. For this particular feature I think that would have been beneficial, because I think I spotted something we didn't consider. Which brings me to the real reason I logged in to GitHub today: I don't think we properly considered the Instance.document property.

Like I mentioned in my task list, I haven't had time to review your changes, so this might already have been noticed and resolved, but I wanted to bring it up to make sure. The problem I am seeing (before your changes) is that the Instance.document property is of type TDocument, which is the data interface that we define. Let me show you (the below is taken from my previous comment):

// Notice how 'cars' is an array of Car objects.
interface HouseDocument {
    _id?: string;
    name: string;

    cars?: Car[];
}

@Index({ name: 1 })
@Collection('houses')
class House extends Instance<HouseDocument, House> implements HouseDocument {
    @ObjectID _id: string;
    @Property(/^.+$/)
    name: string;

    @Property([{
        make: String,
        model: String,
        colour: {
            r: Number,
            g: Number,
            b: Number
        }
    }])
    cars: Car[];
}

Since my HouseDocument interface has a cars?: Car[], then my houseInstance.document.cars has type of Car[]. But from our discussion about calculating the document property by using toDb transforms, houseInstance.document.cars will actually return a javascript object that is this type:

{
        make: string,
        model: string,
        colour: {
            r: number,
            g: number,
            b: number
        }
}[]

But typescript is telling me that houseInstance.document.cars is of type Car[].

Again, this was before your changes, and since I have not reviewed them I don't know whether you fixed this or not.

But if you didn't fix it then we have to do 1 of 2 things:

  1. I am wrong about how to use interface HouseDocument, and I am really supposed to define the actual shape of the data as stored in the database, not how the data looks after the fromDb transforms. If that is the case, then we should update the main page documentation and remove the implements HouseDocument declaration from House, because it is confusing and misleading: it seems to indicate that is the correct way to do it, when in reality the right way is to just declare the transformed properties on the class. Additionally we should add a comment on the main docs page above HouseDocument saying that it is the actual database representation. That provides clear instructions.
  2. I have been using the interface HouseDocument correctly. That is, I should define HouseDocument in terms of the data after it has run through all of the fromDb transforms. In this case, when Instance.document is read, the getter could simply return this: if House implements HouseDocument, then accessing any property of Instance.document is the same as accessing it on the Instance directly. For example, Instance.document.someProperty would be the same thing as Instance.someProperty when House implements HouseDocument. We should then also add a note to the main page documentation saying that your Instance classes are expected to implement your Document interface, i.e. it is expected that House implements HouseDocument.

So what are your comments about that?

I haven't looked at your commits yet, but I just wanted to jot down some potential TODOs from this feature. I don't expect you to necessarily do these; I can help too. I just don't want to forget these things.

  • Write a new 'Nested Documents' section which shows off this new feature using an example with classes.
  • Update the Transforms documentation section to reflect this change.

I might add more to this comment via edit later if I think of more.

No rush, there's still a bunch more stuff I want to address before 8.x gets released as a beta or stable release anyway.

As for the behaviour of TDocument, that has indeed "changed" since <=8.0.0-alpha.12. More specifically, it has now been defined as "the DB document before any property transforms are applied to it" (note that this is after any renames and the $document transform). The big change that this imposes is that if you are making use of field transforms (where the data in your DB is of a different type to the data in your app) then your TInstance will not implement TDocument (which is exactly what you're seeing now).

Traditionally TDocument served both the pre-transform and post-transform purpose depending on where it was used: .insert() and .create() treated it as post-transform (so toDB() would be called on its fields), while the Instance treated it as pre-transform (it returned ._modified, which was not yet transformed). This obviously wasn't great, but I don't know if anybody ran into the issue (nobody has complained yet). Rather than leaving this in its previous (broken) state, I've decided to give it a strict definition and start providing tooling and docs that clarify it, to avoid any confusion.

The first step of that has been updating the README with a new set of examples which show this distinction, as well as the release notes for v8.0.0-alpha.13 which detail the changes and what needs to happen for users upgrading from older versions. More docs will definitely be incoming as well as additional examples and probably some improvements to the website to help make navigating this content a bit easier. All of that will, of course, take a bit of time to complete though - so bear with me 😄.

So the TL;DR is that you weren't really doing things incorrectly before (since there wasn't a well defined correct/incorrect way to do things) but with the latest version the approach of using TDocument as a "post-transform" representation is no longer valid and your TInstance should no longer implements TDocument unless you're not changing the types of the fields.

Ok, I agree. I was looking through the source code again for a different bug (#119), and it makes the most sense for TDocument to reflect the actual database representation.

So I will make sure I do that from now on.

Thanks for the responses. I'll post again after I get around to looking at the changes.

Perfect 👍