jazzband/django-model-utils

Undocument breaking changes in FieldTracker in Version 4.1

Closed this issue · 10 comments

Problem

The Release 4.1, after moving the patch from the save() to the save_base() method in #404 (@tumb1er FYI) introduced a serioulsy breaking change that was neither announced nor documented (and was obviously not tested brefore). It would have deserved a 5.x update IMHO.

Since the values of the tracker are restored after the call to the save_base() method, any check/assumption made on the tracker inside the save() Method but after the super().save() call fails, as the tracker is then "new/empty". This is especially problematic when tracking changes depending on related objects, when working with nested serializers per example.

Environment

  • Django Model Utils version: <4.1 VS 4.1+

Code examples

class MyModel(Model):
        name = models.CharField(max_length=64)
        tracker = FieldTracker()

        def save(self, *args, **kwargs):
            if self.tracker.changed('name'):
                 print("True")
            super().save(*args, **kwargs)
            if self.tracker.changed('name'):
                 print("True")

Just calling MyModel.create() would result in 2 prints in version <4.1 and in only one in version 4.1+ (the second check will fail as the tracker is resetted on the save_base() call inside the `super().save() call)

Discussion / proposal

@auvipy A work-around is to test before the call to save() and then use the variable afterwards for such cases.

But I honestly find it quite a dramatic regression for the IMHO limited benefit of being able to modify update_fields in the save() Method.

I would seriously prefer to restore the previous behaviour of patching the save() Method, but still add a functionality to support overriding update_fields: I would prefer to add a new kwarg _always_update to FieldTracker where you could define the list of fields that you always want to update. I think also in term of design and readbility it would be a better feature. @tumb1er what do you think about the suggestion? If you validate it, I could draft a PR for this

Hello @MRigal, thank's for calling.

First, your solution with _always_update does not solve problem with dynamic update_fields list (i.e. something like "save only changed" scenario).
Second, it's the definition of "changed fields": what is it? I think, changed fields are fields that differ from database version of an object. When you physically save your object to database, changed fields are not changed anymore by this definition.

When working on FieldsTracker, you should also consider abstract parent models, polymorphic models, model mixins and nested save() overrides.

One more argument for save_base() is that django signals are raised from that method.

I agree that such breaking changes should be placed to major release, but now we have what we have. Now we can consider two definitions of "changed": that one above and any another. If we don't change the definition, current tracker behavior is correct (but maybe we should add another field set changed_and_than_saved, that will not be invalidated at all). If we change definition, complete tracker behavior should be revised.

@tumb1er thanks for replying! As it is sometimes easier to speak around code, I've created a PR with a basic test, that would work on django-model-utils <4.1 and fail on 4.1+, as well as separately, a change proposal to keep the nice work you've added when using update_fields.

I think it shows how it is possible to work with a dynamic update_fields list. Or maybe you could elaborate on that point.

I agree that what matters is the definition of changed field, but this is where I disagree with you. I don't think it should be the exact moment where the save_base() returns but in the opposite, the moment where save() returns. If "physically" I might agree with you (although it happens technically a little bit before the return, we would already be "late"), from the perspective of the django concepts, I don't, as it is pretty common to continue working on related models inside the save() method, after the call to super().save(). And at this point in time, I would expect the FieldTracker to not have call set_saved_fields() before the end of the save() Method. And this even more with abstract parent models, polymorphic models, model mixins and nested save() overrides.

I think we might agree that we disagree here, but:

  1. IMHO the behavior that has been used for years pre-4.1 should be restored ASAP, #492 being a basis to get a 4.1.2 out of the door with all good additions of the 4.1 series
  2. A real definition of the "validity period" of the FieldTracker should be agreed on, and precisely documented maybe with insights from previous committers @jezdez @carljm @treyhunner @lucaswiman @st4lk
  3. If it is desirable to switch to your understanding of the FieldTracker, prepare a 5.0 release with documentation about the breaking changes

@MRigal yes, I disagree with almost all your statements.

  1. If we revert behavior ASAP, another person could provide same arguments to revert a revert.
  2. Before "validity period" and "changed definition" are given, nothing should be changed.
  3. It will be "stupid" to revert behavior and than revert it again in 5.0.

Another proposal is to tag 4.1 as 5.0 and bypass step 3 (adding breaking changes to documentation).

And I think there should be third party person as a decision maker.

Hi @tumb1er,
as a disclaimer in advance, I would like to mention that I work on the same codebases as @MRigal and have therefore similar motivations and perspectives on the matter.

My reason for speaking out here nevertheless is that just today, I worked on another piece of code where the exact topic of this issue turned out to be relevant: The save() method of a Django model features logic that makes use of a field tracker but must be executed after super().save() because the instance is required to have an ID at that time. The logic itself is complex enough that it feels reasonable to encapsulate it in a dedicated method. Prior to version 4.1, this could look as simple as this:

class MyModel(Model):
    name = models.CharField(max_length=64)
    tracker = FieldTracker()

    def my_post_super_save_logic(self):
        if self.tracker.changed('name'):
            # Do complex stuff
        else
            # Do other complex stuff

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        self.my_post_super_save_logic()

This was my first approach to a post-4.1 solution:

class MyModel(Model):
    name = models.CharField(max_length=64)
    tracker = FieldTracker()

    def my_post_super_save_logic(self, name_has_changed):
        if name_has_changed:
            # Do complex stuff
        else
            # Do other complex stuff

    def save(self, *args, **kwargs):
        name_has_changed = self.tracker.changed('name')
        super().save(*args, **kwargs)
        self.my_post_super_save_logic(name_has_changed=name_has_changed)

As you can see, django-model-utils’ new tracker behaviour forces the implementation to either distribute the method’s logic across different parts of the the save() method or add an interface to the method that was unnecessary before. Also note that the method’s code might need to access tracker information for several fields, which would make the interface even more complex.

Since I found this unsatisfactory, I came up with another idea and implemented the method as a context manager:

from contextlib import contextmanager

class MyModel(Model):
    name = models.CharField(max_length=64)
    tracker = FieldTracker()

    @contextmanager
    def my_post_super_save_logic(self):
        name_has_changed = self.tracker.changed('name')
        yield
        if name_has_changed:
            # Do complex stuff
        else
            # Do other complex stuff

    def save(self, *args, **kwargs):
        with self.my_post_super_save_logic():
            super().save(*args, **kwargs)

This solution has two advantages: It makes the save() method straight-forward and clean and it re-encapsulates the post-save logic in a single method.

While this is viable to us, there remains the drawback of increased complexity in the code that was not necessary before. Therefore, if there is a significant group of users who take advantage of the tracker being reset in super().save(), we would very much appreciate if django-model-utils could facilitate both workflows by providing interfaces for either “validity period” – the one ending with super().save() and the one not ending until the save() method itself terminates.

Thank you very much for considering our perspective.

Hello! Interesting idea with context manager.
What if tracker will reset it's state at outermost context exit?

    # monkey-patched MyModel.save_base
    def save_base(self):
        with self.tracker:
             super().save_base()
        # reset here

    # MyModel.save
    def save(self):
        with self.tracker:
            super().save()  
            # not reset here!
         # reset here

I think we can replace django-model-utils internal fields reset code with a context manager, and allow users to use it as in "outermost-reset" manner.

  • Default 4.1 behavior will not change
  • But you can mark the place where changes are reset with nested context manager.
  • Decorator can also be added to remove with clause in save for example

Your logic will look like this:

@reset_fields(tracker='tracker')
def save(self):
    super().save()
    if self.tracker.changed('name'):
        self.my_post_super_save_logic()

@gregorgaertner if you agree, let's decide who will make a PR for this.

@tumb1er we discussed offline with @gregorgaertner and we do agree that this is a great idea!

If you're willing to, you could start with a PR and we could both review it. Else I was thinking to try to work on it, but I'm not sure when I'll get the time for it.

I'll try to find some time at weekend.

@MRigal @gregorgaertner Hello! Review please: #494