jazzband/django-model-utils

FieldTracker massive performance issue with Django 3.1+ on FieldField

MRigal opened this issue · 4 comments

Problem

When I updated a project to Django 3.2, I've noticed a massive performance lost. The problem comes from this code change which appeared in 3.1, where __getstate__() now also copies/saves the instance: django/django#12055

Environment

  • Django Model Utils version: latest, but all problematic
  • Django version: Fine <3.1, problematic after
  • Python version: all
  • Other libraries used, if any:

Code examples

As the functionality works, it is not easy to describe. But the fix is relatively easy. In FieldInstanceTracker.get_field_value(), replace:

return getattr(self.instance, self.field_map[field])

with

field_value = getattr(self.instance, self.field_map[field])
        # TODO: or isinstance(field_value, FieldFile):
        if hasattr(field_value, 'instance'):
            field_value.instance = None
        return field_value

I'm preparing a PR for that. @tumb1er do you an opinion on which strategy to adopt? I guess it is fine to always discard instance, considering that anyone creating own subclasses of fields would probably not define an instance attribute which is different from the instance owning the field (which is not interesting us)

I think it's more like ForeignKey instance. Instead of fetching model object from db, tracker uses foreign key value. Here It may be file name (that's what stored in db).

No the problem is very much different. Check the django PR linked at the top.

Since 3.1, getstate() (used by deepcopy), will also copy the full intance that holds the field (and this for each field). The memory footprint of this methods can easily be multiplied by 100 or 1000.

The very easy fix is to set the instance to None on the field_value, as we don't need it.

I just wonder if I should use if hasattr(field_value, 'instance'): or isinstance(field_value, FieldFile)...

So, when somebody calls FieldInstanceTracker.get_field_value(), FieldFile.instance object is nulled. How will it work later without instance object? Getter should not have such side effects, instead you may construct new FieldFile object without setting instance attribute, or even use db value for field (i.e. filename).
Second variant introduces backward-incompatible changes, but whole this situation looks similar to the reason why ModelTracker had been deprecated.

Yes, my other idea was to subclass FileField to modify __getstate__(). I can try to do that.

For now I was trying to focus on getting a failing test for the size of the deepcopy objects.