googleapis/python-spanner-django

Autofield patching doesn't work for multi-database migration against MySQL

Closed this issue ยท 11 comments

skyl commented

I am trying to keep all of my existing tables in MySQL but add "spanner" as a second database for certain apps.

DATABASES looks something like this:

{'default': {'ENGINE': 'django.db.backends.mysql',
  'NAME': 'dbname',
  'USER': 'username',
  'PASSWORD': 'FOOBAR',
  'HOST': '127.0.0.1',
  'PORT': '3306',
  'CONN_MAX_AGE': 240,
  'ATOMIC_REQUESTS': False,
  'AUTOCOMMIT': True,
  'OPTIONS': {},
  'TIME_ZONE': None,
  'TEST': {'CHARSET': None, 'COLLATION': None, 'NAME': None, 'MIRROR': None}},

 'spanner': {'ENGINE': 'django_spanner',
  'PROJECT': 'my-gcp-project',
  'INSTANCE': 'manual-test0',
  'NAME': 'test1',
  'ATOMIC_REQUESTS': False,
  'AUTOCOMMIT': True,
  'CONN_MAX_AGE': 0,
  'OPTIONS': {},
  'TIME_ZONE': None,
  'USER': '',
  'PASSWORD': '',
  'HOST': '',
  'PORT': '',
  'TEST': {'CHARSET': None, 'COLLATION': None, 'NAME': None, 'MIRROR': None}}}

I have a router that is much like the one in the multi-db docs that puts just 1 table(app) in spanner.

class SpannerAppRouter:
    """
    A router to control all database operations on models in the
    myapp application to spanner
    """
    spanner_app_labels = {'myapp'}

    def db_for_read(self, model, **hints):

        if model._meta.app_label in self.spanner_app_labels:
            return 'spanner'
        return None

    def db_for_write(self, model, **hints):
        if model._meta.app_label in self.spanner_app_labels:
            return 'spanner'
        return None

    def allow_relation(self, obj1, obj2, **hints):
        """
        Allow relations if all models involved are in spanner
        """
        if all([
            obj1._meta.app_label in self.spanner_app_labels,
            obj2._meta.app_label in self.spanner_app_labels
        ]):
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """
        Make sure spanner tables only appear in the 'spanner' database
        and spanner tables don't appear in the default
        """
        # print(db, app_label, model_name)
        if app_label in self.spanner_app_labels:
            return db == 'spanner'
        elif db == 'spanner':
            return False
        return None

This works fine and migration goes well for ./manage.py migrate --database=spanner.

But, when I go to migrate the default database ./manage.py migrate, I get django.db.utils.IntegrityError: (1062, "Duplicate entry '2147483647' for key 'PRIMARY'") in the django_migrations table.

I look into the migrations table in MySQL and I see that the first 250 rows are regular auto-increment PKs but the last one has the different AutoField style:

*************************** 249. row ***************************
     id: 269
    app: foo
   name: 0080_xxx
applied: 2021-12-10 15:35:38.909703
*************************** 250. row ***************************
     id: 270
    app: foo
   name: 0081_xxx
applied: 2021-12-10 15:35:39.134282
*************************** 251. row ***************************
     id: 2147483647
    app: mynewapponspanner
   name: 0001_fooo
applied: 2021-12-15 19:48:38.193435
251 rows in set (0.004 sec)

If I delete this row and run migrate again, I again get the IntegrityError (it's trying to run 2 migrations ultimately, so it seems the second migration is trying to use the same id for some reason). If I run migrate again without deleting the row, I get

File "./manage.py", line 8, in <module>
    run_django_cli(sys.argv)
  File "/workspaces/repose/py/dj/utils.py", line 23, in run_django_cli
    execute_from_command_line(args)
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/base.py", line 83, in wrapped
    res = handle_func(*args, **kwargs)
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/commands/migrate.py", line 234, in handle
    fake_initial=fake_initial,
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/executor.py", line 117, in migrate
    state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
    state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/executor.py", line 245, in apply_migration
    state = migration.apply(state, schema_editor)
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/migration.py", line 114, in apply
    operation.state_forwards(self.app_label, project_state)
  File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/operations/fields.py", line 229, in state_forwards
    state.models[app_label, self.model_name_lower].fields
KeyError: ('myappname', 'mymodelname')

I thought I could get the migration to run by removing django_spanner from INSTALLED_APPS ... but then I notice that django_spanner is imported in the migration itself:

            fields=[
                ('id', models.AutoField(auto_created=True, default=django_spanner.gen_rand_int64, primary_key=True, serialize=False, verbose_name='ID')),

Thus, the monkey patching still happens, even though I am trying to essentially skip the migration on the default/MySQL database:

# Monkey-patch AutoField to generate a random value since Cloud Spanner can't

I worry about what other side-effects this monkey-patching may have with the existing apps I want to route to the MySQL default. For my use case, it would be preferable if the Autofield was not globally patched but only patched based on the ENGINE of the database. I'm not sure if this is possible; but, I do something similar with the connection in rel_db_type which gets passed the connection.

Pseudo-thought:

class SpannerAutoField(models.AutoField):

    def some_method(self, connection):
        if connection.settings_dict['ENGINE'] == 'django_spanner':
            # do something?
            # pass
            return
        return super().some_method(connection)

Would varying the Django AutoField based on the ENGINE be feasible? I doubt all of my non-spanner apps will work with the monkey-patch. Playing in a test copy, I am not able to migrate the old database (even though I am skipping the migration through allow_migrate hook of the router) - the one table I want to put in spanner migrates fine in spanner.

skyl commented

Another possible option in my case would be if I could turn the monkey-patching off and, for all tables in spanner, I could just define an explicit PK ...

skyl commented

I thought I might be onto something - I made a quick hack to see what would happen if I unmonkeypatched the AutoField

skyl@a835787

I then made an explicit id like I think django_spanner would have, something like:

class SpannerAutoField(models.AutoField):

    def __init__(self, *args, **kwargs):
        kwargs["blank"] = True
        models.Field.__init__(self, *args, **kwargs)
        self.default = django_spanner.gen_rand_int64


SpannerAutoField.db_returning = False
SpannerAutoField.validators = []
    id = SpannerAutoField(
        auto_created=True,
        default=django_spanner.gen_rand_int64,
        primary_key=True,
        serialize=False,
        verbose_name='ID',
    )

This sort of worked to migrate the MySQL default; but, then, predictably for someone who has experience here, I got the following when trying to migrate --database=spanner:

django.db.utils.IntegrityError: 400 id must not be NULL in table django_migrations.

Monkey-patching the AutoField when django_spanner is imported seems too aggressive to me. Again, I think that patching in the different behavior to AutoField somehow based on the connection ENGINE would be the most ideal, maybe with settings to dictate whether you want everything patched, nothing patched, patch based on engine maybe ...

skyl commented

again I thought I might be onto something - maybe I would allow the monkeypatch. Maybe the way django_migrations instantiates their model is what is causing the duplicate ID - even with the monkey patching, one would hope that each migration row would insert a unique key. So, something must be "sticking" to make it choose the same key twice. I think there is something to this idea because the django.db.migrations code puts creation class behind something of a cache attr:

https://github.com/django/django/blob/2.2.24/django/db/migrations/recorder.py#L30-L45

But, even if I run once and get a row with the ID (errors on second migration for the app), if I run again I get the KeyError in this funny line of code

        state.models[app_label, self.model_name_lower].fields = [
            (n, field if n == self.name else f)
            for n, f in
            state.models[app_label, self.model_name_lower].fields
        ]

-->

File "/py/env/lib/python3.7/site-packages/django/db/migrations/operations/fields.py", line 229, in state_forwards
    state.models[app_label, self.model_name_lower].fields
KeyError: ('myapp', 'mymodel')

The specific problem is I want to get the migration on the default database without it actually applying anything. But, I fear the problem might also show in other apps when I'm actually trying to make a migration (not sure).

One funny thing I notice is that the ID it's trying to create for the django_migrations table is always '2147483647'. So, I wonder why it's sticking in this case - run after run it creates the same ID every time. Maybe there is something I can kick to force it to generate a new one?

Thanks @skyl for brining this up, I will try and investigate a solution for this. This would need sometime as the issue spans across 2 dbs working together, I will get back once I have some update.

Any update on this guys? It doesn't leave too many options for the users if an import of django_spanner monkey-patches the AutoField. We're having to resort to keeping a forked copy - it becomes a challenge during upgrades/patches, etc.
Why not make django spanner based models to explicitly define an autofield that's also available as a built-in - or implicitly use this custom autofield for spanner based models without the user ever knowing?

@asthamohta, the original code author left us long ago, so we're on our own here.

Long story short: every table in Spanner requires a primary key (in a table without a PK there can be only one row). Still, Spanner doesn't provide an autoincrement mechanism for integer ids (the most common type of ids). Django, from its side, adds an autoincremented id field into every new table (and, as Spanner doesn't support autoincrement, it doesn't work).

What the original code author tried to do was to generate a key for every new row and do it under the hood. First it was done for new rows, next for the whole AutoField type. As users noticed, it's a pretty rough solution, which create problems and misunderstandings in a bunch of cases.

I think the best way to solve the problem is to let users to control ids. We did this in SQLAlchemy and so far didn't get any troubles:
https://github.com/googleapis/python-spanner-sqlalchemy#autoincremented-ids

At this point Spanner and Django don't fit each other, so I suppose it's better to let users deal with ids manually instead of creating unexpected under-the-hood fads (which are also creating performance problems).

P.S. we can probably think of something like a prepared class with a basic UUID generation function. So that users could import it and add into their table, not writing it by themselves. Need to investigate if it can be done with a single class for both SQLAlchemy and Django APIs.

@IlyaFaer : I think the best way to solve the problem is to let users to control ids. We did this in SQLAlchemy and so far didn't get any troubles: [link](https://github.com/googleapis/python-spanner-sqlalchemy#autoincremented-ids)

Would this be a breaking change, given django-spanner is already GA? If so, we will need to make the change backward compatible, so that users who have built applications for a single database (Spanner) are not inconvenienced.

Would varying the Django AutoField based on the ENGINE be feasible?

For now, I think, it's the only possible solution (in case we don't want to inflict breaking changes). Looks like we can use the global configurations object inside the AutoField class to use the generation function only for Spanner. 'Cause, yes, using a generation function for all the databases under the hood can be pretty confusing. Will try the solution at the same case and will push, if it's working fine.

@skyl, does this fix work for you?
https://github.com/googleapis/python-spanner-django/pull/771/files

We've got an error which should be fixed on Django side, so it may take some time before we'll be able to test the fix completely. But it's there, maybe it's already enough to fix your problem locally?

skyl commented

@skyl, does this fix work for you? https://github.com/googleapis/python-spanner-django/pull/771/files

We've got an error which should be fixed on Django side, so it may take some time before we'll be able to test the fix completely. But it's there, maybe it's already enough to fix your problem locally?

I'll need to dust off the code that was trying to use spanner but it looks good to me; I'll see if I can try it in the next couple of weeks.