Autofield patching doesn't work for multi-database migration against MySQL
Closed this issue ยท 11 comments
I am trying to keep all of my existing tables in MySQL but add "spanner" as a second database for certain apps.
DATABASES
looks something like this:
{'default': {'ENGINE': 'django.db.backends.mysql',
'NAME': 'dbname',
'USER': 'username',
'PASSWORD': 'FOOBAR',
'HOST': '127.0.0.1',
'PORT': '3306',
'CONN_MAX_AGE': 240,
'ATOMIC_REQUESTS': False,
'AUTOCOMMIT': True,
'OPTIONS': {},
'TIME_ZONE': None,
'TEST': {'CHARSET': None, 'COLLATION': None, 'NAME': None, 'MIRROR': None}},
'spanner': {'ENGINE': 'django_spanner',
'PROJECT': 'my-gcp-project',
'INSTANCE': 'manual-test0',
'NAME': 'test1',
'ATOMIC_REQUESTS': False,
'AUTOCOMMIT': True,
'CONN_MAX_AGE': 0,
'OPTIONS': {},
'TIME_ZONE': None,
'USER': '',
'PASSWORD': '',
'HOST': '',
'PORT': '',
'TEST': {'CHARSET': None, 'COLLATION': None, 'NAME': None, 'MIRROR': None}}}
I have a router that is much like the one in the multi-db docs that puts just 1 table(app) in spanner.
class SpannerAppRouter:
"""
A router to control all database operations on models in the
myapp application to spanner
"""
spanner_app_labels = {'myapp'}
def db_for_read(self, model, **hints):
if model._meta.app_label in self.spanner_app_labels:
return 'spanner'
return None
def db_for_write(self, model, **hints):
if model._meta.app_label in self.spanner_app_labels:
return 'spanner'
return None
def allow_relation(self, obj1, obj2, **hints):
"""
Allow relations if all models involved are in spanner
"""
if all([
obj1._meta.app_label in self.spanner_app_labels,
obj2._meta.app_label in self.spanner_app_labels
]):
return True
return None
def allow_migrate(self, db, app_label, model_name=None, **hints):
"""
Make sure spanner tables only appear in the 'spanner' database
and spanner tables don't appear in the default
"""
# print(db, app_label, model_name)
if app_label in self.spanner_app_labels:
return db == 'spanner'
elif db == 'spanner':
return False
return None
This works fine and migration goes well for ./manage.py migrate --database=spanner
.
But, when I go to migrate the default database ./manage.py migrate
, I get django.db.utils.IntegrityError: (1062, "Duplicate entry '2147483647' for key 'PRIMARY'")
in the django_migrations
table.
I look into the migrations table in MySQL and I see that the first 250 rows are regular auto-increment PKs but the last one has the different AutoField style:
*************************** 249. row ***************************
id: 269
app: foo
name: 0080_xxx
applied: 2021-12-10 15:35:38.909703
*************************** 250. row ***************************
id: 270
app: foo
name: 0081_xxx
applied: 2021-12-10 15:35:39.134282
*************************** 251. row ***************************
id: 2147483647
app: mynewapponspanner
name: 0001_fooo
applied: 2021-12-15 19:48:38.193435
251 rows in set (0.004 sec)
If I delete this row and run migrate again, I again get the IntegrityError
(it's trying to run 2 migrations ultimately, so it seems the second migration is trying to use the same id for some reason). If I run migrate again without deleting the row, I get
File "./manage.py", line 8, in <module>
run_django_cli(sys.argv)
File "/workspaces/repose/py/dj/utils.py", line 23, in run_django_cli
execute_from_command_line(args)
File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
utility.execute()
File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/base.py", line 323, in run_from_argv
self.execute(*args, **cmd_options)
File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/base.py", line 364, in execute
output = self.handle(*args, **options)
File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/base.py", line 83, in wrapped
res = handle_func(*args, **kwargs)
File "/repose-cache/py/env/lib/python3.7/site-packages/django/core/management/commands/migrate.py", line 234, in handle
fake_initial=fake_initial,
File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/executor.py", line 117, in migrate
state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/executor.py", line 245, in apply_migration
state = migration.apply(state, schema_editor)
File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/migration.py", line 114, in apply
operation.state_forwards(self.app_label, project_state)
File "/repose-cache/py/env/lib/python3.7/site-packages/django/db/migrations/operations/fields.py", line 229, in state_forwards
state.models[app_label, self.model_name_lower].fields
KeyError: ('myappname', 'mymodelname')
I thought I could get the migration to run by removing django_spanner from INSTALLED_APPS ... but then I notice that django_spanner
is imported in the migration itself:
fields=[
('id', models.AutoField(auto_created=True, default=django_spanner.gen_rand_int64, primary_key=True, serialize=False, verbose_name='ID')),
Thus, the monkey patching still happens, even though I am trying to essentially skip the migration on the default/MySQL database:
I worry about what other side-effects this monkey-patching may have with the existing apps I want to route to the MySQL default. For my use case, it would be preferable if the Autofield was not globally patched but only patched based on the ENGINE of the database. I'm not sure if this is possible; but, I do something similar with the connection in rel_db_type
which gets passed the connection
.
Pseudo-thought:
class SpannerAutoField(models.AutoField):
def some_method(self, connection):
if connection.settings_dict['ENGINE'] == 'django_spanner':
# do something?
# pass
return
return super().some_method(connection)
Would varying the Django AutoField based on the ENGINE
be feasible? I doubt all of my non-spanner apps will work with the monkey-patch. Playing in a test copy, I am not able to migrate the old database (even though I am skipping the migration through allow_migrate
hook of the router) - the one table I want to put in spanner migrates fine in spanner.
Another possible option in my case would be if I could turn the monkey-patching off and, for all tables in spanner, I could just define an explicit PK ...
I thought I might be onto something - I made a quick hack to see what would happen if I unmonkeypatched the AutoField
I then made an explicit id
like I think django_spanner would have, something like:
class SpannerAutoField(models.AutoField):
def __init__(self, *args, **kwargs):
kwargs["blank"] = True
models.Field.__init__(self, *args, **kwargs)
self.default = django_spanner.gen_rand_int64
SpannerAutoField.db_returning = False
SpannerAutoField.validators = []
id = SpannerAutoField(
auto_created=True,
default=django_spanner.gen_rand_int64,
primary_key=True,
serialize=False,
verbose_name='ID',
)
This sort of worked to migrate the MySQL default; but, then, predictably for someone who has experience here, I got the following when trying to migrate --database=spanner
:
django.db.utils.IntegrityError: 400 id must not be NULL in table django_migrations.
Monkey-patching the AutoField when django_spanner
is imported seems too aggressive to me. Again, I think that patching in the different behavior to AutoField somehow based on the connection ENGINE would be the most ideal, maybe with settings to dictate whether you want everything patched, nothing patched, patch based on engine maybe ...
again I thought I might be onto something - maybe I would allow the monkeypatch. Maybe the way django_migrations instantiates their model is what is causing the duplicate ID - even with the monkey patching, one would hope that each migration row would insert a unique key. So, something must be "sticking" to make it choose the same key twice. I think there is something to this idea because the django.db.migrations code puts creation class behind something of a cache attr:
https://github.com/django/django/blob/2.2.24/django/db/migrations/recorder.py#L30-L45
But, even if I run once and get a row with the ID (errors on second migration for the app), if I run again I get the KeyError
in this funny line of code
state.models[app_label, self.model_name_lower].fields = [
(n, field if n == self.name else f)
for n, f in
state.models[app_label, self.model_name_lower].fields
]
-->
File "/py/env/lib/python3.7/site-packages/django/db/migrations/operations/fields.py", line 229, in state_forwards
state.models[app_label, self.model_name_lower].fields
KeyError: ('myapp', 'mymodel')
The specific problem is I want to get the migration on the default database without it actually applying anything. But, I fear the problem might also show in other apps when I'm actually trying to make a migration (not sure).
One funny thing I notice is that the ID it's trying to create for the django_migrations table is always '2147483647'. So, I wonder why it's sticking in this case - run after run it creates the same ID every time. Maybe there is something I can kick to force it to generate a new one?
Thanks @skyl for brining this up, I will try and investigate a solution for this. This would need sometime as the issue spans across 2 dbs working together, I will get back once I have some update.
Any update on this guys? It doesn't leave too many options for the users if an import of django_spanner monkey-patches the AutoField. We're having to resort to keeping a forked copy - it becomes a challenge during upgrades/patches, etc.
Why not make django spanner based models to explicitly define an autofield that's also available as a built-in - or implicitly use this custom autofield for spanner based models without the user ever knowing?
cc @asthamohta
@asthamohta, the original code author left us long ago, so we're on our own here.
Long story short: every table in Spanner requires a primary key (in a table without a PK there can be only one row). Still, Spanner doesn't provide an autoincrement mechanism for integer ids (the most common type of ids). Django, from its side, adds an autoincremented id field into every new table (and, as Spanner doesn't support autoincrement, it doesn't work).
What the original code author tried to do was to generate a key for every new row and do it under the hood. First it was done for new rows, next for the whole AutoField type. As users noticed, it's a pretty rough solution, which create problems and misunderstandings in a bunch of cases.
I think the best way to solve the problem is to let users to control ids. We did this in SQLAlchemy and so far didn't get any troubles:
https://github.com/googleapis/python-spanner-sqlalchemy#autoincremented-ids
At this point Spanner and Django don't fit each other, so I suppose it's better to let users deal with ids manually instead of creating unexpected under-the-hood fads (which are also creating performance problems).
P.S. we can probably think of something like a prepared class with a basic UUID generation function. So that users could import it and add into their table, not writing it by themselves. Need to investigate if it can be done with a single class for both SQLAlchemy and Django APIs.
@IlyaFaer : I think the best way to solve the problem is to let users to control ids. We did this in SQLAlchemy and so far didn't get any troubles: [link](https://github.com/googleapis/python-spanner-sqlalchemy#autoincremented-ids)
Would this be a breaking change, given django-spanner is already GA? If so, we will need to make the change backward compatible, so that users who have built applications for a single database (Spanner) are not inconvenienced.
Would varying the Django AutoField based on the ENGINE be feasible?
For now, I think, it's the only possible solution (in case we don't want to inflict breaking changes). Looks like we can use the global configurations object inside the AutoField
class to use the generation function only for Spanner. 'Cause, yes, using a generation function for all the databases under the hood can be pretty confusing. Will try the solution at the same case and will push, if it's working fine.
@skyl, does this fix work for you?
https://github.com/googleapis/python-spanner-django/pull/771/files
We've got an error which should be fixed on Django side, so it may take some time before we'll be able to test the fix completely. But it's there, maybe it's already enough to fix your problem locally?
@skyl, does this fix work for you? https://github.com/googleapis/python-spanner-django/pull/771/files
We've got an error which should be fixed on Django side, so it may take some time before we'll be able to test the fix completely. But it's there, maybe it's already enough to fix your problem locally?
I'll need to dust off the code that was trying to use spanner but it looks good to me; I'll see if I can try it in the next couple of weeks.