maykinmedia/django-regex-redirects

Specified key was too long; max key length is 3072 bytes

Closed this issue · 10 comments

When applying migrations I get:

Running migrations:
  Applying regex_redirects.0002_auto_20151217_1938...Traceback (most recent call last):

Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
(...)
django.db.utils.OperationalError: (1071, 'Specified key was too long; max key length is 3072 bytes')

I am using Django 2.2.15, MySQL 8.0.21. I found this, and have verified that the MySQL server uses character set UTF8 with default collate utf8_general_ci. MySQL is also using InnoDB as storage engine.

I have just ran into this issue as I am testing upgrade from Debian Stretch to Debian Buster, using distro-provided MariaDB. Additional item that changed for me was switching from mysql-python to mysqlclient package for database driver.

Just to emphasize, I did not have this problem under Debian Stretch (while using the mysql-python package). So it could be either that something changed in behaviour of MariaDB database server, or within how mysql-python handles the indexes (more on that below).

What cases the failure is the following part of migration:

        migrations.AlterField(
            model_name='redirect',
            name='old_path',
            field=models.CharField(help_text="This should be an absolute path, excluding the domain name. Example: '/events/search/'.", max_length=2000, verbose_name='redirect from', db_index=True),
        ),

In particular, the issue comes from enabling the index on the old_path column (with db_index=True) and its length of 2000. The issue is somewhat similar to issue #12 - essentially MariaDB/MySQL can't create indexes for very large char fields. Well, whatever the proper terminology is - I have not dug into too many details :)

It should be noted that this 3072 byte limit is hit depending on the character set used by the table. For example, I think that if using latin1 everything will work fine (because of 1 character being equivalent to 1 byte). The problem is really when you start using encodings where single character can be represented with multiple bytes (like utf8 or utf8mb4). In that case you take the maximum length of the field, and multiply it by maximum length of a single character for that encoding (in bytes).

When working with raw SQL queries, it is possible to tell MariaDB to essentially truncate the column for indexing purposes (taking into account only first n characters) - however I am unaware if this is possible with Django model API. I have at least seen someone open a ticket for on Django issue tracker - Add support for Index Length for MySQL-backed Models.

Having a quick look at it, I can see two (simple) solutions for django-regex-redirects:

  • Lower the length of old_path field to something more manageable by MariaDB/MySQL. In this case it would be necessary to also kinda make a documented restriction on what would be the "largest" allowed character encoding for use with the application. This could also cause problems for people who might actually use such long paths.
  • Drop the indexing over old_path. This would of course create performance hits.

I'm guessing some other solution might be available too, but I did not dig any deeper. I also kinda wonder if it would be possible to apply the migration selectively depending on database type - e.g. having a special use-case for MySQL/MariaDB. Perhaps that would be the way to go :)

Ugh MySQL/Mariadb and multi-byte characters, it's the gift that just keeps on giving :)

Feel free to do a PR to lower the length of old_path to something that works on Mariadb and I'll update the app in PyPI.

Based on a couple of production sites we have running 2000 seems excessive so lowering it to ~500 wouldn't be a problem for me, and for those cases where it would be I guess making use of a regex to grab the rest of the path shouldn't be a problem. Seems like a better option than dropping the index as we have a number of high-volume sites which make use of this django app.

Yeah, the initial 768-byte thing that was a problem some time ago is one of the reasons why I started understanding why people keep saying MySQL/MariaDB is "not a proper database" :)

Regarding the lengths, currently I think the maximum length that would work for MySQL/MariaDB would be 768 (768*4=3072 bytes for utf8mb4 encoding), so let's say to set it to 512? Or "max it out" at 768? The 3072-byte limit is specific to InnoDB I think (in case of MyISAM or something else it might be lower).

Is it sufficient to update the old_path only, or should I make the new_path shorter as well?

Looking through the migrations, I see that you/we already dropped the unique-index ~3.5 years ago in migration 4 / 0004_auto_20170512_1349.py:

https://github.com/maykinmedia/django-regex-redirects/pull/13/files

The problem however is that the unique-together index is created in migration 1, old_path increased in length to 2000 in migration 2, and the index is removed in migration 4. So your fix back then only works on Mariadb as long as migration 2 is applied after migration 4 or skipped.

So I suspect we don't have to shorten the paths after all. Either squashing the migrations or renaming 2 to 5 (and switching the dependancies around) would ensure that this works as expected on new installs with Mariadb.

Hm... Are you sure about that? I do not see any mention of unique_together in migration 1 as of this time? That specific PR was related to unique_together over ('old_path', 'new_path', 'fallback_redirect') which combined definitively exceeded the 768-byte limit back in the day.

For whatever reason the index over old_path did not cause issues for me on Debian Stretch in combination with its stock MariaDB server and MySQL-python library (installed through pip). I'm suspecting that either older release of MariaDB or MySQL-python library itself might have been more tolerant or performing some kind of tweak on-the-fly.

Unfortunately, can't build the MySQL-python library any more on Debian Buster, otherwise I'd try testing that out. Although, I could perhaps try out the mysqlclient that I'm switching to right now under Debian Stretch.

If I try to manually create an index over old_path in MariaDB, this is what I get (zinnia is blogging app I'm using the regex_redirects with):

MariaDB [zinnia]> create index old_path_index on regex_redirects_redirect(old_path);
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes

Sorry I was mistaken you are correct, it's been a while. old_path also has it's own index of course and that hasn't been changed.

Feel free to stick to the original idea and limit the size of old_path to 512 characters. new_path doesn't have an index so that can be left in place imho.

Ok, I am back to actually create a fix for this one... (because rolling-out custom patches locally is both dumb and long-term waste of time :)...

The previous fix I made related to MariaDB limitations was kinda easy to handle because the problematic migration was applied at the very end, and there were no dependencies against it - so I just essentially renamed the migration and bumped its sequence number.

Adding the migration that reduces the max length from 2000 to 512 is an easy step, however this does not help with migrations on MariaDB since the error happens while applying the 0002_auto_20151217_1938.py migration.

The most obvious thing for me is to modify that existing migration to simply use the same length of 512, but... I have a feeling that is not the right way to do it, and that it might backfire perhaps (not sure how sensitive Django is when making such changes to already applied migrations).

Should I instead create a new migration, copy over the content from the old migration, add the new one as dependency for previous migrations (where relevant) instead?

Hm... Ok, I guess the "cleanest" way would be to make a new empty migration, move everything from 0002_auto_20151217_1938.py into it, and update the existing 0004_auto_20170512_1349.py migration to depend on the new migration (now starting with sequence number 0005)?

I'd say just modify the 0002 migration to set it to a length of 512, limit the length of old_path to 512, add a 0005 migration for this model change, and if you want squash the migrations so new installs skip the intermediate migrations. I'd prefer not to mess around too much with the existing migrations and the 2000->512 change seems benign.

Fixed & new version 0.3.0 released