Newer versions of MySQL driver dropped support for gevent
Opened this issue · 6 comments
The mysqlclient
driver that we're using has removed gevent support in version 1.4.
The last version of the driver that supports it (1.3.4) supports Python up to 3.9 so this is blocking us from supporting newer Python versions and missing out on any potential improvements.
@squeaky-pl could we just switch to using the pure-Python driver PyMySQL ? It's by the same maintainer as mysqlclient
and he recommends using PyMySQL instead of mysqlclient if you use gevent.
This is one of possible solutions.
We would need to measure performance impact. sync-engine's sync pods have a pretty tight loop where it polls all the IMAP uids in the database for a given account against the remote IMAP uids. For large email inboxes it can be easily hundreds of thousands of uids fetched and decoded (this time in Python instead of C) in one SQL query.
We still have semi-broken data with strings that are not valid in Python 3 that were inserted into the database with Python 2 with surrogates, MySQL doesn't validate strings against invalid surrogate pairs and Python 3 does. we are currently correcting this on the fly.
We would need to see if this approach can be replicated on the new driver, or resync all the data that was synced with Python 2 and reinsert it with surrogate pairs corrected (fun project apart).
There are also some fun and weirds hacks here that access internal mysql C driver modules. See
sync-engine/inbox/util/concurrency.py
Lines 6 to 87 in c064bf8
If we find ourselves under time pressure we could also move to a distro with Python 3.9. Sync-engine works on Python 3.9 unmodified and it's even in the CI now. That would buy as one more year if using non LTS distro is an option. Ubuntu 21.04 shipped with 3.9.
@tsx was also talking to me about getting rid of Gevent first, which might be a smaller project and would align with the direction we are taking with other projects.
Also putting this on @wojcikstefan radar.
We still have semi-broken data with strings that are not valid in Python 3 that were inserted into the database with Python 2 with surrogates, MySQL doesn't validate strings against invalid surrogate pairs and Python 3 does. we are currently correcting this on the fly.
Sounds like this can be fixed with a standalone migration? We could come up with a clever enough select ... where field like '%surrogate%' and date_created < $python3_migration
and then fix and rewrite those rows directly.
resync all the data that was synced with Python 2 and reinsert it with surrogate pairs corrected
I know it'll be more complex than a couple queries, but could be way easier than full resyncs.