uncaught exception with unknown encoding
Closed this issue · 3 comments
mysql> use test
mysql> create table dbsake1 (a int) default charset=utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> create table dbsake2 (a int) default charset=utf8mb4;
Query OK, 0 rows affected (0.01 sec)
[root@mg ~]# /root/dbsake --version
dbsake, version 2.1.0 9525896
[root@mg ~]# /root/dbsake frmdump /var/lib/mysql/test/dbsake1.frm
--
-- Table structure for table `dbsake1`
-- Created with MySQL Version 5.5.47
--
CREATE TABLE `dbsake1` (
`a` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
[root@mg ~]# /root/dbsake frmdump /var/lib/mysql/test/dbsake2.frm
Uncaught exception! (╯°□°)╯ ︵ ┻━┻
Traceback (most recent call last):
File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
exec code in run_globals
File "/root/dbsake/__main__.py", line 21, in <module>
sys.exit(main())
File "/root/dbsake/__main__.py", line 18, in main
sys.exit(dbsake.cli.main())
File "/root/dbsake/dbsake/cli/__init__.py", line 123, in main
dbsake(args=argv, auto_envvar_prefix='DBSAKE', obj={})
File "/root/dbsake/click/core.py", line 488, in __call__
return self.main(*args, **kwargs)
File "/root/dbsake/click/core.py", line 474, in main
self.invoke(ctx)
File "/root/dbsake/click/core.py", line 758, in invoke
return self.invoke_subcommand(ctx, cmd, cmd_name, ctx.args[1:])
File "/root/dbsake/click/core.py", line 767, in invoke_subcommand
return cmd.invoke(cmd_ctx)
File "/root/dbsake/click/core.py", line 659, in invoke
ctx.invoke(self.callback, **ctx.params)
File "/root/dbsake/click/core.py", line 325, in invoke
return callback(*args, **kwargs)
File "/root/dbsake/dbsake/cli/cmd/frm.py", line 37, in frmdump
table = frm.parse(name)
File "/root/dbsake/dbsake/core/mysql/frm/__init__.py", line 41, in parse
return dispatch(path)
File "/root/dbsake/dbsake/core/mysql/frm/binaryfrm.py", line 393, in parse
table = Table.from_data(data, context=packed_frm_data)
File "/root/dbsake/dbsake/core/mysql/frm/binaryfrm.py", line 138, in from_data
connection = connection.decode(charset.name)
LookupError: unknown encoding: utf8mb4
It's okay. ┬─┬ノ( º_ ºノ)
Consider filing a bug report at https://github.com/abg/dbsake/issu
Thanks for the report! I thought I covered this case, but apparently not. Python should handle utf8mb4 just the same as utf8, I think, so this should be a trivial fix.
This affects a few other interesting cases. The default value for text columns is encoded with the column character set (which may or may not be the same as the table character set). This is being handled correctly for the most part, but no mapping between the MySQL charset name and the Python charset name is being done at all.
So utf8mb4 doesn't exist in Python; it just needs to be mapped to Python's 'utf-8'. MySQL 'utf16' needs to be mapped to Python 'utf_16_be' (big endian), and I imagine the mapping for the deprecated ucs2 is identical. Right now 'utf16' implicitly maps to 'utf_16_le' in Python, which does the wrong decoding. Various other Unicode cases that implicitly have the same charset name in MySQL and Python are probably being handled incorrectly as well.
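To make the two failure modes above concrete, here is a minimal sketch that is not taken from dbsake itself. It assumes, as described above, that MySQL stores utf16 data big endian without a byte order mark; the variable names are purely illustrative.

```python
import codecs

# MySQL's name for 4-byte UTF-8 is not a codec name Python knows,
# which is what produced the LookupError in the traceback.
try:
    codecs.lookup("utf8mb4")
    utf8mb4_known = True
except LookupError:
    utf8mb4_known = False

# Assumption: a MySQL utf16 value is big endian with no byte order mark.
raw = "a".encode("utf_16_be")  # the two bytes 0x00 0x61

# Decoding with the matching byte order recovers the original text.
as_big_endian = raw.decode("utf_16_be")

# Decoding with the wrong byte order succeeds silently but yields a
# different code point (U+6100), so no exception flags the mistake.
as_little_endian = raw.decode("utf_16_le")
```

The second case is arguably the nastier one: nothing fails, the output is just wrong.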
I think the easiest way here is to extend the Charset() object, which was already created to wrap the details of a MySQL collation, with some sort of 'pycharset' attribute, and to implement a mapping from MySQL to Python charset names. There may be charsets dbsake will not support, and it should probably handle those more gracefully during the various unpacking phases, for both default values and the various table-level attributes.
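A rough sketch of what that could look like follows. The `Charset` class here is a reduced stand-in, not dbsake's actual object, and `MYSQL_TO_PYTHON` and `UnsupportedCharset` are hypothetical names. The table lists only charsets mentioned in this thread; the ucs2 entry follows the guess above, and the utf32 entry assumes MySQL stores it big endian as well.

```python
import codecs

# Hypothetical mapping from MySQL charset names to Python codec names.
MYSQL_TO_PYTHON = {
    "utf8": "utf_8",
    "utf8mb4": "utf_8",
    "ucs2": "utf_16_be",   # guess from the thread: same as utf16
    "utf16": "utf_16_be",
    "utf32": "utf_32_be",  # assumption: big endian, like utf16
}


class UnsupportedCharset(ValueError):
    """Raised when a MySQL charset has no usable Python codec."""


class Charset(object):
    """Reduced stand-in for dbsake's Charset object: only a name."""

    def __init__(self, name):
        self.name = name

    @property
    def pycharset(self):
        # Unmapped names fall through to a lookup under the MySQL name.
        # That is only safe when both sides really mean the same
        # encoding, so a real table should list every supported charset.
        candidate = MYSQL_TO_PYTHON.get(self.name, self.name)
        try:
            return codecs.lookup(candidate).name
        except LookupError:
            raise UnsupportedCharset(
                "no Python codec for MySQL charset %r" % self.name
            )
```

With something like this, the unpacking code could call `value.decode(charset.pycharset)` and catch `UnsupportedCharset` in one place to report the problem, instead of surfacing a bare `LookupError`.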
There needs to be at least one test case to exercise the utf8mb4 case and the various utf16/32 cases that might be used in particular environments.
More to the point of the initial report, table-level attributes (identifiers, connection string, comments, etc.) are always effectively encoded as character_set_system, which should always be utf-8. So trying to decode them with whatever table character set name was discovered is very wrong here.
So two similar bugs here:
- default values for string columns could fail when the python charset name didn't match the mysql charset name
- table level attributes need to use utf-8 consistently, but dbsake was incorrectly using the table character set for some attributes
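
As an illustration of how the two cases differ, the sketch below separates the two decoding paths. Both function names are hypothetical and do not come from dbsake; the first relies on the statement above that table-level attributes are stored as utf-8, and the second expects a Python codec name that has already been mapped from the column's MySQL charset.

```python
def decode_table_attribute(raw):
    """Decode an identifier, comment, or connection string.

    Per the discussion above, these are stored as character_set_system
    (utf-8), regardless of the table's default charset.
    """
    return raw.decode("utf_8")


def decode_column_default(raw, pycharset):
    """Decode a string column's default value.

    pycharset is the Python codec name for the column's own charset,
    which may differ from the table charset.
    """
    return raw.decode(pycharset)


# A table whose default charset is utf16 still has a utf-8 comment,
# while a utf16 column's default is stored big endian.
comment = decode_table_attribute(u"caf\xe9".encode("utf_8"))
default = decode_column_default(u"\xe9".encode("utf_16_be"), "utf_16_be")
```

Keeping the two paths apart means the table charset never reaches the code that decodes table-level attributes.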