abg/dbsake

uncaught exception with unknown encoding

Closed this issue · 3 comments

mysql> use test
mysql> create table dbsake1 (a int) default charset=utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create table dbsake2 (a int) default charset=utf8mb4;
Query OK, 0 rows affected (0.01 sec)





[root@mg ~]# /root/dbsake --version
dbsake, version 2.1.0 9525896



[root@mg ~]# /root/dbsake frmdump /var/lib/mysql/test/dbsake1.frm
--
-- Table structure for table `dbsake1`
-- Created with MySQL Version 5.5.47
--

CREATE TABLE `dbsake1` (
  `a` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;




[root@mg ~]# /root/dbsake frmdump /var/lib/mysql/test/dbsake2.frm
Uncaught exception! (╯°□°)╯ ︵ ┻━┻
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/root/dbsake/__main__.py", line 21, in <module>
    sys.exit(main())
  File "/root/dbsake/__main__.py", line 18, in main
    sys.exit(dbsake.cli.main())
  File "/root/dbsake/dbsake/cli/__init__.py", line 123, in main
    dbsake(args=argv, auto_envvar_prefix='DBSAKE', obj={})
  File "/root/dbsake/click/core.py", line 488, in __call__
    return self.main(*args, **kwargs)
  File "/root/dbsake/click/core.py", line 474, in main
    self.invoke(ctx)
  File "/root/dbsake/click/core.py", line 758, in invoke
    return self.invoke_subcommand(ctx, cmd, cmd_name, ctx.args[1:])
  File "/root/dbsake/click/core.py", line 767, in invoke_subcommand
    return cmd.invoke(cmd_ctx)
  File "/root/dbsake/click/core.py", line 659, in invoke
    ctx.invoke(self.callback, **ctx.params)
  File "/root/dbsake/click/core.py", line 325, in invoke
    return callback(*args, **kwargs)
  File "/root/dbsake/dbsake/cli/cmd/frm.py", line 37, in frmdump
    table = frm.parse(name)
  File "/root/dbsake/dbsake/core/mysql/frm/__init__.py", line 41, in parse
    return dispatch(path)
  File "/root/dbsake/dbsake/core/mysql/frm/binaryfrm.py", line 393, in parse
    table = Table.from_data(data, context=packed_frm_data)
  File "/root/dbsake/dbsake/core/mysql/frm/binaryfrm.py", line 138, in from_data
    connection = connection.decode(charset.name)
LookupError: unknown encoding: utf8mb4
It's okay. ┬─┬ノ( º_ ºノ)
Consider filing a bug report at https://github.com/abg/dbsake/issu
abg commented

Thanks for the report! I thought I had covered this case, but apparently not. Python should be able to handle utf8mb4 the same way as utf8, I think, so this should be a trivial fix.

abg commented

This affects a few other interesting cases. The default value for a text column is encoded with the column character set (which may or may not be the same as the table character set). This is handled correctly for the most part, but no mapping between the MySQL charset name and the Python codec name is being done at all.

utf8mb4 doesn't exist in Python; it just needs to be mapped to Python's 'utf-8'. MySQL 'utf16' needs to be mapped to Python 'utf_16_be' (big endian), and I imagine the deprecated ucs2 mapping is identical. Right now 'utf16' implicitly maps to 'utf_16_le' in Python, which does the wrong decoding. Various other Unicode charsets that happen to share a name between MySQL and Python are probably being handled incorrectly as well.
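Both failure modes can be reproduced in isolation. This is a minimal sketch rather than dbsake code: `python_codec_exists` is a hypothetical helper, and the sample bytes are made up to stand in for a value stored in MySQL's big-endian utf16.

```python
import codecs


def python_codec_exists(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# MySQL's charset name is not a Python codec name, so a direct
# bytes.decode("utf8mb4") raises LookupError, as in the traceback.
utf8mb4_known = python_codec_exists("utf8mb4")
utf8_known = python_codec_exists("utf-8")

# Illustrative big-endian UTF-16 bytes, the byte order MySQL 'utf16' uses.
raw = "hi".encode("utf_16_be")

# Decoding with the matching byte order recovers the text ...
decoded_be = raw.decode("utf_16_be")
# ... while the little-endian codec silently yields different characters.
decoded_le = raw.decode("utf_16_le")
```

The second case is the more dangerous one, since it produces wrong text without raising anything.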

I think the easiest way here is to extend the existing Charset() object, which wraps the details of a MySQL collation, with some sort of 'pycharset' attribute and implement a MySQL -> Python mapping. There may be charsets dbsake will not support, and it should probably handle those more gracefully during the various unpacking phases for both default values and table-level attributes.
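One way the 'pycharset' idea could look is sketched here. The class is a stand-in, not dbsake's actual Charset object; `MYSQL_TO_PYTHON`, `UnsupportedCharset`, and the choice of entries are assumptions for illustration, and the ucs2 entry follows the guess made in this thread.

```python
# Hypothetical mapping: MySQL charset name -> Python codec name.
# Only a few Unicode charsets are listed; a real table would need more.
MYSQL_TO_PYTHON = {
    "utf8": "utf-8",
    "utf8mb3": "utf-8",
    "utf8mb4": "utf-8",
    "utf16": "utf_16_be",   # MySQL utf16 is big endian
    "ucs2": "utf_16_be",    # assumption: same mapping as utf16
    "utf16le": "utf_16_le",
    "utf32": "utf_32_be",
}


class UnsupportedCharset(ValueError):
    """Raised when no Python codec is mapped for a MySQL charset."""


class Charset(object):
    """Stand-in for an object wrapping a MySQL charset name."""

    def __init__(self, name):
        self.name = name

    @property
    def pycharset(self):
        # Refuse unmapped names instead of passing them to decode(),
        # so callers get one predictable error type to handle.
        try:
            return MYSQL_TO_PYTHON[self.name]
        except KeyError:
            raise UnsupportedCharset(
                "no Python codec mapped for MySQL charset %r" % self.name
            )
```

A caller would then decode with `value.decode(charset.pycharset)` rather than `value.decode(charset.name)`, and catch `UnsupportedCharset` where a friendlier message is wanted. Refusing unmapped names is a deliberate choice in this sketch: falling back to the MySQL name would reintroduce the silent mis-decoding described above.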

There needs to be at least one test case to exercise utf8mb4, plus the various utf16/utf32 cases that might be used in particular environments.
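A rough idea of what such tests might cover, written against a hypothetical `decode_default` helper rather than dbsake's real unpacking code. The helper, its small mapping table, and the sample byte strings are all assumptions made for this sketch.

```python
import unittest

# Hypothetical, deliberately tiny mapping used only by this sketch.
_CODECS = {
    "utf8": "utf-8",
    "utf8mb4": "utf-8",
    "utf16": "utf_16_be",
    "utf32": "utf_32_be",
}


def decode_default(raw, mysql_charset):
    """Decode a column default using the codec mapped for the MySQL charset."""
    try:
        codec = _CODECS[mysql_charset]
    except KeyError:
        raise LookupError("unsupported MySQL charset: %s" % mysql_charset)
    return raw.decode(codec)


class CharsetDecodingTest(unittest.TestCase):
    def test_utf8mb4_four_byte_character(self):
        # U+1F600 needs four bytes in UTF-8.
        raw = b"\xf0\x9f\x98\x80"
        self.assertEqual(decode_default(raw, "utf8mb4"), "\U0001F600")

    def test_utf16_is_big_endian(self):
        raw = b"\x00\xe9"  # U+00E9 in big-endian UTF-16
        self.assertEqual(decode_default(raw, "utf16"), "\u00e9")

    def test_utf32_is_big_endian(self):
        raw = b"\x00\x00\x00\xe9"  # U+00E9 in big-endian UTF-32
        self.assertEqual(decode_default(raw, "utf32"), "\u00e9")

    def test_unmapped_charset_is_reported(self):
        with self.assertRaises(LookupError):
            decode_default(b"", "no_such_charset")
```

Saved as a module, this can be run with `python -m unittest`. Tests against real .frm fixtures created with each charset would be a stronger check than these synthetic byte strings.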

abg commented

More to the point of the initial report: table-level attributes (identifiers, connection string, comments, etc.) are always effectively encoded as character_set_system, which should always be utf-8. So decoding them with whatever table character set was discovered is very wrong here.

So two similar bugs here:

  • default values for string columns could fail when the Python charset name didn't match the MySQL charset name
  • table-level attributes need to use utf-8 consistently, but dbsake was incorrectly using the table character set for some attributes
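The distinction between the two bugs can be sketched as two separate decode paths. The function names and the `SYSTEM_CHARSET` constant are hypothetical and do not come from dbsake; the sample values are made up.

```python
# character_set_system, which the thread notes should always be utf-8.
SYSTEM_CHARSET = "utf-8"


def decode_table_attribute(raw):
    """Decode a table-level attribute (comment, connection string, ...).

    These ignore the table's default charset entirely.
    """
    return raw.decode(SYSTEM_CHARSET)


def decode_column_default(raw, python_codec):
    """Decode a string column's default value.

    `python_codec` is assumed to be the Python codec name already
    mapped from the column's MySQL charset.
    """
    return raw.decode(python_codec)


# A utf16 table: the comment is still utf-8 bytes, while the column
# default is stored in the column's own (big-endian utf16) charset.
comment = decode_table_attribute("caf\u00e9".encode("utf-8"))
default = decode_column_default("caf\u00e9".encode("utf_16_be"), "utf_16_be")
```

Keeping the two paths separate means a table-level attribute can never pick up the table charset by accident.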