julien-duponchelle/python-mysql-replication

Trailing zeros in fixed-length binary fields are dropped

oseemann opened this issue · 1 comments

Observed behaviour:
Values of columns with fixed-length binary type (e.g. binary(4)) with trailing zeros are not properly restored when decoding the binlog. E.g. the value \x01\x02\x00\x00 stored in the a binary(4) column is captured as \x01\x02.

Expected behaviour:
The row event should capture the same value as is returned in MySQL. E.g. in the above case the value should be \x01\x02\x00\x00.

Further information:
See https://dev.mysql.com/doc/refman/5.7/en/binary-varbinary.html

When BINARY values are stored, they are right-padded with the pad value to the specified length. The pad value is 0x00 (the zero byte). Values are right-padded with 0x00 for inserts, and no trailing bytes are removed for retrievals.

It appears that the 0-padding for fixed-length binary fields happens at retrieval time and not at write time. I.e. any trailing zeros are NOT written to disk (and the binlog), because they are padded at retrieval anyway.

The following test cases demonstrate the issue:

diff --git a/pymysqlreplication/tests/test_data_type.py b/pymysqlreplication/tests/test_data_type.py
index 0b3c81a..1b261ea 100644
--- a/pymysqlreplication/tests/test_data_type.py
+++ b/pymysqlreplication/tests/test_data_type.py
@@ -101,6 +101,18 @@ class TestDataType(base.PyMySQLReplicationTestCase):

         return event

+    def test_varbinary(self):
+        create_query = "CREATE TABLE test(b VARBINARY(4))"
+        insert_query = "INSERT INTO test VALUES(UNHEX('ff010000'))"
+        event = self.create_and_insert_value(create_query, insert_query)
+        self.assertEqual(event.rows[0]["values"]["b"], b'\xff\x01\x00\x00')
+
+    def test_fixed_width_binary(self):
+        create_query = "CREATE TABLE test(b BINARY(4))"
+        insert_query = "INSERT INTO test VALUES(UNHEX('ff010000'))"
+        event = self.create_and_insert_value(create_query, insert_query)
+        self.assertEqual(event.rows[0]["values"]["b"], b'\xff\x01\x00\x00')
+

Here the test_varbinary test suceeds, but the test_fixed_width_binary tests fails, with

>       self.assertEqual(event.rows[0]["values"]["b"], b'\xff\x01\x00\x00')
E       AssertionError: b'\xff\x01' != b'\xff\x01\x00\x00'

From the column_schema we know when a field is of fixed-length binary type and can do the zero-padding at retrieval time in the library as well. I'll be working on a fix.

I published a 0.41 release