julien-duponchelle/python-mysql-replication

Untrusted binlog row snapshot data >=v1.0

Closed this issue · 9 comments

Binlog events fire correctly, but not all of the row snapshot data is received.

At the same log position, versions 0.43 and 1.0.1 produce different snapshots: the 1.0.1 snapshot contains only a null key.

In 0.43:

{"before_values": {"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-08-24T17:16:53"}, "after_values": {"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-10-06T14:36:24"}}

In 1.0.1:

{"before_values": {"null": "2023-08-24T17:16:53"}, "after_values": {"null": "2023-10-06T14:36:24"}}
mysql> show variables like '%binlog%';
+------------------------------------------------+----------------------+
| Variable_name                                  | Value                |
+------------------------------------------------+----------------------+
| aurora_binlog_reserved_event_bytes             | 1024                 |
| binlog_cache_size                              | 32768                |
| binlog_checksum                                | CRC32                |
| binlog_direct_non_transactional_updates        | OFF                  |
| binlog_encryption                              | OFF                  |
| binlog_error_action                            | ABORT_SERVER         |
| binlog_expire_logs_seconds                     | 0                    |
| binlog_format                                  | ROW                  |
| binlog_group_commit_sync_delay                 | 0                    |
| binlog_group_commit_sync_no_delay_count        | 0                    |
| binlog_gtid_simple_recovery                    | ON                   |
| binlog_max_flush_queue_time                    | 0                    |
| binlog_order_commits                           | ON                   |
| binlog_rotate_encryption_master_key_at_startup | OFF                  |
| binlog_row_event_max_size                      | 8192                 |
| binlog_row_image                               | FULL                 |
| binlog_row_metadata                            | MINIMAL              |
| binlog_row_value_options                       |                      |
| binlog_rows_query_log_events                   | OFF                  |
| binlog_stmt_cache_size                         | 32768                |
| binlog_transaction_compression                 | OFF                  |
| binlog_transaction_compression_level_zstd      | 3                    |
| binlog_transaction_dependency_history_size     | 25000                |
| binlog_transaction_dependency_tracking         | COMMIT_ORDER         |
| innodb_api_enable_binlog                       | OFF                  |
| log_statements_unsafe_for_binlog               | ON                   |
| max_binlog_cache_size                          | 18446744073709547520 |
| max_binlog_size                                | 134217728            |
| max_binlog_stmt_cache_size                     | 18446744073709547520 |
| sync_binlog                                    | 1                    |
+------------------------------------------------+----------------------+
30 rows in set (0.02 sec)

I think that starting from 1.0, binlog_row_metadata needs to be changed from MINIMAL to FULL, but backward compatibility with servers that still use MINIMAL seems necessary.

We have not yet tested whether this also occurs when binlog_row_metadata=FULL; the issue appeared only after deploying the new mysql-replication package.

@jongwony

Can you try versions 1.0.2 and 0.45.1?

1.0.1 is a yanked version.

@sean-k1 I tried both versions, 0.45.1 and 1.0.2.

In 0.45.1:

{"before_values": {"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-08-24T17:16:53"}, "after_values": {"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-10-06T14:36:24"}}

In 1.0.2:

{"before_values": {"null": "2023-08-24T17:16:53"}, "after_values": {"null": "2023-10-06T14:36:24"}}

The same problem occurred.

@jongwony
Thanks for reporting the issue.

The JSON parsing logic is exactly the same in 0.45.1 and 1.0.2, so I think this is a different issue.

Can you let me know your BinLogStreamReader settings?

from pymysqlreplication import BinLogStreamReader

stream = BinLogStreamReader(connection_settings=mysql_settings, server_id=100)

for binlogevent in stream:
    binlogevent.dump()  # add your case here

My case

Version 1.0.2:

SQL query:

CREATE TABLE test_json_v2 (
    id INT PRIMARY KEY,
    c JSON
);

INSERT INTO test_json_v2 VALUES (102, '{"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-08-24T17:16:53"}');

UPDATE test_json_v2 SET c = '{"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-10-06T14:36:24"}' WHERE id = 102;

It dumps like this:


=== QueryEvent ===
Date: 2023-10-10T04:14:49
Log position: 28079
Event size: 239
Read bytes: 239
Schema: b'pymysqlreplication_test'
Execution time: 0
Query: CREATE TABLE test_json_v2 (
                            id INT primary KEY,
                            c JSON
                        )

=== QueryEvent ===
Date: 2023-10-10T04:28:54
Log position: 28252
Event size: 71
Read bytes: 71
Schema: b'pymysqlreplication_test'
Execution time: 0
Query: BEGIN

=== TableMapEvent ===
Date: 2023-10-10T04:28:54
Log position: 28331
Event size: 56
Read bytes: 56
Table id: 3183
Schema: pymysqlreplication_test
Table: test_json_v2
Columns: 2

=== WriteRowsEvent ===
Date: 2023-10-10T04:28:54
Log position: 28551
Event size: 197
Read bytes: 12
Table: pymysqlreplication_test.test_json_v2
Affected columns: 2
Changed rows: 1
Column Name Information Flag: False
Values:
--
* None : {b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-08-24T17:16:53'}

=== XidEvent ===
Date: 2023-10-10T04:28:54
Log position: 28582
Event size: 8
Read bytes: 8
Transaction ID: 5998

=== QueryEvent ===
Date: 2023-10-10T04:28:54
Log position: 28764
Event size: 80
Read bytes: 80
Schema: b'pymysqlreplication_test'
Execution time: 0
Query: BEGIN

=== TableMapEvent ===
Date: 2023-10-10T04:28:54
Log position: 28843
Event size: 56
Read bytes: 56
Table id: 3183
Schema: pymysqlreplication_test
Table: test_json_v2
Columns: 2

=== UpdateRowsEvent ===
Date: 2023-10-10T04:28:54
Log position: 29249
Event size: 383
Read bytes: 13
Table: pymysqlreplication_test.test_json_v2
Affected columns: 2
Changed rows: 1
Column Name Information Flag: False
Values:
--
*None:{b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-08-24T17:16:53'}=>{b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-10-06T14:36:24'}

=== XidEvent ===
Date: 2023-10-10T04:28:54
Log position: 29280
Event size: 8
Read bytes: 8
Transaction ID: 5999

I think this is the normal case when we set binlog_row_metadata = MINIMAL:

*None:{b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-08-24T17:16:53'}=>{b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-10-06T14:36:24'}

I don't understand this case:

{"before_values": {"null": "2023-08-24T17:16:53"}, "after_values": {"null": "2023-10-06T14:36:24"}}

Can you let me know how I can reproduce it?

Sorry for the confusion. That data is not from a JSON column; it is an ordinary (non-JSON) row.

The problem with MINIMAL that you mentioned seems correct. My point is that the keys of before_values and after_values should be c, not None.

In the previous version, 0.45.1, columns appear to have been tracked even with binlog_row_metadata=MINIMAL.

If you re-run the test case you showed with the other version at the same binlog position, I expect the column keys to be something other than None.

0.45.1 event.dump()


1.0.2 event.dump()


@jongwony
Now I understand the problem!

# row_event.py
        for i in range(0, nb_columns):
            column = self.columns[i]
            name = self.table_map[self.table_id].columns[i].name
            unsigned = self.table_map[self.table_id].columns[i].unsigned

            # Problem: with binlog_row_metadata=MINIMAL the column name is
            # unknown, so `name` is None and every column is written to the
            # same dict key.
            values[name] = self.__read_values_name(
                column,
                null_bitmap,
                null_bitmap_index,
                cols_bitmap,
                unsigned,
                i,
            )

We don't know the column name, so `name` is None. A Python dict overwrites the value stored under an existing key, so each column's value replaces the previous one under the None key, and you end up seeing only the last column read, update_time.
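The overwrite, and the literal "null" key in the reported JSON, can be reproduced with a small sketch (a hypothetical stand-in for the decoding loop, using the column values from the dump above):

```python
import json

# Simulate decoding a row when every column name is missing
# (binlog_row_metadata=MINIMAL): each assignment reuses the same
# None key, so only the last column read survives.
row_cells = [
    2877297,                 # id
    669862,                  # user_id
    "push",                  # send_type
    "marketing",             # category
    0,                       # is_approved
    "2021-12-09T10:20:24",   # create_time
    "2023-08-24T17:16:53",   # update_time (read last)
]

values = {}
for cell in row_cells:
    name = None  # column name unavailable under MINIMAL metadata
    values[name] = cell

print(values)
# {None: '2023-08-24T17:16:53'}

# json.dumps coerces the None key to the string "null", which is
# exactly the shape of the reported output.
print(json.dumps({"before_values": values}))
# {"before_values": {"null": "2023-08-24T17:16:53"}}
```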

So if you change binlog_row_metadata from MINIMAL to FULL, it will resolve the issue.
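For reference, on a self-managed MySQL 8.0 server this is a one-line change (on Aurora/RDS, as in the reporter's setup, SET GLOBAL is typically restricted and the equivalent setting would be changed in the DB cluster parameter group):

```sql
-- Requires MySQL >= 8.0; affects binlog events written after the change.
SET GLOBAL binlog_row_metadata = 'FULL';
```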

binlog_row_metadata is not a valid variable for MySQL 5.7.


@mindler-sasu

Yes, that's why we continue to support version 0.45.1 for MySQL 5.7.