Untrusted binlog row snapshot data >=v1.0
Closed this issue · 9 comments
Binlog events are triggered correctly, but not all of the row snapshot data is received.
At the same log position, versions 0.43 and 1.0.1 produce different snapshots; in 1.0.1 the snapshot contains only a null key.
In 0.43:
{"before_values": {"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-08-24T17:16:53"}, "after_values": {"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-10-06T14:36:24"}}
In 1.0.1:
{"before_values": {"null": "2023-08-24T17:16:53"}, "after_values": {"null": "2023-10-06T14:36:24"}}
mysql> show variables like '%binlog%';
+------------------------------------------------+----------------------+
| Variable_name                                  | Value                |
+------------------------------------------------+----------------------+
| aurora_binlog_reserved_event_bytes             | 1024                 |
| binlog_cache_size                              | 32768                |
| binlog_checksum                                | CRC32                |
| binlog_direct_non_transactional_updates        | OFF                  |
| binlog_encryption                              | OFF                  |
| binlog_error_action                            | ABORT_SERVER         |
| binlog_expire_logs_seconds                     | 0                    |
| binlog_format                                  | ROW                  |
| binlog_group_commit_sync_delay                 | 0                    |
| binlog_group_commit_sync_no_delay_count        | 0                    |
| binlog_gtid_simple_recovery                    | ON                   |
| binlog_max_flush_queue_time                    | 0                    |
| binlog_order_commits                           | ON                   |
| binlog_rotate_encryption_master_key_at_startup | OFF                  |
| binlog_row_event_max_size                      | 8192                 |
| binlog_row_image                               | FULL                 |
| binlog_row_metadata                            | MINIMAL              |
| binlog_row_value_options                       |                      |
| binlog_rows_query_log_events                   | OFF                  |
| binlog_stmt_cache_size                         | 32768                |
| binlog_transaction_compression                 | OFF                  |
| binlog_transaction_compression_level_zstd      | 3                    |
| binlog_transaction_dependency_history_size     | 25000                |
| binlog_transaction_dependency_tracking         | COMMIT_ORDER         |
| innodb_api_enable_binlog                       | OFF                  |
| log_statements_unsafe_for_binlog               | ON                   |
| max_binlog_cache_size                          | 18446744073709547520 |
| max_binlog_size                                | 134217728            |
| max_binlog_stmt_cache_size                     | 18446744073709547520 |
| sync_binlog                                    | 1                    |
+------------------------------------------------+----------------------+
30 rows in set (0.02 sec)
I think that starting from 1.0, binlog_row_metadata needs to be changed from MINIMAL to FULL, but backward compatibility with MINIMAL still seems necessary.
We have not yet tested whether this also occurs when binlog_row_metadata=FULL; the issue appeared simply by deploying the new mysql-replication package.
@sean-k1 I tried both versions, 0.45.1 and 1.0.2.
In 0.45.1:
{"before_values": {"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-08-24T17:16:53"}, "after_values": {"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-10-06T14:36:24"}}
In 1.0.2:
{"before_values": {"null": "2023-08-24T17:16:53"}, "after_values": {"null": "2023-10-06T14:36:24"}}
The same problem occurred.
@jongwony
Thanks for reporting the issue.
The JSON parsing logic is exactly the same in 0.45.1 and 1.0.2, so I think this is a different issue.
Can you share your BinLogStreamReader settings?
from pymysqlreplication import BinLogStreamReader

stream = BinLogStreamReader(connection_settings=mysql_settings, server_id=100)
for binlogevent in stream:
    # add your case
    binlogevent.dump()
My case:
SQL queries:
CREATE TABLE test_json_v2 (
id INT primary KEY,
c JSON
);
INSERT INTO test_json_v2 VALUES (102, '{"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-08-24T17:16:53"}');
UPDATE test_json_v2 SET c = '{"id": 2877297, "user_id": 669862, "send_type": "push", "category": "marketing", "is_approved": 0, "create_time": "2021-12-09T10:20:24", "update_time": "2023-10-06T14:36:24"}' WHERE id = 102;
The dump looks like this:
=== QueryEvent ===
Date: 2023-10-10T04:14:49
Log position: 28079
Event size: 239
Read bytes: 239
Schema: b'pymysqlreplication_test'
Execution time: 0
Query: CREATE TABLE test_json_v2 (
id INT primary KEY,
c JSON
)
=== QueryEvent ===
Date: 2023-10-10T04:28:54
Log position: 28252
Event size: 71
Read bytes: 71
Schema: b'pymysqlreplication_test'
Execution time: 0
Query: BEGIN
=== TableMapEvent ===
Date: 2023-10-10T04:28:54
Log position: 28331
Event size: 56
Read bytes: 56
Table id: 3183
Schema: pymysqlreplication_test
Table: test_json_v2
Columns: 2
=== WriteRowsEvent ===
Date: 2023-10-10T04:28:54
Log position: 28551
Event size: 197
Read bytes: 12
Table: pymysqlreplication_test.test_json_v2
Affected columns: 2
Changed rows: 1
Column Name Information Flag: False
Values:
--
* None : {b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-08-24T17:16:53'}
=== XidEvent ===
Date: 2023-10-10T04:28:54
Log position: 28582
Event size: 8
Read bytes: 8
Transaction ID: 5998
=== QueryEvent ===
Date: 2023-10-10T04:28:54
Log position: 28764
Event size: 80
Read bytes: 80
Schema: b'pymysqlreplication_test'
Execution time: 0
Query: BEGIN
=== TableMapEvent ===
Date: 2023-10-10T04:28:54
Log position: 28843
Event size: 56
Read bytes: 56
Table id: 3183
Schema: pymysqlreplication_test
Table: test_json_v2
Columns: 2
=== UpdateRowsEvent ===
Date: 2023-10-10T04:28:54
Log position: 29249
Event size: 383
Read bytes: 13
Table: pymysqlreplication_test.test_json_v2
Affected columns: 2
Changed rows: 1
Column Name Information Flag: False
Values:
--
*None:{b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-08-24T17:16:53'}=>{b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-10-06T14:36:24'}
=== XidEvent ===
Date: 2023-10-10T04:28:54
Log position: 29280
Event size: 8
Read bytes: 8
Transaction ID: 5999
I think this case is the normal case when binlog_row_metadata = "MINIMAL":
*None:{b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-08-24T17:16:53'}=>{b'id': 2877297, b'user_id': 669862, b'category': b'marketing', b'send_type': b'push', b'create_time': b'2021-12-09T10:20:24', b'is_approved': 0, b'update_time': b'2023-10-06T14:36:24'}
What I don't understand is this case:
{"before_values": {"null": "2023-08-24T17:16:53"}, "after_values": {"null": "2023-10-06T14:36:24"}}
Can you let me know how I can reproduce it?
Sorry for the confusion. This data is not an example of a JSON column; it is an ordinary table.
The MINIMAL problem you mentioned seems correct, and my point is that the keys of before_values and after_values should be the column names (c in your test), not None.
In the previous version, 0.45.1, columns were apparently tracked even with binlog_row_metadata=MINIMAL.
If you re-run the test case you showed with the older version at the same binlog position, I expect the columns to have names other than None.
@jongwony
Now I understand the problem!
# row_event.py
for i in range(0, nb_columns):
    column = self.columns[i]
    name = self.table_map[self.table_id].columns[i].name
    unsigned = self.table_map[self.table_id].columns[i].unsigned
    values[name] = self.__read_values_name(  # <--- Problem
        column,
        null_bitmap,
        null_bitmap_index,
        cols_bitmap,
        unsigned,
        i,
    )
With binlog_row_metadata = MINIMAL the column name is not in the binlog, so name is None for every column.
A Python dict stores one value per key, so each assignment to the None key overwrites the previous one.
That is why you only see the value of the last column in order, update_time.
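A minimal sketch of the dict-key collision described above (the column order and values are made up for illustration):

```python
# With MINIMAL metadata every column name resolves to None, so each
# assignment reuses the same dict key and overwrites the previous value.
names = [None, None, None]  # what columns[i].name yields under MINIMAL
values_read = [2877297, 669862, "2023-10-06T14:36:24"]  # id, user_id, update_time

row = {}
for name, value in zip(names, values_read):
    row[name] = value  # same None key on every iteration

print(row)  # {None: '2023-10-06T14:36:24'} -- only the last column survives
```

This matches the reported payload, where before_values and after_values each contain a single "null" key holding only the update_time value.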
So if you change binlog_row_metadata from MINIMAL to FULL, it will resolve the issue.
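For reference, a sketch of that change, assuming MySQL 8.0+ (where binlog_row_metadata exists) and sufficient privileges; note that on Aurora/RDS, as in the variables listing above, this is set through the cluster parameter group rather than SET GLOBAL:

```sql
-- MySQL 8.0+: switch row metadata logging from MINIMAL to FULL
SET GLOBAL binlog_row_metadata = 'FULL';

-- verify the new value
SHOW GLOBAL VARIABLES LIKE 'binlog_row_metadata';
```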
binlog_row_metadata is not a valid variable for MySQL 5.7.
Yes, that is why we keep supporting version 0.45.1 for MySQL 5.7.