julien-duponchelle/python-mysql-replication

Json length is larger than packet length

jongwony opened this issue · 6 comments

environment: MySQL 8.0
python: 3.11.1

Hexdump

# at 41168404
#230302 17:55:52 server id 62903779  end_log_pos 41169433 CRC32 0x92e2e089 
# Position  Timestamp   Type   Master ID        Size      Master Pos    Flags 
# 02742e14 98 64 00 64   1f   e3 d5 bf 03   05 04 00 00   19 32 74 02   00 00
# 02742e27 62 00 00 00 00 00 01 00  02 00 06 ff ff 00 01 00 |b...............|
# 02742e37 00 00 18 5a 35 6a 6f 57  4b 52 49 41 4c 31 38 44 |...Z5joWKRIAL18D|
# 02742e47 74 49 52 70 76 73 50 55  38 6f 6d 30 00 4e 33 64 |tIRpvsPU8om0.N3d|
# 02742e57 5a 4a 39 64 6d 34 72 64  48 52 41 52 4a 6c 43 6b |ZJ9dm4rdHRARJlCk|
# 02742e67 79 31 4a 6c 36 56 37 6b  59 77 34 66 34 53 59 37 |y1Jl6V7kYw4f4SY7|
# 02742e77 4b 53 39 30 57 47 53 74  7a 37 36 41 62 a3 01 00 |KS90WGStz76Ab...|
# 02742e87 00 00 05 00 a2 01 3a 00  0a 00 44 00 0b 00 4f 00 |..........D...O.|
# 02742e97 0b 00 5a 00 0d 00 75 00  1a 00 0c 9e 00 0c b5 00 |..Z...u.........|
# 02742ea7 02 bb 00 02 f3 00 0c 8f  01 00 02 f3 00 0c 8f 01 |................|
# 02742eb7 00 02 83 01 0c 8f 01 73  63 6f 70 65 63 6c 69 65 |.......scopeclie|
# 02742ec7 6e 74 5f 75 72 69 63 6c  69 65 6e 74 5f 6e 61 6d |nt.uriclient.nam|
# 02742ed7 65 67 72 61 6e 74 5f 74  79 70 65 73 72 65 64 69 |egrant.typesredi|
# 02742ee7 72 65 63 74 5f 75 72 69  73 72 65 73 70 6f 6e 73 |rect.urisrespons|
# 02742ef7 65 5f 74 79 70 65 73 74  6f 6b 65 6e 5f 65 6e 64 |e.typestoken.end|
# 02742f07 70 6f 69 6e 74 5f 61 75  74 68 5f 6d 65 74 68 6f |point.auth.metho|
# 02742f17 64 0e 6f 70 65 6e 69 64  20 70 72 6f 66 69 6c 65 |d.openid.profile|
# 02742f27 16 68 74 74 70 73 3a 2f  2f 69 64 2e 77 61 6e 74 |.https...id.want|
# 02742f37 65 64 2e 6a 6f 62 73 05  6f 6e 65 69 64 05 00 38 |ed.jobs.oneid..8|
# 02742f47 00 0c 13 00 0c 1c 00 0c  22 00 0c 29 00 0c 32 00 |..............2.|
# 02742f57 08 70 61 73 73 77 6f 72  64 05 6b 61 6b 61 6f 06 |.password.kakao.|
# 02742f67 67 6f 6f 67 6c 65 08 66  61 63 65 62 6f 6f 6b 05 |google.facebook.|
# 02742f77 61 70 70 6c 65 04 00 90  00 0c 10 00 0c 2b 00 0c |apple...........|
# 02742f87 48 00 0c 6b 00 1a 68 74  74 70 73 3a 2f 2f 69 64 |H..k..https...id|
# 02742f97 2d 61 70 69 2e 77 61 6e  74 65 64 2e 6a 6f 62 73 |.api.wanted.jobs|
# 02742fa7 1c 68 74 74 70 73 3a 2f  2f 69 64 2d 61 64 6d 69 |.https...id.admi|
# 02742fb7 6e 2e 77 61 6e 74 65 64  2e 6a 6f 62 73 22 68 74 |n.wanted.jobs.ht|
# 02742fc7 74 70 73 3a 2f 2f 69 64  2d 61 70 69 2d 77 77 77 |tps...id.api.www|
# 02742fd7 74 65 73 74 2e 77 61 6e  74 65 64 2e 6a 6f 62 73 |test.wanted.jobs|
# 02742fe7 24 68 74 74 70 73 3a 2f  2f 69 64 2d 61 64 6d 69 |.https...id.admi|
# 02742ff7 6e 2d 77 77 77 74 65 73  74 2e 77 61 6e 74 65 64 |n.wwwtest.wanted|
# 02743007 2e 6a 6f 62 73 01 00 0c  00 0c 07 00 04 63 6f 64 |.jobs........cod|
# 02743017 65 12 63 6c 69 65 6e 74  5f 73 65 63 72 65 74 5f |e.client.secret.|
# 02743027 70 6f 73 74 99 ad 28 58  2a 99 ae aa e5 42 00 01 |post...X.....B..|
# 02743037 00 00 00 18 5a 35 6a 6f  57 4b 52 49 41 4c 31 38 |....Z5joWKRIAL18|
# 02743047 44 74 49 52 70 76 73 50  55 38 6f 6d 30 00 4e 33 |DtIRpvsPU8om0.N3|
# 02743057 64 5a 4a 39 64 6d 34 72  64 48 52 41 52 4a 6c 43 |dZJ9dm4rdHRARJlC|
# 02743067 6b 79 31 4a 6c 36 56 37  6b 59 77 34 66 34 53 59 |ky1Jl6V7kYw4f4SY|
# 02743077 37 4b 53 39 30 57 47 53  74 7a 37 36 41 62 82 01 |7KS90WGStz76Ab..|
# 02743087 00 00 00 06 00 81 01 2e  00 0a 00 38 00 0b 00 43 |...........8...C|
# 02743097 00 0b 00 4e 00 0d 00 5b  00 10 00 6b 00 1a 00 0c |...N.......k....|
# 027430a7 85 00 0c 9c 00 02 a2 00  02 da 00 02 6a 01 0c 6e |............j..n|
# 027430b7 01 63 6c 69 65 6e 74 5f  75 72 69 63 6c 69 65 6e |.client.uriclien|
# 027430c7 74 5f 6e 61 6d 65 67 72  61 6e 74 5f 74 79 70 65 |t.namegrant.type|
# 027430d7 73 72 65 64 69 72 65 63  74 5f 75 72 69 73 75 70 |sredirect.urisup|
# 027430e7 64 61 74 65 5f 75 73 65  72 5f 75 72 69 73 74 6f |date.user.uristo|
# 027430f7 6b 65 6e 5f 65 6e 64 70  6f 69 6e 74 5f 61 75 74 |ken.endpoint.aut|
# 02743107 68 5f 6d 65 74 68 6f 64  16 68 74 74 70 73 3a 2f |h.method.https..|
# 02743117 2f 69 64 2e 77 61 6e 74  65 64 2e 6a 6f 62 73 05 |.id.wanted.jobs.|
# 02743127 6f 6e 65 69 64 05 00 38  00 0c 13 00 0c 1c 00 0c |oneid..8........|
# 02743137 22 00 0c 29 00 0c 32 00  08 70 61 73 73 77 6f 72 |......2..passwor|
# 02743147 64 05 6b 61 6b 61 6f 06  67 6f 6f 67 6c 65 08 66 |d.kakao.google.f|
# 02743157 61 63 65 62 6f 6f 6b 05  61 70 70 6c 65 04 00 90 |acebook.apple...|
# 02743167 00 0c 10 00 0c 2b 00 0c  48 00 0c 6b 00 1a 68 74 |........H..k..ht|
# 02743177 74 70 73 3a 2f 2f 69 64  2d 61 70 69 2e 77 61 6e |tps...id.api.wan|
# 02743187 74 65 64 2e 6a 6f 62 73  1c 68 74 74 70 73 3a 2f |ted.jobs.https..|
# 02743197 2f 69 64 2d 61 64 6d 69  6e 2e 77 61 6e 74 65 64 |.id.admin.wanted|
# 027431a7 2e 6a 6f 62 73 22 68 74  74 70 73 3a 2f 2f 69 64 |.jobs.https...id|
# 027431b7 2d 61 70 69 2d 77 77 77  74 65 73 74 2e 77 61 6e |.api.wwwtest.wan|
# 027431c7 74 65 64 2e 6a 6f 62 73  24 68 74 74 70 73 3a 2f |ted.jobs.https..|
# 027431d7 2f 69 64 2d 61 64 6d 69  6e 2d 77 77 77 74 65 73 |.id.admin.wwwtes|
# 027431e7 74 2e 77 61 6e 74 65 64  2e 6a 6f 62 73 00 00 04 |t.wanted.jobs...|
# 027431f7 00 12 63 6c 69 65 6e 74  5f 73 65 63 72 65 74 5f |..client.secret.|
# 02743207 70 6f 73 74 99 ad 28 58  2a 99 ae aa e5 42 89 e0 |post...X.....B..|
# 02743217 e2 92                                            |..|
# 	Update_rows: table id 98 flags: STMT_END_F
### UPDATE `oneid`.`service_client`
### WHERE
###   @1=1
###   @2='Z5joWKRIAL18DtIRpvsPU8om'
###   @3='N3dZJ9dm4rdHRARJlCky1Jl6V7kYw4f4SY7KS90WGStz76Ab'
###   @4='{"client_uri": "https://id.wanted.jobs", "client_name": "oneid", "grant_types": ["password", "kakao", "google", "facebook", "apple"], "redirect_uris": ["https://id-api.wanted.jobs", "https://id-admin.wanted.jobs", "https://id-api-wwwtest.wanted.jobs", "https://id-admin-wwwtest.wanted.jobs"], "token_endpoint_auth_method": "client_secret_post"}'
###   @5='2022-06-20 05:32:42'
###   @6='2022-12-21 14:21:02'
### SET
###   @1=1
###   @2='Z5joWKRIAL18DtIRpvsPU8om'
###   @3='N3dZJ9dm4rdHRARJlCky1Jl6V7kYw4f4SY7KS90WGStz76Ab'
###   @4='{"client_uri": "https://id.wanted.jobs", "client_name": "oneid", "grant_types": ["password", "kakao", "google", "facebook", "apple"], "redirect_uris": ["https://id-api.wanted.jobs", "https://id-admin.wanted.jobs", "https://id-api-wwwtest.wanted.jobs", "https://id-admin-wwwtest.wanted.jobs"], "update_user_uris": [], "token_endpoint_auth_method": "client_secret_post"}'
###   @5='2022-06-20 05:32:42'
###   @6='2022-12-21 14:21:02'
ROLLBACK /* added by mysqlbinlog */ /*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Above hexdump data @4 is very long JSON type value, I think the error occurs because it doesn't properly recognize the large JSON.

Traceback

File ~/Downloads/github/data_weaver/cdc_producer/__main__.py:143, in KafkaConnector.produce(self)
    140     continue
    142 data = []
--> 143 for row in event.rows:
    144     # intersection update keys
    145     if event.event_type == const.UPDATE_ROWS_EVENT_V2:
    146         diff_keys = json.dumps(one_depth_xor(row['before_values'], row['after_values']), ensure_ascii=False)

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/row_event.py:459, in RowsEvent.rows(self)
    456 @property
    457 def rows(self):
    458     if self.__rows is None:
--> 459         self._fetch_rows()
    460     return self.__rows

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/row_event.py:454, in RowsEvent._fetch_rows(self)
    451     return
    453 while self.packet.read_bytes < self.event_size:
--> 454     self.__rows.append(self._fetch_one_row())

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/row_event.py:543, in UpdateRowsEvent._fetch_one_row(self)
    540 def _fetch_one_row(self):
    541     row = {}
--> 543     row["before_values"] = self._read_column_data(self.columns_present_bitmap)
    545     row["after_values"] = self._read_column_data(self.columns_present_bitmap2)
    546     return row

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/row_event.py:207, in RowsEvent._read_column_data(self, cols_bitmap)
    204     values[name] = self.packet.read_length_coded_pascal_string(
    205         column.length_size)
    206 elif column.type == FIELD_TYPE.JSON:
--> 207     values[name] = self.packet.read_binary_json(column.length_size)
    208 else:
    209     raise NotImplementedError("Unknown MySQL column type: %d" %
    210                               (column.type))

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:359, in BinLogPacketWrapper.read_binary_json(self, size)
    356 self.unread(payload)
    357 t = self.read_uint8()
--> 359 return self.read_binary_json_type(t, length)

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
size = 4
ipdb> p payload
b'\x00\x05\x00\xa2\x01:\x00\n\x00D\x00\x0b\x00O\x00\x0b\x00Z\x00\r\x00u\x00\x1a\x00\x0c\x9e\x00\x0c\xb5\x00\x02\xbb\x00\x02\xf3\x00\x0c\x8f\x01\x00\x02\xf3\x00\x0c\x8f\x01\x00\x02\x83\x01\x0c\x8f\x01scopeclient_uriclient_namegrant_typesredirect_urisresponse_typestoken_endpoint_auth_method\x0eopenid profile\x16https://id.wanted.jobs\x05oneid\x05\x008\x00\x0c\x13\x00\x0c\x1c\x00\x0c"\x00\x0c)\x00\x0c2\x00\x08password\x05kakao\x06google\x08facebook\x05apple\x04\x00\x90\x00\x0c\x10\x00\x0c+\x00\x0cH\x00\x0ck\x00\x1ahttps://id-api.wanted.jobs\x1chttps://id-admin.wanted.jobs"https://id-api-wwwtest.wanted.jobs$https://id-admin-wwwtest.wanted.jobs\x01\x00\x0c\x00\x0c\x07\x00\x04code\x12client_secret_post'

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:364, in BinLogPacketWrapper.read_binary_json_type(self, t, length)
    362 large = (t in (JSONB_TYPE_LARGE_OBJECT, JSONB_TYPE_LARGE_ARRAY))
    363 if t in (JSONB_TYPE_SMALL_OBJECT, JSONB_TYPE_LARGE_OBJECT):
--> 364     return self.read_binary_json_object(length - 1, large)
    365 elif t in (JSONB_TYPE_SMALL_ARRAY, JSONB_TYPE_LARGE_ARRAY):
    366     return self.read_binary_json_array(length - 1, large)

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
t = 0
length = 419

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:447, in BinLogPacketWrapper.read_binary_json_object(self, length, large)
    445     else:
    446         t = value_type_inlined_lengths[i][0]
--> 447         data = self.read_binary_json_type(t, length)
    448     out[keys[i]] = data
    450 return out

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
length = 418
large = False
ipdb> p out
{b'\x00\x02\xf3\x00\x0c\x8f\x01\x00\x02\x83': b'_typestoken_endpoint_auth_method\x0eopenid profile\x16https://id.wanted.jobs\x05oneid\x05\x008\x00\x0c\x13\x00\x0c\x1c\x00\x0c"\x00\x0c)\x00\x0c2\x00\x08passw', b'\x01\x0c\x8f\x01scopecl': b'rd\x05kakao\x06google\x08facebook\x05apple\x04\x00\x90\x00\x0c\x10\x00\x0c+\x00\x0cH\x00\x0ck\x00\x1ahttps://id-api.wanted.jobs\x1chttps://id-admin.wanted.jobs"https://'}

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:366, in BinLogPacketWrapper.read_binary_json_type(self, t, length)
    364     return self.read_binary_json_object(length - 1, large)
    365 elif t in (JSONB_TYPE_SMALL_ARRAY, JSONB_TYPE_LARGE_ARRAY):
--> 366     return self.read_binary_json_array(length - 1, large)
    367 elif t in (JSONB_TYPE_STRING,):
    368     return self.read_variable_length_string()

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
t = 2
length = 418

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:461, in BinLogPacketWrapper.read_binary_json_array(self, length, large)
    458     size = self.read_uint16()
    460 if size > length:
--> 461     raise ValueError('Json length is larger than packet length')
    463 values_type_offset_inline = [
    464     read_offset_or_inline(self, large)
    465     for _ in range(elements)]
    467 def _read(x):

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
length = 417
large = False

ValueError: Json length is larger than packet length

I am working hard on debugging myself with ipdb debugger., but I would like to share it would be nice to get help in understanding the hexdump.

Looking at out , it seems that the parsing is not going well.

ipdb> p out
{b'\x00\x02\xf3\x00\x0c\x8f\x01\x00\x02\x83': b'_typestoken_endpoint_auth_method\x0eopenid profile\x16https://id.wanted.jobs\x05oneid\x05\x008\x00\x0c\x13\x00\x0c\x1c\x00\x0c"\x00\x0c)\x00\x0c2\x00\x08passw', b'\x01\x0c\x8f\x01scopecl': b'rd\x05kakao\x06google\x08facebook\x05apple\x04\x00\x90\x00\x0c\x10\x00\x0c+\x00\x0cH\x00\x0ck\x00\x1ahttps://id-api.wanted.jobs\x1chttps://id-admin.wanted.jobs"https://'}

I've been running this environment for quite some time. For small JSON s this was fine.

I have the same problem

i have the same problem too.
What i've found out about this issue is so far is below:
def read_binary_json_object(self, length, large) function from packet.py is used when reading json objects. There is no problem with small json object. But when reading a long json object, a weird unknown key named 'response_types' is inserted, ruining the BinLogPacketWrapper class's __data_buffer value. Thus the offset does not match the buffer making it impossible to read correctly.

in my case, #403 , no error is returning to program, but cpu goes up to 100% for the all time.
also, i have very long json data in database fields.

have no idea if there is any connection with this issue.
maybe can check it btw.

@nankingrider I checked your issue. This issue is not long json value, is a very rare case. In my binlog dump case, MySQL 8.0 are jump the junk data. but read_binary_json_type method are conflict the junk data. I think MySQL binlog is handle the junk(maybe memory eviction...?), so we need also add handler based of #182

@jongwony Hi , thanks for the reply.!

  1. i'm using mysql 5.7.
  2. in py-bt, i can see the program looping in the READ function, with data(not junk data) correct from my records, which is json type, 300k elements, 12 bytes per element ( ip address ), total length 576Kb
  3. i tried to batch the op sql upstream to shrink the json length, works fine ( but not very perfect solution... )

a reasonable guess is maybe mysql generate junk data when json type field is too long ...

still trying to figure how to fix it.

@jongwony @channss @xufei4987
hello?
I'm pretty not sure This Pr resolve this Issue
Can you check this Issue in this PR?

I need your Many test cases.
IF error goes on, Please let me know how to reproduce this issue.