Websocket: "cannot decode". Error validating utf8
radomir9720 opened this issue · 8 comments
My websocket connection is constantly being cut off.
I turned on debug pybit and websocket logs to see what's wrong.
Here is what i've got:
2024-06-15 16:44:16,150 - websocket - DEBUG - Sending ping
2024-06-15 16:44:36,151 - websocket - DEBUG - Sending ping
2024-06-15 16:44:56,152 - websocket - DEBUG - Sending ping
2024-06-15 16:44:56,725 - websocket - ERROR - error from callback <function _WebSocketManager._connect.<locals>.<lambda> at 0x7fc65f4a56c0>: cannot decode: b'{"success":true,"ret_msg":"pong","conn_id":"1a598878-132b-4e\xf17-8864-8cee95ea1b85","req_id":"","op":"ping"}'
2024-06-15 16:44:56,726 - websocket - INFO - tearing down on exception cannot decode: b'{"success":true,"ret_msg":"pong","conn_id":"1a598878-132b-4e\xf17-8864-8cee95ea1b85","req_id":"","op":"ping"}'
2024-06-15 16:44:56,726 - pybit._websocket_stream - DEBUG - WebSocket Unified V5 closed.
I took a look inside websocket
package. The exception is thrown from websocket._abnf.py.continuous_frame.extract()
method.
if (
not self.fire_cont_frame
and data[0] == ABNF.OPCODE_TEXT
and not self.skip_utf8_validation
and not validate_utf8(frame.data)
):
raise WebSocketPayloadException(f"cannot decode: {repr(frame.data)}")
So, there could be several reasons why the exception is thrown:
Disclaimer: i know that if
statement is executed until first True
, but, since i'm not familiar with implementation details of the websocket
package, i started with the part that i can figure out - utf8 validation:
As we can see, the validation fails because of the \x
escape sequence.
What can we do with it?
- Somehow change the connection id, so that it will not contain special characters
- Disable utf8 validation. Especially knowing the fact that it will increase the performance. So, passing
skip_utf8_validation=True
toWebSocketApp.run_forever()
method should solve the issue
P.S. I'm using the lates pybit version - 5.7.0
Forked and made changes described above.
Currently testing. If everything's gonna be ok - i will make a PR.
I don't think there should be an escape sequence in there, it's just a UUID and should be alphanumeric with hyphens. This makes me wonder if the conn_id
sent by the websocket server is the problem. Can you consistently reproduce this issue?
You're right. The issue is not the conn_id
. I did some investigations, and, i suppose, the problem is the unstable connection. Perhaps, once in a while bytes get lost, that results in a broken message.
Here are some examples:
- Instead of
kline.1.FTMUSDT
gotkline.1.DTMUSDT
. An exception was raised because there was no such topic in the topics dict.
2024-06-20 10:34:21,623 - websocket - DEBUG - ++Rcv raw: b'\x81~\x01\x16{"topic":"kline.1.DTMUSDT","data":[{"start":1718868840000,"end":1718868899999,"interval":"1","open":"0.5702","close":"0.57","high":"0.5704","low":"0.5695","volume":"139423","turnover":"79456.8583","confirm":false,"timestamp":1718868861523}],"ts":1718868861523,"type":"snapshot"}'
2024-06-20 10:34:21,624 - websocket - DEBUG - ++Rcv decoded: fin=1 opcode=1 data=b'{"topic":"kline.1.DTMUSDT","data":[{"start":1718868840000,"end":1718868899999,"interval":"1","open":"0.5702","close":"0.57","high":"0.5704","low":"0.5695","volume":"139423","turnover":"79456.8583","confirm":false,"timestamp":1718868861523}],"ts":1718868861523,"type":"snapshot"}'
2024-06-20 10:34:21,624 - websocket - ERROR - error from callback <function _WebSocketManager._connect.<locals>.<lambda> at 0x7f7c2c31b910>: 'kline.1.DTMUSDT'
- The entire
end
parameter is messed up. Also there is a missing quote at the end, that caused a JsonDecodeError.
Since we don't have options to check the integrity of the message(for example, through check sum), the only thing we can do - ignore broken messages. But. also, we don't have a universal way to check whether the message is broken or not(for cases like message with topic we didn't subscribe to). The only thing we can do - ignore the exceptions, and keep the connection open. This is what i did. I added an argument disconnect_on_exception
. If it's set to False
, the exception will only be logged, and the connection will be maintained.
I pushed all the changes to my fork, and gonna make a PR with details what i did, and why.
I'll do it tomorrow
I'm not sure how to reproduce this issue, do you think there's a good way to do it for testing purposes?
What do you want to reproduce? A broken message from websocket?
I don't think that there is a good way. I just run the socket stream, and getting broken messages(Therefore, a JSONDecodeError
) 1-2 times a day in average. But i'm using my fork, so the connection is maintained.
What is the ultimate goal in reproducing the issue? You can see the logs, that i attached, all you'll get after reproducing the issue - exactly the same logs, and the same result - JSONDecodeError
.
I've never had this issue myself and never seen it reported, so I'd like to be able to test the fix before committing to master.
I have used pybit to run tests for days at a time but have never seen this problem. So I just want to be able to reproduce it before approving the PR.
This PR, actually, is not a fix. It just provides a way to handle the error in _on_error
callback, instead of automatically raising an exception and closing the connection. To do so, it's necessary to pass skip_utf8_validation=True
. Also, the exceptions in _on_error
can be ignored by passing disconnect_on_exception=False
.
So changes are backward compatible, because by default the behaviour in the PR is completely the same as in current version.
If you want to test the behaviour when a broken message is received from websocket, you can, for example, override/mock/change the _on_message
method, and raise any exception in it, because exactly message = json.loads(message)
from _on_message
method throws an exception when the broken message can't be decoded.