TeskaLabs/cysimdjson

TypeError: expected bytes, str found

petric3 opened this issue · 3 comments

I'm not a computer science guy (I'm a mechanical engineer), so hopefully the question is not too far off. My project involves streaming millions of data points via a WebSocket client, and speed is very important. The data arrive as string messages containing a system's operational parameters in JSON/dict format. Each pushed message is a list (an actual list, serialized as a string) containing several dictionaries, which we iterate over and analyse in real time. Three abbreviated, separately pushed messages would look something like this:

```json
[{"t":1611064750942,"Signal003":"Q","Signal003_MODE":1,"Signal003_TARGET":11,"Signal003_NEXT":8,"Signal003_IRV":9.05,"Signal003_ONOFF":9.06,"Signal003_TEMP":5,"Signal001_PRES":98,"Signal001_ENT":3}]
[{"t":1611064750943,"Signal033":"T","Signal003_MODE":1,"Signal003_TARGET":5170,"Signal001_ONOFF":19,"Signal001_TEMP":91.28,"s":6445,"Signal033_IRV":[12],"Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":2,"Signal003_TARGET":5171,"Signal001_ONOFF":8,"Signal001_TEMP":9.04,"s":100,"Signal033_IRV":[12],"t":1611064750943,"Signal003_ENT":3}]
[{"t":1611064750943,"Signal065":"T","Signal003_MODE":3,"Signal003_TARGET":5172,"Signal001_ONOFF":8,"Signal001_TEMP":9.04,"s":1000,"Signal033_IRV":[12],"Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":1,"Signal003_TARGET":5173,"Signal001_ONOFF":8,"Signal001_TEMP":9.04,"s":100,"Signal033_IRV":[12],"t":1611064750943,"Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":1,"Signal003_TARGET":7116,"Signal001_ONOFF":12,"Signal001_TEMP":9.04,"s":10,"Signal033_IRV":[14,12,37,41],"t":1611064750943,"Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":1,"Signal003_TARGET":961,"Signal001_ONOFF":19,"Signal001_TEMP":9.04,"s":4,"Signal033_IRV":[14,12,37,41],"t":1611064750943,"Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":1,"Signal003_TARGET":962,"Signal001_ONOFF":19,"Signal001_TEMP":9.04,"s":10,"Signal033_IRV":[14,12,37,41],"t":1611064750943,"Signal003_ENT":3}]
```

In the example on your front page, if I understood it correctly, the message has to be in bytes format, which explains the TypeError I'm getting:

```python
json_bytes = b'''
{
  "foo": [1,2,[3]]
}
'''
```

Would it be possible for the cysimdjson library to parse a string message directly, given that we have no influence on the type of the message pushed from the WebSocket? I hope the question is not too far off, but since WebSocket libraries are widely used together with JSON, I thought the problem might be worth looking into. I can imagine converting the string to bytes in Python, but I suspect that would eat into the speed gained from the C implementation. At the moment I use orjson, which is pretty fast and works well, but your performance results have piqued my interest. Regardless of the answer, thank you for your work on the library.
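Whether the str-to-bytes conversion is actually a bottleneck can be checked by timing it directly. The sketch below is a rough measurement using only the standard library: it times `str.encode("utf-8")` on the first object of the second sample message against a full `json.loads` parse of the same message. The standard `json` module serves purely as a point of comparison here; it is not cysimdjson or orjson, and absolute numbers will vary by machine and Python version.

```python
import json
import timeit

# First object of the second sample message, as a str (the type the WebSocket delivers).
msg = '[{"t":1611064750943,"Signal033":"T","Signal003_MODE":1,"Signal003_TARGET":5170,"Signal001_ONOFF":19,"Signal001_TEMP":91.28,"s":6445,"Signal033_IRV":[12],"Signal003_ENT":3}]'

N = 10_000  # repetitions per measurement

# Cost of converting the str to UTF-8 bytes, which a bytes-only parser would need.
encode_time = timeit.timeit(lambda: msg.encode("utf-8"), number=N)

# Cost of a full parse with the standard-library json module, for scale only.
parse_time = timeit.timeit(lambda: json.loads(msg), number=N)

print(f"str.encode: {encode_time / N * 1e9:.0f} ns per message")
print(f"json.loads: {parse_time / N * 1e9:.0f} ns per message")
```

In CPython, `str.encode` is implemented in C, so one would expect it to be small relative to parsing, but only a measurement on the real message stream can confirm that.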

I see. Can you please also provide the full error output you are getting?
We intentionally avoid unnecessary type checks because they cost performance, but if this can be done in, e.g., an exception handler, it could be added easily.
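The exception-handler idea can be sketched in plain Python, assuming a parser that raises `TypeError` when handed a `str`. The names `bytes_only_parse` and `parse_any` are hypothetical; the stand-in parser uses the standard `json` module so the sketch runs without cysimdjson installed.

```python
import json
from typing import Any, Callable, Union


def bytes_only_parse(data: bytes) -> Any:
    """Hypothetical stand-in for a parser that, like cysimdjson's
    JSONParser.parse() as reported in this issue, rejects str input."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError(f"expected bytes, {type(data).__name__} found")
    return json.loads(data)


def parse_any(parse: Callable[[bytes], Any], message: Union[bytes, str]) -> Any:
    """Try the bytes path first; encode to UTF-8 only if the parser rejects the type.

    The wrapper itself adds no type check on the bytes path: the isinstance
    check runs only inside the exception handler."""
    try:
        return parse(message)
    except TypeError:
        if isinstance(message, str):
            return parse(message.encode("utf-8"))
        raise  # not a str either: propagate the original error


for m in (b'[{"t":1}]', '[{"t":2}]'):
    print(parse_any(bytes_only_parse, m))
```

This pattern assumes the parser raises `TypeError` only for a wrong input type; any other `TypeError` from inside the parser would also trigger the retry. According to the last comment in this thread, `parser.parse_string()` accepts a `str` directly, which makes such a wrapper unnecessary when the input type is known in advance.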

Thanks for the reply. I completely understand avoiding type checks, so I don't expect too much, but maybe there's a workaround or a suggestion. Thanks in advance. Here is the error:
```
Traceback (most recent call last):
  File "/home/oem/Syncthing/Projects/Pump_diagnostics_TWU21/JSN_TEST/json_study.py", line 66, in <module>
    benchmark0(F"{'cysimdjson_parser':>15}", parser.parse)
  File "/home/oem/Syncthing/Projects/Pump_diagnostics_TWU21/JSN_TEST/json_study.py", line 50, in benchmark0
    jsn = loads(m)
  File "cysimdjson/cysimdjson.pyx", line 359, in cysimdjson.JSONParser.parse
TypeError: expected bytes, str found
```

A first test with `parser.parse_string()` works well, so the issue can be closed. I will test it more thoroughly in the coming week. Thank you, very much appreciated!