Failure parsing binary VDF fields with invalid UTF-8 strings
Closed this issue · 1 comments
vdf fails to parse binary VDF files which contain fields with invalid UTF-8. One such invalid field was discovered in an entry for Spore inside appinfo.vdf. The exception that occurs is:
File "/home/matoking/git/protontricks/env/lib/python3.7/site-packages/vdf/__init__.py", line 337, in binary_loads
stack[-1][key], idx = read_string(s, idx)
File "/home/matoking/git/protontricks/env/lib/python3.7/site-packages/vdf/__init__.py", line 305, in read_string
result = result.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 12: invalid start byte
I've uploaded a snippet that can be used to reproduce this error here:
https://gist.github.com/Matoking/e2dfe281386ff4eac9022eb0f02d80cd
SteamDB seems to replace the invalid character with a question mark:
https://steamdb.info/app/24720/history/ (the faulty string here is Moje Spore v�tvory)
A similar fix in vdf would mean replacing the result.decode('utf-8')
call with result.decode('utf-8', errors='replace')
or letting the developer decide how to handle errors by passing an optional errors
kwarg.
Yeah, PICS has these issue with malformed data. Valve doesn't seem to validate it properly on their end, and even truncate utf-8 in the middle of glyphs. The best approach would be to use replace on errors as there is not much a dev can do handle this.