UTF-16 decode error in unpack_wstring
atcuno opened this issue
atcuno commented
Traceback (most recent call last):
  File "/usr/local/bin/evtx_dump.py", line 4, in <module>
    __import__('pkg_resources').run_script('python-evtx==0.6.1', 'evtx_dump.py')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1501, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/python_evtx-0.6.1-py2.7.egg/EGG-INFO/scripts/evtx_dump.py", line 42, in <module>
  File "/usr/local/lib/python2.7/dist-packages/python_evtx-0.6.1-py2.7.egg/EGG-INFO/scripts/evtx_dump.py", line 37, in main
  File "build/bdist.linux-x86_64/egg/Evtx/Evtx.py", line 498, in xml
  File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 204, in evtx_record_xml_view
  File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 191, in render_root_node
  File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 176, in render_root_node_with_subs
  File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 126, in rec
  File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 166, in rec
  File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 191, in render_root_node
  File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 175, in render_root_node_with_subs
  File "build/bdist.linux-x86_64/egg/Evtx/BinaryParser.py", line 64, in __call__
  File "build/bdist.linux-x86_64/egg/Evtx/Nodes.py", line 168, in children
  File "build/bdist.linux-x86_64/egg/Evtx/Nodes.py", line 153, in _children
  File "build/bdist.linux-x86_64/egg/Evtx/Nodes.py", line 733, in __init__
  File "build/bdist.linux-x86_64/egg/Evtx/BinaryParser.py", line 493, in unpack_wstring
  File "/usr/lib/python2.7/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 900-901: illegal UTF-16 surrogate
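To see the failure outside python-evtx, here is a minimal Python 3 sketch. The traceback above is from Python 2.7, and the bytes below are made up for illustration rather than taken from the affected log. A high surrogate (U+D800) that is not followed by a low surrogate makes a strict UTF-16 decode raise, and the reported error spans a single 2-byte code unit, much like "position 900-901" in the report.

```python
# Python 3 sketch; the byte values are illustrative, not from the failing log.
# UTF-16LE bytes for "A", then a lone high surrogate U+D800, then "B".
bad = b"A\x00\x00\xd8B\x00"

error = None
try:
    bad.decode("utf-16-le")  # default errors="strict" raises on the bad code unit
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; the "as" name is cleared after the block

if error is not None:
    # start/end are byte offsets of the offending 2-byte code unit
    print(error.start, error.end, error.reason)
```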
jdeloshoyos commented
I'm using this script a lot, with great results, and have also run into this problem when converting some logs that, for whatever reason, contain invalid UTF-16 data (such as the illegal surrogate in the traceback above). Here's a quick and dirty fix that did the trick for me:
In Evtx/Nodes.py:
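def string(self):
    # Method of the wstring-array node class in Evtx/Nodes.py; `re` and
    # `ParseException` come from that module's existing imports.
    binary = self.binary()
    acc = []
    while len(binary) > 0:
        # A fragment is a run of UTF-16LE code units whose low byte is not NUL.
        match = re.search(b"((?:[^\x00].)+)", binary)
        if match:
            frag = match.group()
            acc.append("<string>")
            # Begin change: add try/except block for handling illegal characters
            try:
                acc.append(frag.decode("utf16"))
            except UnicodeDecodeError:
                acc.append("[ILLEGAL CHARACTER]")
            # End change
            acc.append("</string>\n")
            # Skip the fragment plus its 2-byte NUL terminator.
            binary = binary[len(frag) + 2:]
            if len(binary) == 0:
                break
        # Each additional pair of NUL bytes stands for an empty string.
        frag = re.search(b"(\x00*)", binary).group()
        if len(frag) % 2 == 0:
            for _ in range(len(frag) // 2):
                acc.append("<string></string>\n")
        else:
            raise ParseException("Error parsing uneven substring of NULLs")
        binary = binary[len(frag):]
    return "".join(acc)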
Of course, the "[ILLEGAL CHARACTER]" string could be something shorter.
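If a shorter marker is the goal, one alternative is to let Python's built-in "replace" error handler do the work. This is a different technique from the patch above, and not something python-evtx does as far as I know: decode the whole array once as UTF-16LE, so each undecodable code unit becomes U+FFFD while the rest of the string survives, then split on the U+0000 terminators. The sketch is Python 3; `split_wstring_array` and `wstring_array_to_xml` are hypothetical names. It assumes the value is a sequence of NUL-terminated UTF-16LE strings, as the parsing loop above implies.

```python
from typing import List


def split_wstring_array(binary: bytes) -> List[str]:
    """Split a buffer of NUL-terminated UTF-16LE strings (hypothetical helper).

    errors="replace" turns each undecodable code unit, such as a lone
    surrogate, into U+FFFD instead of raising UnicodeDecodeError.
    """
    text = binary.decode("utf-16-le", errors="replace")
    if not text:
        return []
    parts = text.split("\x00")
    # A final terminator leaves one extra empty element at the end; drop it.
    if text.endswith("\x00"):
        parts.pop()
    return parts


def wstring_array_to_xml(binary: bytes) -> str:
    """Render one <string> element per entry, like the string() method above."""
    return "".join("<string>%s</string>\n" % s for s in split_wstring_array(binary))


if __name__ == "__main__":
    # "ab", then a lone high surrogate followed by "c" (illustrative bytes).
    sample = ("ab".encode("utf-16-le") + b"\x00\x00"
              + b"\x00\xd8" + "c".encode("utf-16-le") + b"\x00\x00")
    print(wstring_array_to_xml(sample), end="")
```

Decoding before splitting also sidesteps the `[^\x00].` pattern, which, as far as I can tell, ends a fragment early on any code unit whose low byte is 0x00 (U+0100, for example). Unlike the original loop, it does not raise ParseException on an odd byte count; the trailing byte just becomes another U+FFFD.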