williballenthin/python-evtx

Evtx.BinaryParser.ParseException raised unexpectedly

john-corcoran opened this issue · 7 comments

Hi,

I'm seeing exception Evtx.BinaryParser.ParseException raised unexpectedly when parsing event log "Microsoft-Windows-Windows Firewall With Advanced Security%4Firewall.evtx" (extracted from a Windows Server 2012 testing instance) using evtx_dump.py.

Same behaviour seen when parsing on Windows with Python 2.7, and Ubuntu with Python 2.7 / 3.5.

Traceback from Windows with Python 2.7 as follows:

Traceback (most recent call last):
  File "evtx_dump.py", line 42, in <module>
    main()
  File "evtx_dump.py", line 37, in main
    print(record.xml())
  File "C:\Python27\lib\site-packages\Evtx\Evtx.py", line 481, in xml
    return e_views.evtx_record_xml_view(self)
  File "C:\Python27\lib\site-packages\Evtx\Views.py", line 204, in evtx_record_xml_view
    return render_root_node(record.root())
  File "C:\Python27\lib\site-packages\Evtx\Views.py", line 182, in render_root_node
    for sub in root_node.substitutions():
  File "C:\Python27\lib\site-packages\Evtx\BinaryParser.py", line 64, in __call__
    cache[key] = self.func(*args, **kw)
  File "C:\Python27\lib\site-packages\Evtx\Nodes.py", line 1019, in substitutions
    raise ParseException("Invalid substitution value size")
Evtx.BinaryParser.ParseException: Parse Exception(Invalid substitution value size)

I can forward the EVTX file in question if that assists?

hey @john-corcoran

Thanks for raising this issue. The easiest way to triage it will be to share the .evtx file. I'd be happy to take a peek if you send to my email: willi.ballenthin@gmail.com

@john-corcoran just wanted to let you know i still have my eye on this issue. been doing some traveling so haven't had as much time as i'd have liked to triage the bug. the evtx files you provided should be enough for me to figure things out in the next few days.

I suspect this evtx file is corrupt. For example, when I simply print the record numbers, we get output like:

[...]
180
181
182
183
184
185
186
187
188
7999193089
[exception traceback]

clearly, 7999193089 is not a valid record number. I think expected behavior would be to identify this corruption earlier, and then notify the user that something is wrong (rather than trying to interpret invalid data).

evtx_info.py also fails:

λ C:\Users\user\env2\Scripts\python ..\scripts\evtx_info.py data\issue_45.evtx
Information from file header:
Format version  : 3.1
Flags           : 0x00000000
File is         : clean
Log is full     : no
Current chunk   : 2 of 3
Oldest chunk    : 1
Next record#    : 189
Check sum       : pass

Information from chunks:
  Chunk file (first/last)     log (first/last)      Header Data
- ----- --------------------- --------------------- ------ ------
>     1          1        66           1        66   pass   pass
      2         67       123          67       123   pass   pass
*     3        124       188         124       188   pass   pass
Traceback (most recent call last):
  File "..\scripts\evtx_info.py", line 109, in <module>
    main()
  File "..\scripts\evtx_info.py", line 82, in main
    if not chunk.check_magic():
  File "c:\users\user\documents\code\python-evtx\Evtx\Evtx.py", line 308, in check_magic
    return self.magic() == "ElfChnk\x00"
  File "c:\users\user\documents\code\python-evtx\Evtx\BinaryParser.py", line 211, in explicit_length_handler
    return f(offset, length)
  File "c:\users\user\documents\code\python-evtx\Evtx\BinaryParser.py", line 475, in unpack_string
    return self.unpack_binary(offset, length).decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 2: ordinal not in range(128)

seems like this file contains junk data after 3 valid chunks:

λ C:\Users\user\env2\Scripts\python ..\scripts\evtx_info.py data\issue_45.evtx
Information from file header:
Format version  : 3.1
Flags           : 0x00000000
File is         : clean
Log is full     : no
Current chunk   : 2 of 3
Oldest chunk    : 1
Next record#    : 189
Check sum       : pass

Information from chunks:
  Chunk file (first/last)     log (first/last)      Header Data
- ----- --------------------- --------------------- ------ ------
>     1          1        66           1        66   pass   pass
      2         67       123          67       123   pass   pass
*     3        124       188         124       188   pass   pass
      4   [INVALID]
      5   [INVALID]
      6   [INVALID]
      7   [INVALID]
      8   [INVALID]
      9   [INVALID]
     10   [INVALID]
     11   [INVALID]
     12   [INVALID]
     13   [INVALID]
     14   [INVALID]
     15   [INVALID]
     16   [INVALID]

commits 5d2754f and 1ed29cf improve the robustness of the code that parses chunks.

i think the issue is that when enumerating chunks (and therefore records) via the Evtx log object, the log object doesn't take into account how many chunks are declared to be in the log file. Instead, it keeps reading until the end of the file. So, if there's junk data appended to the file, it attempts to parse this as valid data. See here:

def chunks(self):

confirmed the theory above, and addressed the issue in df8edf4.

thanks to @john-corcoran for reporting this issue!