logpai/logparser

Drain: Linux log parsing length

KenV1040 opened this issue · 1 comment

Hello authors,
I'm trying to parse the Linux dataset that you sent me (the full version of the log file, not the 2k version), and I get an error when I run Drain_demo.py on it. The only thing I changed in Drain_demo.py was the log format, to match the Linux one.

This is the traceback:

Traceback (most recent call last):
  File "main.py", line 84, in <module>
    main()
  File "main.py", line 41, in main
    pre_processing()
  File "main.py", line 48, in pre_processing
    unstructured_to_structured()
  File "main.py", line 80, in unstructured_to_structured
    parser.parse(log_file)
  File "/home/kupperu/Documents/CompSci/project/logparser/Drain/Drain.py", line 255, in parse
    self.load_data()
  File "/home/kupperu/Documents/CompSci/project/logparser/Drain/Drain.py", line 291, in load_data
    self.df_log = self.log_to_dataframe(os.path.join(self.path, self.logName), regex, headers, self.log_format)
  File "/home/kupperu/Documents/CompSci/project/logparser/Drain/Drain.py", line 304, in log_to_dataframe
    for line in fin.readlines():
  File "/home/kupperu/.pyenv/versions/3.6.9/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 4536: invalid start byte

How would I go about fixing this? Apologies if this is a basic error; I'm quite new to this field.

Also, I'm currently using Python 3.6.9.

Thanks

Sorry for the late reply. Maybe you could try with Python 2.7.

Researchers from IBM also provide an implementation for Python 3.6: https://github.com/IBM/Drain3
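
For anyone who wants to stay on Python 3: the traceback shows the file being decoded as UTF-8 while it is read in log_to_dataframe, and the full Linux log apparently contains at least one byte (0xf7) that is not valid UTF-8. Python 2.7 likely avoids the error because its open() returns raw bytes without decoding them. Below is a minimal sketch of a tolerant reader. The helper name read_log_lines is hypothetical and not part of logparser; the sketch assumes that passing errors="replace" (or encoding="latin-1") to the open() call in Drain.py is acceptable for your data. It has not been tested against the actual dataset.

```python
import os
import tempfile


def read_log_lines(path, encoding="utf-8", errors="replace"):
    """Return the lines of a log file without their trailing newlines.

    Hypothetical helper, not part of logparser. With errors="replace",
    any byte that is invalid in `encoding` becomes U+FFFD instead of
    raising UnicodeDecodeError.
    """
    with open(path, "r", encoding=encoding, errors=errors) as fin:
        return [line.rstrip("\n") for line in fin]


if __name__ == "__main__":
    # Build a tiny sample file that contains the offending byte 0xf7.
    fd, sample_path = tempfile.mkstemp(suffix=".log")
    os.write(fd, b"first line is fine\nsecond line has a bad \xf7 byte\n")
    os.close(fd)

    try:
        # Default: decode as UTF-8, replacing undecodable bytes.
        for line in read_log_lines(sample_path):
            print(repr(line))

        # Alternative: latin-1 maps every byte to a character, so it never fails.
        for line in read_log_lines(sample_path, encoding="latin-1", errors="strict"):
            print(repr(line))
    finally:
        os.remove(sample_path)
```

Replacing bytes loses the original character, while latin-1 keeps every byte but may show the wrong character if the file actually uses another encoding. Either should be enough to get the parser past the bad byte.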