Drain: Linux log parsing length
KenV1040 opened this issue · 1 comment
KenV1040 commented
Hello authors,
I'm trying to parse the Linux dataset that you sent me (the full log file, not the 2k version), and I get an error when I run Drain_demo.py on it. The only thing I changed in Drain_demo.py was the log format, to match the Linux one.
This is the traceback:
Traceback (most recent call last):
  File "main.py", line 84, in <module>
    main()
  File "main.py", line 41, in main
    pre_processing()
  File "main.py", line 48, in pre_processing
    unstructured_to_structured()
  File "main.py", line 80, in unstructured_to_structured
    parser.parse(log_file)
  File "/home/kupperu/Documents/CompSci/project/logparser/Drain/Drain.py", line 255, in parse
    self.load_data()
  File "/home/kupperu/Documents/CompSci/project/logparser/Drain/Drain.py", line 291, in load_data
    self.df_log = self.log_to_dataframe(os.path.join(self.path, self.logName), regex, headers, self.log_format)
  File "/home/kupperu/Documents/CompSci/project/logparser/Drain/Drain.py", line 304, in log_to_dataframe
    for line in fin.readlines():
  File "/home/kupperu/.pyenv/versions/3.6.9/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 4536: invalid start byte
How would I go about fixing this? Apologies if this is a basic error; I'm quite new to this field.
Also, I'm currently using Python 3.6.9.
Thanks
PinjiaHe commented
Sorry for the late reply. Maybe you could try with Python 2.7.
Researchers from IBM also provide an implementation for Python 3.6: https://github.com/IBM/Drain3
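For anyone who wants to stay on Python 3: the traceback shows the failure happening in fin.readlines() inside log_to_dataframe, where Python 3 decodes the file as UTF-8 and stops at byte 0xf7, which is not a valid UTF-8 start byte. Python 2.7 reads the file as raw bytes without decoding, which is likely why it avoids the error. Below is a minimal sketch of one possible workaround, not the logparser authors' fix: open the file with an explicit error handler or a single-byte encoding. The helper name read_log_lines and the sample log content are hypothetical, and latin-1 is only an assumption, since the thread does not say what encoding the Linux log actually uses.

```python
import os
import tempfile


def read_log_lines(path, encoding="utf-8", errors="replace"):
    """Hypothetical helper: read a log file without failing on undecodable bytes.

    With errors="replace", each invalid byte becomes U+FFFD instead of
    raising UnicodeDecodeError.
    """
    with open(path, "r", encoding=encoding, errors=errors) as fin:
        return [line.rstrip("\n") for line in fin]


# Made-up sample data containing the same byte (0xf7) reported in the traceback.
sample = b"Jun 14 15:16:01 combo sshd: ok\nbad byte \xf7 here\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

# Strict UTF-8 decoding reproduces the error from the issue.
try:
    with open(tmp_path, "r", encoding="utf-8") as fin:
        fin.readlines()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Option 1: keep UTF-8 but replace invalid bytes.
lines = read_log_lines(tmp_path)

# Option 2 (assumption about the data): decode as latin-1, which maps every
# byte to a character and therefore never raises a decode error.
latin1_lines = read_log_lines(tmp_path, encoding="latin-1", errors="strict")

os.remove(tmp_path)
```

In Drain.py this would amount to passing encoding and errors arguments to the open() call that produces fin. Replacing bytes loses the original character, so if the real encoding of the log is known, passing that encoding is the better choice.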