parsed templates

Question


HankKung opened this issue 5 years ago · 1 comments

As mentioned in LogAnomaly:
The front 50% (according to the timestamps of logs) of the BGL dataset is used as the training set, which includes 257 log templates, and the rest 50% involving 503 templates is used as the testing set.

However, I got 1834 templates from Drain and 3000+ from Spell. Did I do something wrong here? Or should I filter out templates that occur only once?

Answer 1 · 2020-09-30T14:35:18.000Z

Anomaly detection papers usually use the ground truth (the correct parsing results). Existing parsers, on the other hand, can make parsing errors, and the counts you mention are an example of that. It could be caused by the parser misinterpreting a few constants.
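
To see how much of the gap comes from rare templates, one option is to count distinct templates in each half of the parsed log and check how many templates occur only once. The snippet below is a minimal sketch, not the procedure used in LogAnomaly: `template_stats`, its parameters, and the toy `ids` list are hypothetical, and it assumes you already have one template ID per log line, sorted by timestamp (for example, taken from the parser's structured output).

```python
from collections import Counter


def template_stats(event_ids, train_fraction=0.5, min_count=1):
    """Count distinct templates in a chronological train/test split.

    event_ids: template IDs in timestamp order, one per log line (assumed input).
    min_count: templates seen fewer times than this in the whole log are ignored.
    """
    split = int(len(event_ids) * train_fraction)
    train, test = event_ids[:split], event_ids[split:]

    # Occurrence count of every template over the whole log.
    counts = Counter(event_ids)
    kept = {t for t, c in counts.items() if c >= min_count}

    train_templates = set(train) & kept
    test_templates = set(test) & kept
    return {
        "train": len(train_templates),
        "test": len(test_templates),
        "test_only": len(test_templates - train_templates),
        "singletons": sum(1 for c in counts.values() if c == 1),
    }


# Toy example with made-up template IDs.
ids = ["E1", "E2", "E1", "E3", "E1", "E4", "E2", "E5"]
print(template_stats(ids))               # all templates
print(template_stats(ids, min_count=2))  # templates seen once are dropped
```

If most of the extra templates are singletons, that points to over-splitting by the parser rather than to a mistake in the train/test split.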