A Python package of log parsers with benchmarks for log template/event extraction
If you use these parsers, please cite our paper using the following reference:
@Conference{He16DSN,
Title = {An Evaluation Study on Log Parsing and Its Use in Log Mining},
Author = {He, P. and Zhu, J. and He, S. and Li, J. and Lyu, M. R.},
Booktitle = {DSN'16: Proc. of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks},
Year = {2016}
}
If you are not familiar with log parsing, please check the Principles of Parsers.
The code is here.
- SLCT (Simple Logfile Clustering Tool): A Data Clustering Algorithm for Mining Patterns from Event Logs (SLCT is a wrapper around the C source code provided by the author.)
- IPLoM (Iterative Partitioning Log Mining): A Lightweight Algorithm for Message Type Extraction in System Application Logs
- LKE (Log Key Extraction): Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis
- LogSig: LogSig: Generating System Events from Raw Textual Logs
- Drain: Drain: An Online Log Parsing Approach with Fixed Depth Tree
- POP: A parallel log parsing method optimized on top of Spark.
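To make "template extraction" concrete, here is a deliberately naive sketch: it groups messages and replaces every token position that varies within a group with `*`. It assumes that messages with the same token count and the same first token share a template. It is not SLCT, IPLoM, LKE, LogSig, Drain or POP, and `naive_templates` is a hypothetical name, not part of this package.

```python
from collections import defaultdict


def naive_templates(messages):
    """Return one template per group of messages (illustration only).

    Hypothetical heuristic: messages with the same number of tokens and
    the same first token are assumed to belong to the same template.
    """
    groups = defaultdict(list)
    for message in messages:
        tokens = message.split()
        key = (len(tokens), tokens[0] if tokens else "")
        groups[key].append(tokens)

    templates = []
    for token_lists in groups.values():
        # zip(*...) walks the groups column by column (token position).
        # A position whose token is identical in every message is kept;
        # any position that varies is treated as a parameter and masked.
        template = [
            column[0] if len(set(column)) == 1 else "*"
            for column in zip(*token_lists)
        ]
        templates.append(" ".join(template))
    return templates
```

For example, the made-up messages `"Receiving block blk_1 src: /10.0.0.1"`, `"Receiving block blk_2 src: /10.0.0.2"` and `"Deleting block blk_3"` yield `["Receiving block * src: *", "Deleting block blk_3"]`. The real parsers differ mainly in how they form the groups; Drain, for instance, uses a fixed-depth tree, as its title indicates.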
In data, there are 5 datasets for you to experiment with. Each dataset contains several text files:
- rawlog.log: The raw log messages with their IDs, one per line, in the format "ID\tword1 word2 word3".
- template[0-9]+: The log messages that belong to a given template (one file per template).
- templates: The text of all templates.
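A dataset folder with the files listed above could be loaded as sketched below. The sketch assumes a folder containing exactly `rawlog.log`, `templates` and numbered `template1`, `template2`, ... files. Because the internal format of each `template[0-9]+` file is not spelled out here, it returns their non-empty lines unchanged. `load_dataset` is a hypothetical helper, not a function from this package.

```python
import re
from pathlib import Path

# Matches template1, template2, ... but not the combined "templates" file.
TEMPLATE_FILE = re.compile(r"template([0-9]+)")


def load_dataset(folder):
    """Load one dataset folder (hypothetical helper; layout assumed).

    Returns (raw_logs, templates, groups):
      raw_logs: dict mapping log ID to message text, from rawlog.log
      templates: list of template strings, one per line of 'templates'
      groups: dict mapping template number to that file's non-empty lines
    """
    folder = Path(folder)
    raw_logs = {}
    for line in (folder / "rawlog.log").read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each line is "ID\tword1 word2 word3"; split only at the first tab.
        log_id, _, message = line.partition("\t")
        raw_logs[log_id] = message

    templates = (folder / "templates").read_text(encoding="utf-8").splitlines()

    groups = {}
    for path in folder.iterdir():
        match = TEMPLATE_FILE.fullmatch(path.name)
        if match:
            lines = path.read_text(encoding="utf-8").splitlines()
            groups[int(match.group(1))] = [l for l in lines if l.strip()]
    # Sort numerically so template10 comes after template9, not template1.
    return raw_logs, templates, dict(sorted(groups.items()))
```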
Input: A raw log file in which each line follows the format "ID\tword1 word2 word3".
Output: Two parts. The first is the log messages split by template into separate text files (each file contains only log IDs). The second is the templates file, which contains all templates.
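The two-part output could be written roughly as follows. This is a sketch under stated assumptions: the parser's result is taken to be a dict from template text to a list of log IDs, and the files are named `template1`, `template2`, ... plus `templates`, mirroring the provided datasets. `write_parser_output` is a hypothetical helper; the actual parsers may name or order their output files differently.

```python
from pathlib import Path


def write_parser_output(clusters, out_dir):
    """Write one log-ID file per template plus a combined templates file.

    clusters: dict mapping template text to a list of log IDs (strings).
    Hypothetical layout: templateN holds one log ID per line, and line N
    of 'templates' holds the text of template N.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "templates", "w", encoding="utf-8") as templates_file:
        for index, (template, log_ids) in enumerate(clusters.items(), start=1):
            templates_file.write(template + "\n")
            # Only the log IDs go into the per-template file, not the messages.
            (out / f"template{index}").write_text(
                "".join(f"{log_id}\n" for log_id in log_ids), encoding="utf-8"
            )
```

Keeping the template number implicit in both the file name and the line position means the two parts stay aligned without storing an extra index.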
Examples: Before running the examples, copy the corresponding parser source file into the same directory.
- Example1: A simple example demonstrating how to use LogSig. The other log parsers are used in a similar way.
- Example2: Demonstrates how to use POP.
- Example3: Evaluates the performance of LogSig. It runs 10 iterations and records key information (e.g., TP, FP, running time). To use your own dataset, modify the path and file names in the code. You should also modify the path to the ground-truth data in RI_precision. For the ground-truth data format, refer to the provided datasets.
- Evaluation of LogSig: This folder provides a package for evaluating the LogSig log parser on the 2k HDFS dataset. Simply run evaluateLogSig.py.
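The README does not spell out how RI_precision computes TP and FP. One common way to evaluate a log parser against ground truth is pair counting, sketched below under that assumption: a pair of logs is positive if the parser puts both logs in the same group, and a true positive if the ground truth also groups them together. `pair_counts` and `pair_metrics` are hypothetical names, not functions from this package.

```python
from collections import Counter
from math import comb


def pair_counts(predicted, truth):
    """Pair-counting TP, FP, FN, TN between parser output and ground truth.

    predicted, truth: dicts mapping log ID to a cluster/template label.
    """
    ids = list(truth)
    n_pairs = comb(len(ids), 2)
    # Number of pairs grouped together by the parser, by the ground truth,
    # and by both (pairs sharing the same (predicted, true) label pair).
    same_pred = sum(comb(c, 2) for c in Counter(predicted[i] for i in ids).values())
    same_true = sum(comb(c, 2) for c in Counter(truth[i] for i in ids).values())
    same_both = sum(
        comb(c, 2) for c in Counter((predicted[i], truth[i]) for i in ids).values()
    )
    tp = same_both
    fp = same_pred - same_both
    fn = same_true - same_both
    tn = n_pairs - tp - fp - fn
    return tp, fp, fn, tn


def pair_metrics(predicted, truth):
    """Precision, recall, F-measure and Rand index from pair counts."""
    tp, fp, fn, tn = pair_counts(predicted, truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    total = tp + fp + fn + tn
    rand_index = (tp + tn) / total if total else 1.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "rand_index": rand_index,
    }
```

Counting pairs through label frequencies avoids enumerating all O(n²) log pairs explicitly, which matters even for a 2k-line dataset when the evaluation is repeated 10 times.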
Because SLCT is based on the original C code, its running example is here. This program is platform-dependent because the .so files only work on Linux.
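Since the SLCT .so files only work on Linux, a wrapper can fail early with a clear message on other platforms instead of surfacing an obscure loader error. The sketch below assumes the library is loaded with Python's standard `ctypes` module; `load_linux_library` and any path passed to it are hypothetical, and the README does not say how the SLCT wrapper actually loads its library.

```python
import ctypes
import sys


def load_linux_library(path):
    """Load a shared library (.so) via ctypes, refusing to run off Linux.

    'path' is a hypothetical location of a compiled SLCT shared library.
    """
    if not sys.platform.startswith("linux"):
        raise RuntimeError(
            f"{path}: .so files are only valid on Linux, not on {sys.platform}"
        )
    # ctypes.CDLL raises OSError if the file is missing or was built
    # for an incompatible platform or architecture.
    return ctypes.CDLL(path)
```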
Copyright © 2017, LogPAI, CUHK