News: Please check our new progress at: https://github.com/logpai/logparser/tree/dev
A Python package of log parsers, with benchmarks for log template/event extraction.
If you are not familiar with log parsing, please check the Principles of Parsers.
The code is here.
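To illustrate the principle (this toy snippet is not one of the bundled parsers), a log template for a group of messages of the same event can be formed by keeping the tokens that all messages share and replacing the varying tokens with a wildcard. The function name and the "*" wildcard are our own choices for this sketch:

```python
def extract_template(messages):
    """Toy template extraction: given same-length messages of one
    event, keep constant tokens and wildcard the varying ones."""
    token_lists = [m.split() for m in messages]
    template = []
    for tokens in zip(*token_lists):  # column-wise over token positions
        # a position with a single distinct token is a constant
        template.append(tokens[0] if len(set(tokens)) == 1 else "*")
    return " ".join(template)
```

Real parsers (e.g., IPLoM, Drain) differ mainly in how they group raw messages into events before this step, and in how they handle messages of different lengths.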
- SLCT (Simple Logfile Clustering Tool): A Data Clustering Algorithm for Mining Patterns from Event Logs (SLCT is a wrapper around the C source code provided by the author.)
- IPLoM (Iterative Partitioning Log Mining): A Lightweight Algorithm for Message Type Extraction in System Application Logs
- LKE (Log Key Extraction): Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis
- LogSig: LogSig: Generating System Events from Raw Textual Logs
- POP: Towards Automated Log Parsing for Large-Scale Log Data Analysis
- DrainV1: Drain: An Online Log Parsing Approach with Fixed Depth Tree
- Drain: A Directed Acyclic Graph Approach to Online Log Parsing, extended from DrainV1.
In data, there are 11 datasets for you to play with. Each dataset contains several text files:
- rawlog.log: The raw log messages with IDs, one per line: "ID\tword1 word2 word3"
- template[0-9]+: The log messages that belong to a certain template.
- templates: The text of all templates.
Input: A raw log file. Each line of the file follows the format "ID\tword1 word2 word3".
Output: Two parts. One is the split log messages (containing only log IDs) in separate text files; the other is the templates file, which contains all templates.
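The input format above can be read with a short helper; this is a hypothetical sketch for illustration, not a function provided by the package:

```python
def read_raw_log(path):
    """Read a raw log file where each line is "ID\\tword1 word2 word3";
    return a list of (log_id, [word, ...]) tuples."""
    entries = []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            log_id, _, message = line.partition("\t")  # split at first tab
            entries.append((log_id, message.split()))
    return entries
```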
Examples: Before running the examples, please copy the parser source file to the same directory.
- Example1: This file is a simple example to demonstrate the usage of LogSig. The usage of other log parsers is similar.
- Example2: This file is to demonstrate the usage of POP.
- Example3: This file is used to evaluate the performance of LogSig. It iterates 10 times and records several important statistics (e.g., TP, FP, running time). To play with your own dataset, you could modify the path and file names in the code. You should also modify the path to the ground truth data in RI_precision. For the ground truth data format, you can refer to our provided datasets.
- Evaluation of LogSig: This folder provides a package for you to evaluate the LogSig log parser on the 2k HDFS dataset. You can simply run the evaluateLogSig.py file.
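For intuition about the evaluation, precision here is typically computed over pairs of log messages: a TP pair is grouped together by both the parser and the ground truth, while an FP pair is grouped together by the parser only. A toy sketch of pairwise precision (the bundled RI_precision script may compute it differently):

```python
from itertools import combinations

def pairwise_precision(predicted, truth):
    """predicted and truth map each log ID to a cluster/template label."""
    tp = fp = 0
    for a, b in combinations(sorted(predicted), 2):
        if predicted[a] == predicted[b]:   # parser puts the pair together
            if truth[a] == truth[b]:       # ground truth agrees -> TP
                tp += 1
            else:                          # ground truth disagrees -> FP
                fp += 1
    return tp / (tp + fp) if (tp + fp) else 0.0
```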
For SLCT, because it is based on the original C code, the running example is here. This program is platform-dependent because the .so files are only valid on Linux.
If you use these parsers, please cite our paper using the following reference:
@Inproceedings{He16DSN,
Title = {An Evaluation Study on Log Parsing and Its Use in Log Mining},
Author = {He, P. and Zhu, J. and He, S. and Li, J. and Lyu, M. R.},
Booktitle = {DSN'16: Proc. of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks},
Year = {2016}
}
You are also welcome to cite our other related log parser papers:
@Inproceedings{He17ICWS,
Title = {Drain: An Online Log Parsing Approach with Fixed Depth Tree},
Author = {He, P. and Zhu, J. and Zheng, Z. and Lyu, M. R.},
Booktitle = {ICWS'17: Proc. of the 24th International Conference on Web Services},
Year = {2017}
}
@Article{HeTDSC17,
Title = {Towards Automated Log Parsing for Large-Scale Log Data Analysis},
Author = {He, P. and Zhu, J. and He, S. and Li, J. and Lyu, M. R.},
Journal = {IEEE Transactions on Dependable and Secure Computing},
Year = {2017},
Doi = {10.1109/TDSC.2017.2762673},
ISSN = {1545-5971}
}
Copyright © 2017, LogPAI, CUHK