
A toolkit for automated log parsing [ICSE'19, TDSC'18, ICWS'17, DSN'16]

Primary language: Python. License: MIT.

Logparser


Logparser provides a toolkit and benchmarks for automated log parsing, which is a crucial step towards structured log analytics. By applying logparser, users can automatically learn event templates from unstructured logs and convert raw log messages into a sequence of structured events. In the literature, the process of log parsing is sometimes referred to as message template extraction, log key extraction, or log message clustering.


An illustrative example of log parsing
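
To make the idea concrete, here is a minimal sketch of what "learning event templates" and "converting raw messages into structured events" can look like. It is not one of the parsers listed in this repository and does not use the logparser API: it simply masks tokens that look like variables and groups messages by the resulting template. The sample messages, the `VARIABLE` pattern, and the function names are all assumptions made for illustration.

```python
import re
from collections import OrderedDict

# Hypothetical raw messages in the style of HDFS logs (illustrative only).
RAW_LOGS = [
    "Received block blk_3587 of size 67108864 from 10.251.42.84",
    "Received block blk_5402 of size 67108864 from 10.251.214.112",
    "Verification succeeded for blk_3587",
]

# Assumed notion of a "variable" token: block IDs, IPv4 addresses, integers.
VARIABLE = re.compile(r"^(blk_-?\d+|(\d{1,3}\.){3}\d{1,3}|\d+)$")


def to_template(message):
    """Replace every variable-looking token with the wildcard <*>."""
    return " ".join("<*>" if VARIABLE.match(tok) else tok
                    for tok in message.split())


def parse(messages):
    """Return (templates, structured).

    templates maps each template string to an event ID (E1, E2, ...);
    structured holds one (event_id, parameters) tuple per input message.
    """
    templates = OrderedDict()
    structured = []
    for msg in messages:
        tpl = to_template(msg)
        # A new template gets the next free event ID; a known one keeps its ID.
        event_id = templates.setdefault(tpl, "E%d" % (len(templates) + 1))
        params = [tok for tok in msg.split() if VARIABLE.match(tok)]
        structured.append((event_id, params))
    return templates, structured


templates, structured = parse(RAW_LOGS)
for tpl, event_id in templates.items():
    print(event_id, tpl)
```

In this sketch the first two messages collapse into a single template, while the third forms its own event. Real parsers such as those listed below learn which tokens vary from the data rather than relying on a hand-written pattern.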

👉 Read the docs: https://logparser.readthedocs.io

🔭 If you use any of our tools or benchmarks in your research for publication, please kindly cite the following papers.

Log parsers currently available:

Tools | References
SLCT | [IPOM'03] A Data Clustering Algorithm for Mining Patterns from Event Logs, by Risto Vaarandi.
AEL | [QSIC'08] Abstracting Execution Logs to Execution Events for Enterprise Applications, by Zhen Ming Jiang, Ahmed E. Hassan, Parminder Flora, Gilbert Hamann.
AEL | [JSME'08] An Automated Approach for Abstracting Execution Logs to Execution Events, by Zhen Ming Jiang, Ahmed E. Hassan, Gilbert Hamann, Parminder Flora.
IPLoM | [KDD'09] Clustering Event Logs Using Iterative Partitioning, by Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios.
IPLoM | [TKDE'12] A Lightweight Algorithm for Message Type Extraction in System Application Logs, by Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios.
LKE | [ICDM'09] Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis, by Qiang Fu, Jian-Guang Lou, Yi Wang, Jiang Li. [Microsoft]
LFA | [MSR'10] Abstracting Log Lines to Log Event Types for Mining Software System Logs, by Meiyappan Nagappan, Mladen A. Vouk.
LogSig | [CIKM'11] LogSig: Generating System Events from Raw Textual Logs, by Liang Tang, Tao Li, Chang-Shing Perng.
SHISO | [SCC'13] Incremental Mining of System Log Format, by Masayoshi Mizutani.
LogCluster | [CNSM'15] LogCluster - A Data Clustering and Pattern Mining Algorithm for Event Logs, by Risto Vaarandi, Mauno Pihelgas.
LenMa | [CNSM'15] Length Matters: Clustering System Log Messages using Length of Words, by Keiichi Shima.
LogMine | [CIKM'16] LogMine: Fast Pattern Recognition for Log Analytics, by Hossein Hamooni, Biplob Debnath, Jianwu Xu, Hui Zhang, Geoff Jiang, Abdullah Mueen. [NEC]
Spell | [ICDM'16] Spell: Streaming Parsing of System Event Logs, by Min Du, Feifei Li.
Drain | [ICWS'17] Drain: An Online Log Parsing Approach with Fixed Depth Tree, by Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu.
Drain | IBM-Drain3: IBM's upgraded version of Drain in Python 3.6 with additional features.
MoLFI | [ICPC'18] A Search-based Approach for Accurate Identification of Log Message Formats, by Salma Messaoudi, Annibale Panichella, Domenico Bianculli, Lionel Briand, Raimondas Sasnauskas.

Get started

Code organization:

  • benchmark: the benchmark scripts to reproduce the evaluation results of log parsing
  • demo: the demo files showing how to run logparser on HDFS logs
  • logparser: the logparser package
  • logs: log samples and manually parsed structured logs with their templates (ground truth)

Please follow the installation steps and demo in the docs to get started.

Benchmarking results

All the log parsers have been evaluated across 16 different logs available in loghub. We report parsing accuracy as the percentage of accurately parsed log messages. To reproduce the experimental results, please run the benchmark scripts.
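
As a rough illustration of the metric, the sketch below computes parsing accuracy under one plausible reading: a message counts as accurately parsed only when the full set of messages assigned to its predicted event is identical to the set sharing its ground-truth event. The function name and the sample labels are assumptions made for this example; the benchmark scripts remain the authoritative definition.

```python
from collections import defaultdict


def parsing_accuracy(truth_ids, parsed_ids):
    """Fraction of messages whose predicted group exactly matches its
    ground-truth group (assumed strict, group-level definition)."""
    if len(truth_ids) != len(parsed_ids):
        raise ValueError("both label lists must cover the same messages")
    if not truth_ids:
        return 0.0

    # Map each event label to the set of message indices carrying it.
    truth_groups = defaultdict(set)
    parsed_groups = defaultdict(set)
    for idx, (truth, parsed) in enumerate(zip(truth_ids, parsed_ids)):
        truth_groups[truth].add(idx)
        parsed_groups[parsed].add(idx)

    correct = 0
    for members in parsed_groups.values():
        # Compare against the ground-truth group of any one member; a
        # partially matching group (split or merged) earns no credit.
        sample = next(iter(members))
        if truth_groups[truth_ids[sample]] == members:
            correct += len(members)
    return correct / len(truth_ids)


# Hypothetical labels: event B was split in two by the parser.
truth = ["A", "A", "B", "B", "C"]
parsed = ["x", "x", "y", "z", "w"]
print(parsing_accuracy(truth, parsed))
```

Here the two messages of event A and the single message of event C are grouped correctly, while both messages of event B are rejected because their group was split, so three of five messages count as accurately parsed.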

👇 Check the detailed benchmarking result table

In the table, accuracy values above 0.9 are marked in bold, and the best accuracy results achieved are marked with *. Some of the accuracy values may be lower than those reported in previous studies (e.g., Drain, LogMine). There are two reasons: 1) We use a more rigorous accuracy metric, which rejects events that are only partially matched. 2) For fairness of comparison, we apply only a few preprocessing regular expressions (e.g., IP or number replacement) to each log parser. Adding more preprocessing rules can boost parsing accuracy, but also requires more manual effort.
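
The preprocessing step mentioned above can be sketched as a small list of regular expressions applied before parsing. The two rules below (IPv4 addresses with an optional port, and standalone integers) are hypothetical examples of "IP or number replacement"; they are not the exact expressions used in the benchmark scripts, and the `preprocess` helper is an assumption for illustration.

```python
import re

# Hypothetical rules. Order matters: IP addresses are replaced first so
# that their digits are not consumed by the plain-number rule.
PREPROCESS_RULES = [
    re.compile(r"(\d{1,3}\.){3}\d{1,3}(:\d+)?"),  # IPv4, optional :port
    re.compile(r"(?<![\w.])\d+(?![\w.])"),        # standalone integers
]


def preprocess(message, rules=PREPROCESS_RULES, wildcard="<*>"):
    """Replace every match of each rule with the wildcard token."""
    for rule in rules:
        message = rule.sub(wildcard, message)
    return message


print(preprocess("Connection from 10.251.42.84:50010 closed after 35 ms"))
```

Each additional rule removes a class of variable tokens before the parser sees the message, which is why more rules tend to raise accuracy while costing manual effort to write and maintain per log type.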

Publications about logparser

Publications using logparser

Year | Venue | Paper Title | Code
2023 | ICSE | Van-Hoang Le, Hongyu Zhang. Log Parsing with Prompt-based Few-shot Learning | Link
2023 | ICSE | Zhenhao Li, Chuan Luo, Tse-Hsun Chen, Weiyi Shang, Shilin He, Qingwei Lin, Dongmei Zhang. Did We Miss Something Important? Studying and Exploring Variable-Aware Log Abstraction |
2023 | ICSE | Yintong Huo, Yuxin Su, Cheryl Lee, Michael R. Lyu. SemParser: A Semantic Parser for Log Analysis | Link
2023 | IEEE Transactions on Services Computing | Siyu Yu, Pinjia He, Ningjiang Chen, Yifan Wu. Brain: Log Parsing with Bidirectional Parallel Tree | Link
2023 | IEEE Transactions on Network and Service Management | Xiao T, Quan Z, Wang Z J, et al. LPV: A Log Parsing Framework Based on Vectorization |
2023 | WWW | Liming Wang, Hong Xie, Ye Li, Jian Tan, John C.S. Lui. Interactive Log Parsing via Light-weight User Feedback |
2023 | TKDE | Zhang T, Qiu H, Castellano G, et al. System Log Parsing: A Survey. IEEE Transactions on Knowledge and Data Engineering, 2023. |
2022 | ICSME | I. Sedki, A. Hamou-Lhadj, O. Ait-Mohamed, M. Shehab. An Effective Approach for Parsing Large Log Files | Link
2022 | FSE (best paper) | Xuheng Wang, Xu Zhang, Liqun Li, Shilin He, et al. SPINE: A Scalable Log Parser with Feedback Guidance |
2022 | WWW | Liu Y, Zhang X, He S, et al. UniParser: A Unified Log Parser for Heterogeneous Log Data |
2021 | ICDE | Chu G, Wang J, Qi Q, et al. Prefix-Graph: A Versatile Log Parsing Approach Merging Prefix Tree with Probabilistic Graph |
2020 | TSE | Dai H, Li H, Chen C S, et al. Logram: Efficient Log Parsing Using n-Gram Dictionaries | Link
2020 | PKDD | Nedelkoski S, Bogatinovski J, Acker A, et al. Self-Supervised Log Parsing. https://arxiv.org/pdf/2003.07905 |

Acknowledgement

Logparser is implemented based on a number of existing open-source projects:

Feedback

For any questions or feedback, please post to the issue page.