A list of awesome academic research and industrial materials about Large Language Models (LLMs) and Artificial Intelligence for IT Operations (AIOps).

MIT License

Awesome LLM AIOps

Introduction

This is a list of awesome academic research and industrial materials about Large Language Models (LLMs) and Artificial Intelligence for IT Operations (AIOps).

Keywords Convention

The abbreviation of the work.

The LLM techniques used in the work.

The main task explored in the work.

Other important information of the work.

1. LLM for Incident Management

1.1 Incident Lifecycle

  1. [HotNets 2023] A Holistic View of AI-driven Network Incident Management.

1.2 Incident Reporting

  1. [ESEC/FSE Industry 2023] Assess and Summarize: Improve Outage Understanding with Large Language Models.
  2. [ICSE-SEIP 2024] Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach.

1.3 Root Cause Analysis

  1. [ICSE 2024] Xpert: Empowering Incident Management with Query Recommendations via Large Language Models.
  2. [ICSE 2023] Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models.
  3. [Preprint 2023] PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis.
  4. [Preprint 2024] Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4.
  5. [Preprint 2024] X-lifecycle Learning for Cloud Incident Management using LLMs.
  6. [Preprint 2024] Exploring LLM-based Agents for Root Cause Analysis.
  7. [Preprint 2023] D-Bot: Database Diagnosis System using Large Language Models [project].
  8. [Preprint 2023] RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models.
  9. [Preprint 2024] mABC: Multi-Agent Blockchain-inspired Collaboration for Root Cause Analysis in Micro-Services Architecture.

1.4 Incident Mitigation

  1. [Preprint 2024] Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides.

1.5 Incident Postmortem Analysis

  1. [ICSE-SEIP 2024] FaultProfIT: Hierarchical Fault Profiling of Incident Tickets in Large-scale Cloud Systems.
  2. [Preprint 2024] FAIL: Analyzing Software Failures from the News Using LLMs.

1.6 AIOps Question Answering

  1. [EMNLP Industry 2023] Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering [project].
  2. [ICLR 2024] OWL: A Large Language Model for IT Operations [project].
  3. [Preprint 2023] An Empirical Study of NetOps Capability of Pre-Trained Large Language Models [project].
  4. [Preprint 2023] OpsEval: A Comprehensive Task-Oriented AIOps Benchmark for Large Language Models [project].

2. LLM for Log Analysis

2.1 Log Parsing

  1. [ISSTA 2024] A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We?
  2. [ASE-NIER 2023] Log Parsing: How Far Can ChatGPT Go? [project].
  3. [ICSE 2024] DivLog: Log Parsing with Prompt Enhanced In-Context Learning.
  4. [ICSE 2024] LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing.
  5. [FSE 2024] LILAC: Log Parsing using LLMs with Adaptive Parsing Cache.
  6. [ICPC 2024] Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies.
  7. [Preprint 2023] LEMUR: Log Parsing with Entropy Sampling and Chain-of-Thought Merging.
  8. [Preprint 2024] Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs.
  9. [Preprint 2024] ULog: Unsupervised Log Parsing with Large Language Models through Log Contrastive Units.
  10. [Preprint 2024] Log Parsing with Self-Generated In-Context Learning and Self-Correction.

2.2 Log Anomaly Detection

  1. [ICPC 2024] Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies.
  2. [Preprint 2023] Log-based Anomaly Detection based on EVT Theory with feedback.
  3. [Preprint 2023] LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection.
  4. [Preprint 2024] RAGLog: Log Anomaly Detection using Retrieval Augmented Generation.
  5. [Preprint 2024] Anomaly Detection on Unstable Logs with GPT Models.

2.3 Logging Statement Generation

  1. [ICSE 2024] UniLog: Automatic Logging via LLM and In-Context Learning.
  2. [FSE 2024] Go Static: Contextualized Logging Statement Generation.
  3. [Preprint 2023] Exploring the Effectiveness of LLMs in Automated Logging Generation: An Empirical Study [project].

Contribution

Contributing to this paper list

  • First, think about which category the work should belong to.
  • Second, use the same format as the other entries to describe the work. Note that there should be an empty line between the title and the author list, and take care with the indentation.
  • Then, add keyword tags and the PDF link of the paper. If it is an arXiv publication, we prefer the /abs/ format to the /pdf/ format.
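If you only have an arXiv /pdf/ link at hand, the conversion to the preferred /abs/ form is mechanical. The snippet below is a minimal sketch and not part of this repository: the helper name `to_abs_link` is hypothetical, the paper ID in the usage lines is a placeholder, and it assumes links of the form `https://arxiv.org/pdf/<id>` with an optional `.pdf` suffix.

```python
import re

# Assumed link shape: https://arxiv.org/pdf/<id> with an optional ".pdf" suffix.
_ARXIV_PDF = re.compile(r"^(https?://arxiv\.org)/pdf/(.+?)(?:\.pdf)?$")


def to_abs_link(url: str) -> str:
    """Rewrite an arXiv /pdf/ link to /abs/; return any other URL unchanged."""
    cleaned = url.strip()
    match = _ARXIV_PDF.match(cleaned)
    if match is None:
        # Not an arXiv PDF link (for example, already /abs/ or another site).
        return cleaned
    base, paper_id = match.groups()
    return f"{base}/abs/{paper_id}"


if __name__ == "__main__":
    # Placeholder ID used purely for illustration.
    print(to_abs_link("https://arxiv.org/pdf/2401.00001v2.pdf"))
    print(to_abs_link("https://arxiv.org/abs/2401.00001"))
```

Links that are already in /abs/ form, or that point to a venue other than arXiv, pass through untouched, so the helper can be applied to every link in an entry.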

Don't worry if you get any of this wrong; we will fix it for you. Just contribute and promote your awesome work here!

If you recommend a work that isn't yours, you will be added to the contributor list (be sure to provide your information under other contributors).