A collection of papers on root cause analysis, diagnosis, and localization in microservice systems, covering invocation chains, multi-dimensional metrics, and machine metrics.
Paper notes for reference: https://dreamhomes.top/
- [2018 TSE] Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study. paper
Note: Methods are categorized by data type.
- [2013 SIGMETRICS] Root Cause Detection in a Service-Oriented Architecture [MonitorRank]. paper
- [2015 IWQoS] A methodology for root-cause analysis in component based systems. paper
- [2017 TPDS] Failure Diagnosis for Distributed Systems Using Targeted Fault Injection. paper
- [2018 IWQoS] Root Cause Analysis of Anomalies of Multitier Services in Public Clouds. paper
- [2018 CCGRID] CloudRanger: Root Cause Identification for Cloud Native Systems. paper
- [2018 ASE] Delta debugging microservice systems. paper
- [2019 TSC] Microservices Monitoring with Event Logs and Black Box Execution Tracing. paper
- [2019 Access] A Real-Time Trace-Level Root-Cause Diagnosis System in Alibaba Datacenters. paper
- [2020 JSS] Graph-based root cause analysis for service-oriented and microservice architectures. paper
- [2016 KDD] Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations. paper
- [2017 ICDM] Ranking Causal Anomalies by Modeling Local Propagations on Networked Systems. paper
- [2018 CCGRID] CloudRanger: Root Cause Identification for Cloud Native Systems. paper
- [2018 ICST] Localizing Faults in Cloud Systems. paper
- [2018 IPCCC] FacGraph: Frequent Anomaly Correlation Graph Mining for Root Cause Diagnose in Micro-Service Architecture. paper
- [2019 ICWS] MS-Rank: Multi-Metric and Self-Adaptive Root Cause Diagnosis for Microservice Applications. paper
- [2020 Appl. Sci.] A Causality Mining and Knowledge Graph Based Method of Root Cause Diagnosis for Performance Anomaly in Cloud Applications. paper
- [2020 WWW] AutoMAP: Diagnose Your Microservice-based Web Applications Automatically. paper
- [2020 IWQoS] Localizing Failure Root Causes in a Microservice through Causality Inference. paper
- [2021 ICSE] MicroHECL: High-Efficient Root Cause Localization in Large-Scale Microservice Systems. paper
- [2021 ISSTA] Faster, Deeper, Easier: Crowdsourcing Diagnosis of Microservice Kernel Failure from User Space. paper
- [2021 SEKE] AAMR: Automated Anomalous Microservice Ranking in Cloud-Native Environment. paper
- [2017 WWW] Performance Monitoring and Root Cause Analysis for Cloud-hosted Web Applications. paper
- [2018 ICSOC] Microscope: Pinpoint Performance Issues with Causal Graphs in Micro-service Environments. paper
- [2019 FSE] Latent Error Prediction and Fault Localization for Microservice Applications by Learning from System Trace Logs. paper
- [2019 ASE] Root Cause Localization for Unreproducible Builds via Causality Analysis over System Call Tracing. paper
- [2019 ASPLOS] Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices. paper
- [2020 MLArchSys] Sage: Leveraging ML To Diagnose Unpredictable Performance in Cloud Microservices. paper
- [2020 ISSRE] Unsupervised Detection of Microservice Trace Anomalies through Service-Level Deep Bayesian Networks. paper
- [2020 ESEC/FSE] Graph-Based Trace Analysis for Microservice Architecture Understanding and Problem Diagnosis. paper
- [2021 WWW] MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments. paper
- [2021 ICSE] TraceLingo: Trace representation and learning for performance issue diagnosis in cloud services. notes
- [2021 IWQoS] Practical Root Cause Localization for Microservice Systems via Trace Analysis. paper
- [2021 ASE] AID: Efficient Prediction of Aggregated Intensity of Dependency in Large-scale Cloud Systems. paper
Note: Graph data includes system state graphs, dependency graphs, and so on.
- [2013 ICDCS] FChain: Toward Black-box Online Fault Localization for Cloud Systems. paper
- [2019 ICPADS] ADGS: Anomaly Detection and Localization based on Graph Similarity in Container-based Clouds. paper
- [2019 VLDB] GRANO: Interactive Graph-based Root Cause Analysis for Cloud-Native Distributed Data Platform. paper
- [2020 JSS] Graph-based root cause analysis for service-oriented and microservice architectures. paper
- [2020 NOMS] MicroRCA: Root Cause Localization of Performance Issues in Microservices. paper
- [2020 ICSOC] Localization of Operational Faults in Cloud Applications by Mining Causal Dependencies in Logs using Golden Signals. paper
- [2020 SoSE] Graph Based Root Cause Analysis in Cloud Data Center. paper
- [2021 ICSE] MicroDiag: Fine-grained Performance Diagnosis for Microservice Systems. paper
- [2021 ASE] Groot: An Event-graph-based Approach for Root Cause Analysis in Industrial Settings. paper
- [2019 ISSTA] DeepFL: integrating multiple fault diagnosis dimensions for deep fault localization. paper
- [2019 ASE] Root Cause Localization for Unreproducible Builds via Causality Analysis over System Call Tracing. paper
- [2019 TSE] An Empirical Study of Boosting Spectrum-based Fault Localization via PageRank. paper
- [2020 AAAI] Control Flow Graph Embedding based on Multi-Instance Decomposition for Bug Localization. paper