site-reliability-engineering
There are 101 repositories under the site-reliability-engineering topic.
dastergon/awesome-sre
A curated list of Site Reliability and Production Engineering resources.
upgundecha/howtheysre
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
chaos-mesh/chaos-mesh
A Chaos Engineering Platform for Kubernetes.
dastergon/awesome-chaos-engineering
A curated list of Chaos Engineering resources.
chaosblade-io/chaosblade
An easy-to-use and powerful chaos engineering experiment toolkit. (A simple, easy-to-use, and powerful chaos experiment injection tool open-sourced by Alibaba.)
litmuschaos/litmus
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
alexei-led/pumba
Chaos testing, network emulation, and stress testing tool for containers
dastergon/postmortem-templates
A collection of postmortem templates
SquadcastHub/awesome-sre-tools
A curated list of Site Reliability and Production Engineering Tools
jaegertracing/jaeger-ui
Web UI for Jaeger
robusta-dev/holmesgpt
Your 24/7 On-Call AI Agent - Solve Alerts Faster with Automatic Correlations, Investigations, and More
opslane/opslane
Making on-call suck less for engineers
mister0/How-to-prepare-for-google-interview-SWE-SRE
This repository includes resources that are more than sufficient to prepare for a Google interview if you are applying for a software engineer or site reliability engineer position.
chris-short/DevOps-README.md
What to Read to Learn More About DevOps
rishiloyola/SRE-Interviews
Curated list of good SRE interview questions.
vespperhq/vespper
Open-source AI copilot that lets you chat with your observability data and code 🧙‍♂️
traas-stack/chaosmeta
A chaos engineering platform for supporting the complete fault drill lifecycle.
devopness/devopness
DevOps Happiness: for AI Agents & Humans. Deploy apps and infra to any cloud, in minutes. Fast, simple, cloud-native 🚀
dastergon/wheel-of-misfortune
A role-playing game for incident management training
chiaen/sre-book-in-audio
Google Site Reliability Engineering book converted in audio
mikeroyal/OpenShift-Guide
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, CodeReady Containers, Podman, Buildah, and Kubernetes.
danrl/skinny
The Skinny Distributed Lock Service
dastergon/CardsAgainstReliability
A party card game for engineers caring about reliability. Based on Cards Against Humanity.
zeroc0d3lab/awesome-sre
A curated list of awesome Site Reliability and Production Engineering resources.
exajobs/devops-collection
Welcome to the world of DevOps. An ongoing, curated collection of awesome software, libraries, learning tutorials, tools, resources, and cool stuff about DevOps.
krootee/awesome-scalability-toolbox
My opinionated list of products and tools used for high-scalability projects
dastergon/availability-calculator
Calculate how much downtime should be permitted in your Service Level Agreement or Objective
marceloboeira/sre
📚 Index for my study topics
phamquiluan/RCAEval
[ASE'24][WWW'25] RCAEval: A Benchmark for Root Cause Analysis. https://doi.org/10.1145/3691620.3695065
gremlin/sre-tools
A collection of SRE tools
QAInsights/Performance-Engineers-DevOps
This repository helps performance testers and engineers who want to dive into the DevOps and SRE world.
phamquiluan/baro
[FSE'24 - 🏆 Best Artifact Award] BARO: Robust Root Cause Analysis for Microservice Systems.
dastergon/sreworkbook-templates-md
A collection of templates ported from the SRE Workbook
dastergon/common-disaster-recovery-scenarios
A list of common Disaster Recovery (DR) scenarios for software companies
flanksource/sre-learning-resources
A curated list of resources designed to help you level up as an SRE
exajobs/sre-collection
An ongoing, curated collection of awesome SRE software and tools, libraries and frameworks, engineering books and blogs, philosophical principles, technical guidelines, and practical tools from the field of Site Reliability Engineering (SRE)