site-reliability-engineering
There are 98 repositories under the site-reliability-engineering topic.
dastergon/awesome-sre
A curated list of Site Reliability and Production Engineering resources.
upgundecha/howtheysre
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
chaos-mesh/chaos-mesh
A Chaos Engineering Platform for Kubernetes.
dastergon/awesome-chaos-engineering
A curated list of Chaos Engineering resources.
chaosblade-io/chaosblade
An easy-to-use and powerful chaos engineering experiment toolkit. (An easy-to-use, powerful chaos experiment injection tool open-sourced by Alibaba.)
litmuschaos/litmus
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
alexei-led/pumba
Chaos testing, network emulation, and stress testing tool for containers
dastergon/postmortem-templates
A collection of postmortem templates
SquadcastHub/awesome-sre-tools
A curated list of Site Reliability and Production Engineering Tools
jaegertracing/jaeger-ui
Web UI for Jaeger
opslane/opslane
Making on-call suck less for engineers
mister0/How-to-prepare-for-google-interview-SWE-SRE
This repository includes resources that are more than sufficient to prepare for a Google interview if you are applying for a software engineer or site reliability engineer position
robusta-dev/holmesgpt
On-Call Assistant for Prometheus Alerts - Get a head start on fixing alerts with AI investigation
chris-short/DevOps-README.md
What to Read to Learn More About DevOps
rishiloyola/SRE-Interviews
Curated list of good SRE interview questions.
traas-stack/chaosmeta
A chaos engineering platform for supporting the complete fault drill lifecycle.
vespperhq/vespper
Open-source AI copilot that lets you chat with your observability data and code 🧙‍♂️
dastergon/wheel-of-misfortune
A role-playing game for incident management training
chiaen/sre-book-in-audio
Google Site Reliability Engineering book converted to audio
mikeroyal/OpenShift-Guide
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, CodeReady Containers, Podman, Buildah, and Kubernetes.
devopness/devopness
Devopness - Essential DevOps: to everyone
danrl/skinny
The Skinny Distributed Lock Service
dastergon/CardsAgainstReliability
A party card game for engineers caring about reliability. Based on Cards Against Humanity.
zeroc0d3lab/awesome-sre
A curated list of awesome Site Reliability and Production Engineering resources.
exajobs/devops-collection
Welcome To The World of DevOps. An ongoing & curated collection of awesome software, libraries, learning tutorials, tools and resources and cool stuff about DevOps.
dastergon/availability-calculator
Calculate how much downtime should be permitted in your Service Level Agreement or Objective
krootee/awesome-scalability-toolbox
My opinionated list of products and tools used for high-scalability projects
gremlin/sre-tools
A collection of SRE tools
marceloboeira/sre
📚 Index for my study topics
QAInsights/Performance-Engineers-DevOps
This repository helps performance testers and engineers who want to dive into the DevOps and SRE world.
dastergon/sreworkbook-templates-md
A collection of templates ported from the SRE Workbook
dastergon/common-disaster-recovery-scenarios
A list of common Disaster Recovery (DR) scenarios for software companies
flanksource/sre-learning-resources
A curated list of resources designed to level up as an SRE engineer
exajobs/sre-collection
An ongoing & curated collection of awesome SRE software and tools, libraries and frameworks, engineering books and blogs, philosophical principles, technical guidelines, and practical tools for the field of Site Reliability Engineering (SRE)
phamquiluan/baro
[FSE'24 - 🏆 Best Artifact Award] BARO: Robust Root Cause Analysis for Microservice Systems.
woojiahao/interviews
A collection of my resources for studying for SWE/SRE interviews!