reliability-engineering
There are 152 repositories under the reliability-engineering topic.
litmus
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
sre-checklist
A checklist for anyone practicing Site Reliability Engineering
aws-well-architected-labs
Hands-on labs and code to help you learn, measure, and build using architectural best practices.
chaostoolkit
Chaos Engineering Toolkit & Orchestration for Developers
awesome-sre-tools
A curated list of Site Reliability and Production Engineering Tools
Mission-Critical
This repository provides a design methodology and approach to building highly reliable applications on Microsoft Azure for mission-critical workloads.
reliability
Reliability engineering toolkit for Python - https://reliability.readthedocs.io/en/latest/
chaos-lambda
Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥
OpenShift-Guide
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, Code Ready Containers, Podman, Buildah, and Kubernetes.
scram
Probabilistic Risk Analysis Tool (fault tree analysis, event tree analysis, etc.)
awesome-sre
A curated list of awesome Site Reliability and Production Engineering resources.
k6-docs
The k6 documentation website.
paas-cf
GOV.UK PaaS - Cloud Foundry
chaostoolkit-lib
The Chaos Toolkit core library
sre-tools
A collection of SRE tools
stable-systems-checklist
An opinionated list of attributes and policies that need to be met in order to establish a stable software system.
terraform-provider-concourse
A Terraform provider for Concourse
SurPyval
A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can also fit distributions with 'offsets' with ease, for example the three-parameter Weibull distribution.
sreworkbook-templates-md
A collection of templates ported from the SRE Workbook
puppet-aptly
No longer maintained: Puppet module for aptly
gsp
GSP is a container platform and curated suite of components helping government deploy, run, observe and secure their services
deep_cox_mixtures
Code for the paper "Deep Cox Mixtures for Survival Regression", Machine Learning for Healthcare Conference 2021
awesome-failure-diagnosis
Related resources for incident failure diagnosis research.
terraform-provider-nobl9
Terraform provider for Nobl9
prometheus-aws-configuration-beta
Terraform configuration to manage a Prometheus server running on AWS.
paas-billing
A Go application for generating billing data from Cloud Foundry events
paas-admin
Administration tool for GOV.UK PaaS
paas-aiven-broker
A service broker to provide Aiven Elasticsearch and InfluxDB services to Cloud Foundry users
last9-integrations
Sample applications of supported integrations by Last9 Products
awesome-scalability
🔖 Daily-updated reading list for designing High Scalability 🍒, High Availability 🔥, High Stability 🗻 back-end systems - Pull requests are greatly welcome 👬 I hope you will find this project helpful 🍀 Please help me share it to more and more people ❤️ Thank you - 谢谢 - धन्यवाद - ধন্যবাদ - Спасибо - شكرا - Merci - Gracias - Danke - Cảm ơn! 🙇
paas-bootstrap
Bootstrap a VPC with BOSH and Concourse to run PaaS
paas-tech-docs
Technical documentation for GOV.UK PaaS
reliability-engineering
Documentation for Reliability Engineering services
reliability.re
Reliability Report - A collaborative curated content site about Reliability Engineering