reliability-engineering
There are 143 repositories under the reliability-engineering topic.
dastergon/awesome-sre
A curated list of Site Reliability and Production Engineering resources.
litmuschaos/litmus
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
bregman-arie/sre-checklist
A checklist for anyone practicing Site Reliability Engineering
awslabs/aws-well-architected-labs
Hands-on labs and code to help you learn, measure, and build using architectural best practices.
chaostoolkit/chaostoolkit
Chaos Engineering Toolkit & Orchestration for Developers
SquadcastHub/awesome-sre-tools
A curated list of Site Reliability and Production Engineering Tools
Azure/Mission-Critical
This repository provides a design methodology and approach to building highly reliable applications on Microsoft Azure for mission-critical workloads.
MatthewReid854/reliability
Reliability engineering toolkit for Python - https://reliability.readthedocs.io/en/latest/
artilleryio/chaos-lambda
Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥
mikeroyal/OpenShift-Guide
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, CodeReady Containers, Podman, Buildah, and Kubernetes.
rakhimov/scram
Probabilistic Risk Analysis Tool (fault tree analysis, event tree analysis, etc.)
zeroc0d3lab/awesome-sre
A curated list of awesome Site Reliability and Production Engineering resources.
grafana/k6-docs
The k6 documentation website.
alphagov/paas-cf
GOV.UK PaaS - Cloud Foundry
chaostoolkit/chaostoolkit-lib
The Chaos Toolkit core library
gremlin/sre-tools
A collection of SRE tools
theodesp/stable-systems-checklist
An opinionated list of attributes and policies that need to be met in order to establish a stable software system.
alphagov/terraform-provider-concourse
A Terraform provider for Concourse
derrynknife/SurPyval
A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can also fit distributions with 'offsets' with ease, for example the three-parameter Weibull distribution.
dastergon/sreworkbook-templates-md
A collection of templates ported from the SRE Workbook
alphagov/puppet-aptly
No longer maintained: Puppet module for aptly
alphagov/gsp
GSP is a container platform and curated suite of components helping government deploy, run, observe and secure their services
chiragnagpal/deep_cox_mixtures
Code for the paper "Deep Cox Mixtures for Survival Regression", Machine Learning for Healthcare Conference 2021
nobl9/terraform-provider-nobl9
Terraform provider for Nobl9
alphagov/paas-billing
A Go application for generating billing data from Cloud Foundry events
alphagov/prometheus-aws-configuration-beta
Terraform configuration to manage a Prometheus server running on AWS.
phamquiluan/awesome-failure-diagnosis
Related resources for incident failure diagnosis research.
alphagov/paas-admin
Administration tool for GOV.UK PaaS
alphagov/paas-aiven-broker
A service broker to provide Aiven Elasticsearch and InfluxDB services to Cloud Foundry users
last9/last9-integrations
Sample applications of supported integrations by Last9 Products
zeroc0d3lab/awesome-scalability
Daily-updated reading list for designing High Scalability, High Availability, High Stability back-end systems - Pull requests are greatly welcome. I hope you will find this project helpful. Please help me share it to more and more people. Thank you - 谢谢 - धन्यवाद - ধন্যবাদ - Спасибо - شكرا - Merci - Gracias - Danke - Cảm ơn!
alphagov/paas-bootstrap
Bootstrap a VPC with BOSH and Concourse to run PaaS
alphagov/paas-tech-docs
Technical documentation for GOV.UK PaaS
alphagov/reliability-engineering
Documentation for Reliability Engineering services