reliability-engineering

There are 152 repositories under the reliability-engineering topic.

  • litmus

    Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q

    Language: Go · 4.5k stars
  • sre-checklist

    A checklist for anyone practicing Site Reliability Engineering

  • aws-well-architected-labs

    Hands on labs and code to help you learn, measure, and build using architectural best practices.

    Language: Python · 2k stars
  • chaostoolkit

    Chaos Engineering Toolkit & Orchestration for Developers

    Language: Python · 1.9k stars
  • awesome-sre-tools

    A curated list of Site Reliability and Production Engineering Tools

  • Mission-Critical

    This repository provides a design methodology and approach to building highly-reliable applications on Microsoft Azure for mission-critical workloads.

  • reliability

    Reliability engineering toolkit for Python - https://reliability.readthedocs.io/en/latest/

    Language: Python · 345 stars
  • chaos-lambda

    Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥

    Language: JavaScript · 290 stars
  • OpenShift-Guide

    OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, CodeReady Containers, Podman, Buildah, and Kubernetes.

    Language: Python · 149 stars
  • scram

    Probabilistic Risk Analysis Tool (fault tree analysis, event tree analysis, etc.)

    Language: C++ · 141 stars
  • awesome-sre

    A curated list of awesome Site Reliability and Production Engineering resources.

  • k6-docs

    The k6 documentation website.

    Language: JavaScript · 89 stars
  • paas-cf

    GOV.UK PaaS - Cloud Foundry

    Language: Go · 82 stars
  • chaostoolkit-lib

    The Chaos Toolkit core library

    Language: Python · 76 stars
  • sre-tools

    A collection of SRE tools

  • stable-systems-checklist

    An opinionated list of attributes and policies that need to be met in order to establish a stable software system.

  • terraform-provider-concourse

    A Terraform provider for Concourse

    Language: Go · 50 stars
  • SurPyval

    A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can also fit distributions with 'offsets' with ease, for example the three parameter Weibull distribution.

    Language: Python · 49 stars
  • sreworkbook-templates-md

    A collection of templates ported from the SRE Workbook

  • puppet-aptly

    No longer maintained: Puppet module for aptly

    Language: Ruby · 33 stars
  • gsp

    GSP is a container platform and curated suite of components helping government deploy, run, observe and secure their services

    Language: Go · 31 stars
  • deep_cox_mixtures

    Code for the paper "Deep Cox Mixtures for Survival Regression", Machine Learning for Healthcare Conference 2021

  • awesome-failure-diagnosis

    Related resources for incident failure diagnosis research.

  • terraform-provider-nobl9

    Terraform provider for Nobl9

    Language: Go · 25 stars
  • prometheus-aws-configuration-beta

    Terraform configuration to manage a Prometheus server running on AWS.

    Language: HCL · 23 stars
  • paas-billing

    A Go application for generating billing data from Cloud Foundry events

    Language: Go · 23 stars
  • paas-admin

    Administration tool for GOV.UK PaaS

    Language: TypeScript · 18 stars
  • paas-aiven-broker

    A service broker to provide Aiven Elasticsearch and InfluxDB services to Cloud Foundry users

    Language: Go · 16 stars
  • last9-integrations

    Sample applications for the integrations supported by Last9 products

    Language: Python · 14 stars
  • awesome-scalability

    🔖 Daily-updated reading list for designing High Scalability 🍒, High Availability 🔥, and High Stability 🗻 back-end systems. Pull requests are greatly welcome 👬. I hope you find this project helpful 🍀. Please help me share it with more people ❤️. Thank you - 谢谢 - धन्यवाद - ধন্যবাদ - Спасибо - شكرا - Merci - Gracias - Danke - Cảm ơn! 🙇

  • paas-bootstrap

    Bootstrap a VPC with BOSH and Concourse to run PaaS

    Language: Ruby · 13 stars
  • paas-tech-docs

    Technical documentation for GOV.UK PaaS

    Language: HTML · 12 stars
  • reliability-engineering

    Documentation for Reliability Engineering services

    Language: HTML · 11 stars
  • reliability.re

    Reliability Report - A collaborative curated content site about Reliability Engineering

    Language: Python · 9 stars
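The SurPyval entry mentions fitting distributions with an "offset", using the three-parameter Weibull distribution as its example. The sketch below is plain Python and is not SurPyval's API; the parameter names `alpha` (scale), `beta` (shape) and `gamma` (offset) are chosen here for illustration. It shows what the offset means: no failures occur before t = gamma, and after that the ordinary two-parameter Weibull CDF applies to the shifted time t - gamma.

```python
import math


def weibull3_cdf(t: float, alpha: float, beta: float, gamma: float) -> float:
    """Failure probability F(t) of a three-parameter Weibull distribution.

    alpha: scale (> 0), beta: shape (> 0), gamma: offset (location).
    Parameter names are illustrative, not taken from any library.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if t <= gamma:
        # Before the offset, the probability of failure is zero.
        return 0.0
    # Standard Weibull CDF evaluated on the shifted time (t - gamma).
    return 1.0 - math.exp(-(((t - gamma) / alpha) ** beta))


def weibull3_reliability(t: float, alpha: float, beta: float, gamma: float) -> float:
    """Survival (reliability) function R(t) = 1 - F(t)."""
    return 1.0 - weibull3_cdf(t, alpha, beta, gamma)


if __name__ == "__main__":
    # At t = gamma + alpha, F(t) = 1 - e^-1 (about 0.632) for any shape beta.
    print(weibull3_cdf(150.0, alpha=100.0, beta=2.0, gamma=50.0))
```

A useful sanity check: at t = gamma + alpha the CDF equals 1 - 1/e regardless of the shape parameter, which is why alpha is often called the characteristic life.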
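The scram entry mentions fault tree analysis. As a minimal sketch, assuming independent basic events and a tree built only from AND/OR gates in which no basic event appears more than once, the probability of the top event can be computed recursively: an AND gate multiplies its children's probabilities, and an OR gate is one minus the probability that none of its children occur. This is not scram's API or input format; the classes `Basic`, `Gate` and the function `top_probability` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Basic:
    """A basic (leaf) event with its probability of occurring."""
    name: str
    p: float


@dataclass
class Gate:
    """A logic gate: kind is "AND" or "OR" (hypothetical minimal model)."""
    kind: str
    children: List["Node"]


Node = Union[Basic, Gate]


def top_probability(node: Node) -> float:
    """Probability of the event at `node`.

    Exact only if basic events are independent and none is shared between
    branches; repeated events need cut-set or BDD methods instead.
    """
    if isinstance(node, Basic):
        return node.p
    probs = [top_probability(child) for child in node.children]
    if node.kind == "AND":
        # All children must occur.
        result = 1.0
        for q in probs:
            result *= q
        return result
    if node.kind == "OR":
        # At least one child occurs = 1 - P(no child occurs).
        none_occur = 1.0
        for q in probs:
            none_occur *= 1.0 - q
        return 1.0 - none_occur
    raise ValueError(f"unknown gate kind: {node.kind}")


if __name__ == "__main__":
    # Hypothetical system: fails if the pump fails OR both power feeds fail.
    tree = Gate("OR", [
        Basic("pump", 0.01),
        Gate("AND", [Basic("power_a", 0.1), Basic("power_b", 0.2)]),
    ])
    print(top_probability(tree))
```

In the usage example the AND gate contributes 0.1 × 0.2 = 0.02, and the OR gate gives 1 - (0.99 × 0.98) = 0.0298.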