
Introduction to resilience engineering concepts for software engineers

resilience-for-software

The intent of this document is to provide a lightweight introduction to resilience engineering, and to motivate further engagement with it, through a “narrative FAQ”.

What's the return on investing time in learning about resilience engineering?

If failures of complex systems (e.g., incidents) are substantially affecting the sustainability of your systems, the happiness of your engineers, your ability to meet business needs, and/or the happiness of your customers (framing borrowed from Honeycomb), resilience engineering may offer insight into which approaches to these problems will be effective and high-leverage.

Where do I start? What is resilience engineering?

For definitions and to gain a basic familiarity with the domain, we recommend you start by reading How Complex Systems Fail by Richard Cook, and then watch Resilience Engineering: The What and How by John Allspaw.

What is the relationship between resilience engineering and DevOps/SRE?

This is a complex question without a lightweight answer. We recommend reading the preface and conclusion chapters of Accelerate by Nicole Forsgren PhD, Jez Humble, and Gene Kim (more, of course, if you're inspired), along with Sustainable Operations in Complex Systems with Production Excellence by Liz Fong-Jones, and comparing the perspectives there with those of Cook and Allspaw in the resources recommended above.

I’m intrigued. What’s next?

If you enjoy reading academic papers, see Lorin Hochstein’s paper-centric introduction. If you prefer conference talks, John Allspaw has curated a YouTube playlist. Nora Jones also runs a Slack community focused on resilience engineering, Learning From Incidents in Software.

What should I do if I have other questions?

Your options include:

  1. Opening an issue on this repo
  2. Reaching out to Lorin Hochstein, Jacob Scott, or others in the resilience engineering community on Twitter
  3. Contacting Allspaw, Cook, and Woods’s consultancy, Adaptive Capacity Labs, if you want to work with subject matter experts in a professional/contractual setting