Site Reliability Engineer guide

📚 A collection of books, research papers, videos, and articles for mastering Site Reliability Engineering.

Books

SRE

Kubernetes platform and applications

  • Docker: Up and Running
  • Kubernetes: Up and Running by Brendan Burns, Kelsey Hightower, Joe Beda
  • Microservices in Production
  • Designing Data-Intensive Applications
  • Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services - Free to download
  • Software Engineering at Google - Free to download

Compute, Networking and Storage - theory and practice

  • Modern Operating Systems by Andrew S. Tanenbaum
  • UNIX and Linux System Administration Handbook by Evi Nemeth
  • TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols by W. Richard Stevens
  • Systems Performance: Enterprise and the Cloud
  • The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
  • The Practice of System and Network Administration
  • The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems
  • Linux Server Hacks: 100 Industrial-Strength Tips and Tools by Rob Flickenger
  • Web Operations - Keeping the Data On Time

Programming

  • The Linux Command Line by William E. Shotts Jr.
  • Shell Scripting: How to Automate Command Line Tasks Using Bash Scripting and Shell Programming
  • The Go Programming Language by Alan A. A. Donovan
  • Think Python by Allen B. Downey
  • Programming Pearls by Jon L. Bentley
  • Code Complete 2 by Steve McConnell

Other

  • Time Management for System Administrators

Research papers

Technologies

SRE best practice

Trainings

Conferences