Site Reliability Engineer guide

A collection of books, research papers, videos, and articles for mastering Site Reliability Engineering.

Books

  • Modern Operating Systems by Andrew S. Tanenbaum
  • UNIX and Linux System Administration Handbook by Evi Nemeth
  • TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols by W. Richard Stevens
  • Systems Performance: Enterprise and the Cloud
  • Site Reliability Engineering: How Google Runs Production Systems - free to read online (https://landing.google.com/sre/book/index.html)
  • The Site Reliability Workbook
  • The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
  • The Practice of System and Network Administration
  • The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems
  • Time Management for System Administrators
  • The Go Programming Language by Alan A. A. Donovan
  • Think Python by Allen B. Downey
  • The Linux Command Line by William E. Shotts Jr.
  • Linux Server Hacks: 100 Industrial-Strength Tips and Tools by Rob Flickenger
  • Programming Pearls by Jon L. Bentley
  • Web Operations: Keeping the Data On Time
  • Microservices in Production
  • Docker: Up & Running
  • Kubernetes: Up and Running by Brendan Burns, Kelsey Hightower, Joe Beda

Research papers

Technologies

Networking

Monitoring and alerting

SRE best practice

Trainings

More