A collection of books, research papers, videos, and articles for mastering Site Reliability Engineering.
- Modern Operating Systems - Tanenbaum, Andrew S.
- UNIX and Linux System Administration Handbook - Nemeth, Evi
- TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols - Stevens, W. Richard
- Systems Performance: Enterprise and the Cloud
- Site Reliability Engineering: How Google Runs Production Systems - [Free to read online](https://landing.google.com/sre/book/index.html)
- The datacenter as a computer: an introduction to the design of warehouse-scale machines
- The Practice of System and Network Administration
- The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems
- Time Management for System Administrators
- The Go Programming Language - Donovan, Alan A. A.
- Think Python - Downey, Allen B.
- The Linux Command Line - Shotts, William E., Jr.
- Linux Server Hacks: 100 Industrial-Strength Tips and Tools - Flickenger, Rob
- Programming Pearls - Bentley, Jon L.
- Web Operations: Keeping the Data On Time
- Microservices in Production
- Docker: Up & Running
- Large-scale cluster management at Google with Borg
- MapReduce: simplified data processing on large clusters
- Bigtable: A Distributed Storage System for Structured Data
- On designing and deploying internet-scale services
- Mesos: a platform for fine-grained resource sharing in the data center
- Google: Reliable Cron across the Planet
- Aurora
- Docker
- Fluentd
- Elasticsearch
- GCE
- Hadoop
- Kubernetes
- Mesos
- Kernel-based Virtual Machine (KVM)
- Protocol Buffers
- Spark
- VMware
- Software engineering at Google
- Keys to SRE by Ben Treynor
- How Container Clusters Like Kubernetes Change Operations
- 10 Years of Crashing Google
- Release Engineering Best Practices at Google
- From Zero to Hero: Recommended Practices for Training your Ever-Evolving SRE Teams
- Transactional System Administration Is Killing Us and Must be Stopped
- Lessons Learned From Scaling Uber To 2000 Engineers, 1000 Services, And 8000 Git Repositories
- [Netflix: 190 Countries and 5 CORE SREs](https://www.usenix.org/conference/srecon16/program/presentation/horowitz)
- [Performance Checklists for SREs](https://www.usenix.org/conference/srecon16/program/presentation/gregg)
- [Notes on SRE book](http://danluu.com/google-sre-book/)
- [SYSADMIN (Un)Reliability Budgets](https://www.usenix.org/system/files/login/articles/login_aug15_06_roth.pdf)