Monitoring documentation

By GDS Reliability Engineering Team

Project description

This documentation gives an overview of the work completed by the GDS Reliability Engineering Team. It is a living document and will change frequently during the discovery and exploration phases. It collects useful resources for teams setting up monitoring tools, and for us as we support them.

Quick overview

The solution is based on:

Index

  1. Documentation
  2. Guidance and best practices
  3. Dashboard templates
  4. Query examples
  5. Useful resources
  6. Exporter notes
  7. Alert Manager
  8. Diagrams
  9. Architecture Decision Records

Architecture Decision Records

We will record design decisions for the architecture so that we preserve the context of our choices. These will be written in the format proposed in a blog post by Michael Nygard.
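As a rough illustration of that format, a record might be laid out as below. The section headings (Status, Context, Decision, Consequences) are the ones Nygard's post proposes; the number, title and placeholder text are hypothetical and are not taken from this repository.

```text
# 1. Title of the decision (hypothetical)

## Status

Proposed, Accepted, Deprecated or Superseded.

## Context

The forces at play and the facts that led to this decision.

## Decision

What we have decided to do, stated in full sentences.

## Consequences

What becomes easier or harder as a result of this decision.
```

Keeping each record short and limited to a single decision makes the history easier to follow later.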

Please see the decisions directory for a list of all ADRs.

Tooling

We will use adr-tools to help manage the decisions. Install it with Homebrew:

brew install adr-tools

Then create a new decision record:

adr new 'Decision to record'

Please run this tool from the root of the repository only.
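One way to guard against running the tool from the wrong directory is a small shell wrapper. This is only a sketch: the function names at_repo_root and adr_new_at_root are hypothetical, and it assumes the repository is a Git checkout, so that the root contains a .git entry.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the current directory looks like
# the root of a Git checkout (assumption: the root contains a .git entry).
at_repo_root() {
    [ -e .git ]
}

# Hypothetical wrapper: passes its arguments to `adr new`, but refuses to
# run unless the current directory is the repository root.
adr_new_at_root() {
    if ! at_repo_root; then
        echo "error: run this from the root of the repository" >&2
        return 1
    fi
    adr new "$@"
}

# Usage, from the repository root:
#   adr_new_at_root 'Decision to record'
```

The check is deliberately simple. It only looks for a .git entry in the current directory, so it will not detect a nested checkout or a repository managed by another version control system.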