Build to Manage
We have all been there before: Development throws new code over the wall, and Operations has to figure out how to deploy, monitor, and manage it. In the traditional world, Operations had time to build this knowledge: applications were updated rarely, and once deployed, an application could stay in service for years. As the speed of change increases, operations teams can become a bottleneck, either slowing down releases or increasing operational risk.
Continuous deployment is a key theme in the cloud world, which means that Operations has significantly less time to build this knowledge and a much shorter window in which to apply it. Therefore, we need a different approach to management: instead of Operations figuring out its tasks in isolation, Development provides information on how to manage the application. In DevOps, developers have already taken control of one important aspect of operations, the deployment and release of the application. However, developers can do more to ease operations.
As organizations work on building a sustainable culture, we recognize the need for simple, specific steps that deliver some of the benefits in the short term. To this end, we are introducing a new approach to operations, which we call Build to Manage. It specifies the activities developers can perform to instrument the application and to provide manageability aspects as part of an application release.
The “Build to Manage” approach includes the following aspects:
- HealthCheck API
- Log Format and Catalog
- Monitoring and metrics
- Deployment correlation
- Distributed Tracing
- Topology Information
- Event Format and Catalog
- Test Cases and Scripts
- Runbooks
- First Failure Data Capture
- Documentation
Sample code and links demonstrating the principles outlined in the Build to Manage Point of View document:
HealthCheck API
- Python
- Node.js
- Ruby
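To illustrate the HealthCheck API aspect, here is a minimal sketch using only the Python standard library. The `/health` path, the JSON response shape, the `UP`/`DOWN` values, and the `check_database` probe are all assumptions for illustration, not a format prescribed by Build to Manage.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_database():
    # Hypothetical dependency probe; replace with a real connectivity check.
    return True


# Named dependency checks (names are illustrative assumptions).
CHECKS = {"database": check_database}


def health_status(checks=CHECKS):
    """Run every check and return (http_status_code, response_body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "UP" if check() else "DOWN"
        except Exception:
            # A probe that raises counts as DOWN instead of crashing the endpoint.
            results[name] = "DOWN"
    overall = "UP" if all(state == "UP" for state in results.values()) else "DOWN"
    return (200 if overall == "UP" else 503), {"status": overall, "checks": results}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_status()
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    # Blocks forever; an orchestrator or load balancer would poll GET /health.
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint on an arbitrarily chosen port. Returning 503 when any dependency is down is one common convention: it lets a load balancer or orchestrator take the instance out of rotation without parsing the body.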
Log management
- Java
- Node.js
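For the Log Format and Catalog aspect, one option is to emit one JSON object per line and tag every message with a stable ID from a catalog that Operations can document and alert on. The sketch below assumes that approach; the catalog IDs, field names, and the `log_event` helper are hypothetical.

```python
import json
import logging
import time

# Hypothetical message catalog: stable IDs that Operations can document and alert on.
MESSAGE_CATALOG = {
    "ORD0001I": "Order service started",
    "ORD0002E": "Payment provider unreachable",
}


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    converter = time.gmtime  # timestamps in UTC

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message_id": getattr(record, "message_id", None),
            "message": record.getMessage(),
            "fields": getattr(record, "fields", {}),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def log_event(logger, level, message_id, **fields):
    """Log a catalogued message; an unknown ID raises KeyError, keeping the catalog complete."""
    text = MESSAGE_CATALOG[message_id]
    logger.log(level, text, extra={"message_id": message_id, "fields": fields})


logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

log_event(logger, logging.ERROR, "ORD0002E", provider="example-pay")
```

Forcing every message through the catalog trades some convenience for predictability: a log pipeline can match on `message_id` instead of on free-form text that may change between releases. The sketch assumes all extra fields are JSON-serializable.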
Monitoring and metrics
- Node.js
- Java
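As a sketch of the Monitoring and metrics aspect, the code below hand-rolls a labelled counter and renders it in the Prometheus text exposition format, one widely used way for an application to expose metrics. This is only to show the idea under that assumption; a real service would normally use an official client library, and the metric and label names here are hypothetical.

```python
import threading
from collections import defaultdict


def _escape(value):
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _fmt(number):
    # Print whole numbers without a trailing ".0".
    return str(int(number)) if number.is_integer() else repr(number)


class Counter:
    """A minimal labelled counter that only ever increases."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = defaultdict(float)
        self._lock = threading.Lock()

    def inc(self, amount=1.0, **labels):
        if amount < 0:
            raise ValueError("a counter cannot decrease")
        key = tuple(sorted((k, str(v)) for k, v in labels.items()))
        with self._lock:
            self._values[key] += amount

    def expose(self):
        """Render the counter in the Prometheus text exposition format."""
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        with self._lock:
            for key, value in sorted(self._values.items()):
                labels = ",".join(f'{k}="{_escape(v)}"' for k, v in key)
                suffix = "{" + labels + "}" if labels else ""
                lines.append(f"{self.name}{suffix} {_fmt(value)}")
        return "\n".join(lines) + "\n"


http_requests_total = Counter("http_requests_total", "Total HTTP requests handled.")
http_requests_total.inc(method="GET", code=200)
http_requests_total.inc(method="GET", code=200)
http_requests_total.inc(method="POST", code=500)
print(http_requests_total.expose(), end="")
```

A monitoring system would typically scrape this text from an HTTP endpoint; labels let Operations slice the same counter by method or status code without the developer defining a separate metric for each combination.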
Distributed tracing and logging
- Node.js
- Python
- Java (framework)
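Distributed tracing and logging meet when every log line carries the ID of the trace it belongs to. The sketch below assumes the W3C Trace Context `traceparent` header for propagation and handles only a simplified subset of it (version `00`, no `tracestate`, sampled flag always set on outgoing calls); the function names and the `checkout` logger are hypothetical.

```python
import contextvars
import logging
import re
import secrets

# Trace ID of the request currently being handled in this context.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")

# Simplified "traceparent" pattern: version-traceid-parentid-flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def extract_trace_id(headers):
    """Return the incoming trace ID, or start a new trace if none is valid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match and match.group(1) != "0" * 32:
        return match.group(1)
    return secrets.token_hex(16)


def outgoing_headers():
    """Build headers for a downstream call that continue the current trace."""
    span_id = secrets.token_hex(8)  # new ID for the outgoing call
    return {"traceparent": f"00-{current_trace_id.get()}-{span_id}-01"}


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


def handle_request(headers, logger):
    token = current_trace_id.set(extract_trace_id(headers))
    try:
        logger.info("processing request")
        return outgoing_headers()
    finally:
        current_trace_id.reset(token)


logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(asctime)s trace=%(trace_id)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

Using a context variable rather than a global keeps concurrent requests from overwriting each other's trace ID, so searching the logs for one trace ID returns that request's lines across every service that propagated the header.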
OpenTracing
OpenTracing is a distributed tracing instrumentation standard. It aims to standardize instrumentation, so developers can instrument first and worry about the collection, distribution, and aggregation system later. OpenTracing’s foundational concepts are traces and spans. A trace is the “story” of a transaction or workflow as it makes its way through a system, and it is represented as a directed acyclic graph of spans, each of which represents one component of the story. Each trace starts with a span. A new span relates to an existing span through one of two reference types: ChildOf, where the parent depends on the child’s result, which captures the critical path for latency-sensitive (distributed) operations; and FollowsFrom, where the parent does not depend on the child’s result, which expresses the causal semantics of the system without placing the child on that critical path.
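The trace and span model described above can be sketched as a small data structure. This does not use the OpenTracing API; the `Span` class, `RefType`, and the helper functions are hypothetical names chosen to show how ChildOf and FollowsFrom references differ.

```python
import secrets
from dataclasses import dataclass, field
from enum import Enum


class RefType(Enum):
    CHILD_OF = "child_of"          # the parent depends on this span's result
    FOLLOWS_FROM = "follows_from"  # the parent does not wait for this span


@dataclass
class Span:
    operation: str
    trace_id: str
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    references: list = field(default_factory=list)  # list of (RefType, Span)


def start_trace(operation):
    """Create the first span of a new trace."""
    return Span(operation, trace_id=secrets.token_hex(16))


def start_span(operation, ref_type, parent):
    """Create a span in the parent's trace, linked by the given reference type."""
    return Span(operation, trace_id=parent.trace_id, references=[(ref_type, parent)])


def blocking_chain(span):
    """Follow ChildOf references upward; every span in the chain waits on this one."""
    chain = [span.operation]
    current = span
    while True:
        parents = [p for ref, p in current.references if ref is RefType.CHILD_OF]
        if not parents:
            return list(reversed(chain))
        current = parents[0]
        chain.append(current.operation)


root = start_trace("checkout")
db_query = start_span("db.query", RefType.CHILD_OF, root)
receipt = start_span("send_receipt", RefType.FOLLOWS_FROM, root)
```

Because references only point to spans that already exist, the graph cannot contain cycles. In this example `checkout` waits for `db.query`, so the query sits on its latency path, while `send_receipt` belongs to the same trace but does not delay the checkout.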
Current languages with OpenTracing API libraries:
- Go
- Python: does not support tracers yet, so it is unusable in production
- JavaScript: not sufficient by itself, and the library required to make it work is not yet finished
- Objective-C
- Java: requires explicit tracer instantiation
- C++: needs some work; this library uses C++98