Event-based logging helps to clarify operability of software systems by identifying possible failure modes and key execution events. This repo has examples and principles.
Copyright © 2018-2021 Conflux - Licenced under CC BY-SA 4.0
Event-based logging uses compile-time enum-based event definitions when logging to make runtime diagnosis and team-member onboarding clearer and easier.
When implemented with teams empowered to make useful improvements, event-based logging can:
- Increase software reliability through improved definition of software behavior
- Provide a common interface between developers and Ops / SRE / live service support, or between one team and another
- Increase operational awareness in developers and Product Managers / Product Owners
- Explore software run-time behaviour without actually running the code
- Decrease "time-to-diagnose" for live incidents
- Clean up logging: reduce "logorrhoea" (verbose logs)
- Help to prepare for approaches like Domain-driven Design (DDD), Event-sourcing, and Chaos Engineering / Resilience Engineering
- Reduce onboarding time for new team members
Treat logging as a two-way communication channel between people building systems and people running systems; this could be two separate teams or it could be the same team at different times of the week.
Event-based logging is designed to be a simple, expressive approach to exploring failure modes and real-world operational behaviour for all kinds of software.
- Have a single, definitive list of application events in code
- Use exactly one of these event codes in the log message when logging
- Use an enum type or equivalent for compile-time checking of uniqueness and searchability
- Get code-completion from the IDE or REPL when choosing an event to log
- Simple SHIFT-select or double-click on an event name to copy/paste into a log search tool - no manual selection of multiple words for copy/paste
- Avoid the need for a single cross-team events library by scoping events to specific services.
- Avoid the need for complex regex searches in log tools: just search for a single, guaranteed-unique string.
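JavaScript has no compile-time enum, so in Node.js some of these checks have to happen at startup instead. The following is a minimal sketch under that assumption, not part of the repo's examples: `assertValidEvents` is a hypothetical helper that checks each event code is a single selectable word and is identical to its key. Because object keys are unique, requiring key and code to match also guarantees that the codes are unique.

```javascript
// Hypothetical helper (not from this repo): validates an event map before it is frozen.
function assertValidEvents(events) {
  // A single "word" with no spaces or punctuation, so a double-click selects the whole code.
  const singleWord = /^[A-Za-z][A-Za-z0-9]*$/;

  for (const [name, code] of Object.entries(events)) {
    if (!singleWord.test(name)) {
      throw new Error(`Event name "${name}" is not a single selectable word`);
    }
    // Keys are unique by construction, so key === code implies unique codes.
    if (name !== code) {
      throw new Error(`Event key "${name}" does not match its code "${code}"`);
    }
  }
  return events;
}

// Illustrative event list; validation runs once, when the module is loaded.
const Events = Object.freeze(assertValidEvents({
  AppStarted: 'AppStarted',
  AppShutdownRequested: 'AppShutdownRequested',
}));

console.log(Object.keys(Events).join(', '));
```

In a language with real enums, such as C#, the compiler performs the uniqueness check and this helper is unnecessary.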
A key aim is to "lean on the compiler" for compile-time verification of the Event types when logging. This in turn means we get code-completion when choosing an Event type during logging:
// Node.js example for event-based logging
const Events = Object.freeze({
  UndefinedError : 'UndefinedError',
  // Database events
  DatabaseConnectionSuccess : 'DatabaseConnectionSuccess',
  DatabaseConnectionFailure : 'DatabaseConnectionFailure',
  DatabaseConnectionTimeout : 'DatabaseConnectionTimeout',
  // Parsing events
  ParseStreamUnexpectedToken : 'ParseStreamUnexpectedToken',
  ParseStreamMissingData : 'ParseStreamMissingData',
  ParseStreamSuccess : 'ParseStreamSuccess',
  // Token validation events
  TokenValidationSucceeded : 'TokenValidationSucceeded',
  TokenValidationFailedInvalidParams : 'TokenValidationFailedInvalidParams',
  TokenValidationFailedInvalidDigest : 'TokenValidationFailedInvalidDigest',
  TokenValidationFailedIncorrectSHA : 'TokenValidationFailedIncorrectSHA',
  // Application lifecycle events
  AppStarted : 'AppStarted',
  AppShutdownRequested : 'AppShutdownRequested',
  // Test event
  NoOp : 'NoOp'
});
// console.log(Events.TokenVal --> auto-complete
Screenshot of code-completion with Events:
See examples of Event definitions:
- C#: ApplicationEvents.cs
- Node.js: appEvents.js
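Once the events are defined, each log call should carry exactly one of them. The sketch below shows one way this could look in Node.js. The helpers `formatEvent` and `logEvent`, the JSON line layout, and the host name are illustrative assumptions, not the repo's actual implementation.

```javascript
// A short illustrative event list, in the same style as the definitions above.
const Events = Object.freeze({
  DatabaseConnectionSuccess: 'DatabaseConnectionSuccess',
  DatabaseConnectionFailure: 'DatabaseConnectionFailure',
});

// Set of valid codes, used for a fast membership check.
const knownEvents = new Set(Object.values(Events));

// Hypothetical helper: builds one JSON log line carrying exactly one event code.
function formatEvent(event, details = {}) {
  // A mistyped property such as Events.DatabaseConectionFailure evaluates to
  // undefined, which is rejected here instead of being logged silently.
  if (!knownEvents.has(event)) {
    throw new Error(`Unknown event: ${String(event)}`);
  }
  return JSON.stringify({ timestamp: new Date().toISOString(), event, details });
}

// Hypothetical helper: writes the formatted line to stdout.
function logEvent(event, details) {
  console.log(formatEvent(event, details));
}

logEvent(Events.DatabaseConnectionFailure, { host: 'db.example.internal', attempt: 3 });
```

Keeping the event code in its own field means a log search tool can match it exactly, while free-form context stays in `details`.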
It is very useful to be able to search for similar events across multiple services, especially in large, distributed systems with multiple teams and services. Searching for *FailedToConnect* in a log search tool to find all service connection failures is a powerful observability technique.
However, avoid the temptation to create a single, cross-team library containing all possible events; this introduces coupling between services, which in turn creates blocking dependencies between teams. Instead, use service-scoped (or team-scoped) event names. For example, the Payments team may have this set of events defined:
// PaymentsService events
const Events = Object.freeze({
  PaymentsUndefinedError : 'PaymentsUndefinedError',
  PaymentsFailedToConnectToDatabase : 'PaymentsFailedToConnectToDatabase',
  PaymentsUnexpectedTokenInParseStream : 'PaymentsUnexpectedTokenInParseStream',
});
// console.log(Events.PaymentsFai --> auto-complete
The License team may have this set of events defined:
// LicenseService events
const Events = Object.freeze({
  LicenseUndefinedError : 'LicenseUndefinedError',
  LicenseFailedToConnectToDatabase : 'LicenseFailedToConnectToDatabase',
  LicenseUnexpectedTokenInParseStream : 'LicenseUnexpectedTokenInParseStream',
});
// console.log(Events.LicenseFai --> auto-complete
We can still search for *UnexpectedToken* events across services when necessary, but without the need for a shared library dependency.
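To illustrate why service-scoped names still support cross-service search, here is a small sketch that imitates a wildcard search with a plain substring match. The log lines, their layout, and the `findEvents` helper are made up for this example; a real log search tool would provide its own query syntax.

```javascript
// Hypothetical log lines from two services; each carries exactly one event code.
const logLines = [
  '2021-03-01T10:00:00Z PaymentsFailedToConnectToDatabase host=db1',
  '2021-03-01T10:00:02Z LicenseUnexpectedTokenInParseStream offset=42',
  '2021-03-01T10:00:05Z PaymentsUnexpectedTokenInParseStream offset=7',
  '2021-03-01T10:00:09Z LicenseFailedToConnectToDatabase host=db2',
];

// Returns the lines whose text contains the given event-name fragment.
function findEvents(lines, fragment) {
  return lines.filter((line) => line.includes(fragment));
}

// A shared fragment finds the same kind of event in both services.
const unexpectedTokens = findEvents(logLines, 'UnexpectedToken');
console.log(unexpectedTokens.length); // prints 2

// The full, service-scoped name narrows the search to a single service.
const paymentsDbFailures = findEvents(logLines, 'PaymentsFailedToConnectToDatabase');
console.log(paymentsDbFailures.length); // prints 1
```

The shared fragment works only because both teams follow the same naming convention, so the convention is the thing to agree on across teams, not a shared library.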
- C#: CSharp-example.cs
- Node.js: NodeJS-example.js