Monitoring
The Monitoring module allows you to inject user-defined metrics and to monitor the process itself. It supports multiple backends, protocols and data formats.
Table of contents
- Installation
- Getting started
- Features and additional information
- Code snippets
- System monitoring and server-side backends installation and configuration
Installation
aliBuild
If you don't have aliBuild installed, follow the aliBuild installation instructions first.
- Compile Monitoring and its dependencies via aliBuild:
aliBuild init Monitoring@master
aliBuild build Monitoring --defaults o2-daq
- Load the environment for Monitoring (in the alice directory):
alienv load Monitoring/latest
In case of an issue with aliBuild, refer to the official instructions.
Manual
Manual installation of the O2 Monitoring module.
Requirements
- C++ compiler with C++14 support, e.g.:
  - gcc-c++ package from devtoolset-6 on CentOS 7
  - clang++ on Mac OS
- Boost >= 1.56
- libcurl
- ApMon (optional)
Monitoring module compilation
git clone https://github.com/AliceO2Group/Monitoring.git
cd Monitoring; mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<installdir>
make -j
make install
Getting started
Monitoring instance
The recommended way to obtain a monitoring instance (as a unique_ptr) is to call MonitoringFactory::Get, passing the backend URI(s) as a parameter (comma-separated if more than one).
The library is accessible from the o2::monitoring namespace.
#include <MonitoringFactory.h>
using namespace o2::monitoring;
std::unique_ptr<Monitoring> monitoring = MonitoringFactory::Get("backend[-protocol]://host:port[/verbosity][?query]");
See the table below to find out how to create the URI for each backend:
Backend name | Transport | URI backend[-protocol] | URI query | Default verbosity |
---|---|---|---|---|
InfluxDB | HTTP | influxdb-http | /write?db=<db> | prod |
InfluxDB | UDP | influxdb-udp | - | prod |
ApMon | UDP | apmon | - | prod |
Local InfoLogger | - | infologger:// | - | debug |
InfoLogger | TCP | infologger | - | prod |
Flume | UDP | flume | - | prod |
Multiple backends may be used at the same time; their URIs should be separated by a comma (,).
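To make the URI format concrete, the following standalone sketch splits a comma-separated URI list and extracts the backend[-protocol] part of each entry. It is not part of the library: the function names splitBackendUris and backendAndProtocol are hypothetical, and host, port, path and query handling are deliberately left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Splits a comma-separated list of backend URIs into individual URIs,
// skipping empty entries (e.g. from a trailing comma).
std::vector<std::string> splitBackendUris(const std::string& uris) {
  std::vector<std::string> result;
  std::istringstream stream(uris);
  std::string uri;
  while (std::getline(stream, uri, ',')) {
    if (!uri.empty()) {
      result.push_back(uri);
    }
  }
  return result;
}

// Returns {backend, protocol} from "backend[-protocol]://...".
// The protocol is empty when the URI has no "-protocol" part.
std::pair<std::string, std::string> backendAndProtocol(const std::string& uri) {
  const auto schemeEnd = uri.find("://");
  assert(schemeEnd != std::string::npos && "URI must contain ://");
  const std::string scheme = uri.substr(0, schemeEnd);
  const auto dash = scheme.find('-');
  if (dash == std::string::npos) {
    return {scheme, ""};
  }
  return {scheme.substr(0, dash), scheme.substr(dash + 1)};
}
```

For example, the illustrative string influxdb-udp://localhost:8088,infologger:// yields two URIs; the first has backend influxdb and protocol udp, the second has backend infologger and no protocol.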
Sending a metric
send(Metric&& metric, [DerivedMetricMode mode])
where the Metric constructor receives the following parameters:
T value
std::string& name
[time_point<system_clock> timestamp]
The DerivedMetricMode is described in the Calculating derived metrics section.
See how it works in the example: examples/1-Basic.cxx
Debug metrics
Debug metrics can be sent with a method similar to send above:
debug(Metric&& metric)
The difference is that debug metrics are only passed to backends whose verbosity level is set to debug.
Each backend has its default verbosity (see the backend table in the Monitoring instance section). It can be changed by setting the path of the backend URI:
- /prod - only metrics sent with send are passed to the backend
- /debug - all metrics are passed to the backend
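The routing rule described above can be summarised in a few lines. This is a hypothetical, standalone sketch (the Verbosity enum and both function names are illustrative, not the library's API): a URI path of /prod or /debug overrides the backend's default verbosity, and a debug metric reaches a backend only when that backend is set to debug.

```cpp
#include <cassert>
#include <string>

// Hypothetical verbosity levels mirroring the /prod and /debug URI paths.
enum class Verbosity { Prod, Debug };

// Maps the optional URI path to a verbosity; falls back to the backend default.
Verbosity verbosityFromPath(const std::string& path, Verbosity backendDefault) {
  if (path == "/prod") return Verbosity::Prod;
  if (path == "/debug") return Verbosity::Debug;
  return backendDefault;
}

// Metrics passed to send() reach every backend; metrics passed to debug()
// reach only backends whose verbosity is Debug.
bool reachesBackend(bool isDebugMetric, Verbosity backendVerbosity) {
  return !isDebugMetric || backendVerbosity == Verbosity::Debug;
}
```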
Customized metrics
Two additional methods can be chained to send(Metric&& metric) in order to insert custom tags or to set a custom timestamp:
addTags(std::vector<Tag>&& tags)
setTimestamp(std::chrono::time_point<std::chrono::system_clock>& timestamp)
See how it works in the example: examples/2-TaggedMetrics.cxx, examples/3-UserDefinedTimestamp.cxx.
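Chaining on a temporary metric works when each setter returns the object as an rvalue. The sketch below illustrates that pattern with a hypothetical, simplified SketchMetric class and a Tag alias defined here as a key/value pair; it is not the library's Metric class, and the real signatures may differ.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tag type: a key/value pair.
using Tag = std::pair<std::string, std::string>;

// Hypothetical, simplified metric illustrating method chaining on a temporary.
class SketchMetric {
 public:
  SketchMetric(double value, std::string name)
      : value_(value), name_(std::move(name)),
        timestamp_(std::chrono::system_clock::now()) {}  // default: current time

  // Appends tags and returns the object as an rvalue so calls can be chained.
  SketchMetric&& addTags(std::vector<Tag>&& tags) && {
    for (auto& tag : tags) tags_.push_back(std::move(tag));
    return std::move(*this);
  }

  // Overrides the construction-time timestamp.
  SketchMetric&& setTimestamp(const std::chrono::system_clock::time_point& timestamp) && {
    timestamp_ = timestamp;
    return std::move(*this);
  }

  const std::string& name() const { return name_; }
  double value() const { return value_; }
  const std::vector<Tag>& tags() const { return tags_; }
  std::chrono::system_clock::time_point timestamp() const { return timestamp_; }

 private:
  double value_;
  std::string name_;
  std::chrono::system_clock::time_point timestamp_;
  std::vector<Tag> tags_;
};
```

The && qualifiers restrict the setters to temporaries, which matches the intent of building a metric inline and handing it straight to a function taking Metric&&.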
Features and additional information
Sending more than one metric
In order to send more than one metric in a single packet, group them into a vector:
monitoring->send(std::vector<Metric>&& metrics);
It's also possible to send multiple grouped values (supported only by the Flume and InfluxDB backends). For example, a cpu metric can be composed of cpuUser and cpuSystem values.
void sendGroupped(std::string name, std::vector<Metric>&& metrics)
See how it works in the example: examples/8-Multiple.cxx
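One way a grouped metric can map onto InfluxDB is as a single point with several fields in the InfluxDB line protocol (measurement, a space, then comma-separated field=value pairs). The sketch below assumes that mapping and omits tags and timestamp; toInfluxLine is a hypothetical helper, not the library's serializer.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Formats grouped values as one InfluxDB line-protocol point:
// "<name> <field1>=<value1>,<field2>=<value2>" (no tags, no timestamp).
std::string toInfluxLine(const std::string& name,
                         const std::vector<std::pair<std::string, double>>& values) {
  std::ostringstream line;
  line << name << ' ';
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) line << ',';
    line << values[i].first << '=' << values[i].second;
  }
  return line.str();
}
```

With this assumption, a cpu group made of cpuUser = 12.5 and cpuSystem = 3.25 becomes the single line cpu cpuUser=12.5,cpuSystem=3.25.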
Buffering metrics
In order to avoid sending each metric separately, metrics can be temporarily stored in a buffer and flushed at the most convenient moment. This feature is operated with the following two methods:
monitoring->enableBuffering(const unsigned int maxSize)
...
monitoring->flushBuffer();
enableBuffering takes the maximum buffer size as its parameter. When the buffer gets full, all values are flushed automatically.
See how it works in the example: examples/10-Buffering.cxx.
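The buffering behaviour (explicit flush plus automatic flush when full) can be sketched as a small standalone class. MetricBufferSketch and its flush callback are hypothetical; metrics are represented as plain strings purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical buffer: collects metrics and hands them to a flush callback
// either on demand or automatically once maxSize metrics are stored.
class MetricBufferSketch {
 public:
  using FlushFn = std::function<void(const std::vector<std::string>&)>;

  MetricBufferSketch(unsigned int maxSize, FlushFn flush)
      : maxSize_(maxSize), flush_(std::move(flush)) {
    assert(maxSize_ > 0);
    buffer_.reserve(maxSize_);
  }

  void add(std::string metric) {
    buffer_.push_back(std::move(metric));
    if (buffer_.size() >= maxSize_) flushBuffer();  // buffer full: flush automatically
  }

  // Sends everything collected so far as one batch and empties the buffer.
  void flushBuffer() {
    if (buffer_.empty()) return;
    flush_(buffer_);
    buffer_.clear();
  }

  std::size_t size() const { return buffer_.size(); }

 private:
  unsigned int maxSize_;
  FlushFn flush_;
  std::vector<std::string> buffer_;
};
```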
Metrics
Metrics consist of 4 parameters: name, value, timestamp and tags.
Parameter name | Type | Required | Default |
---|---|---|---|
name | string | yes | - |
value | int / double / string / uint64_t | yes | - |
timestamp | chrono::time_point<std::chrono::system_clock> | no | current timestamp |
tags | vector | no | -** |
**The default tag set is process-specific and included in each metric:
- hostname
- PID
- process name
Calculating derived metrics
The module can calculate derived metrics. To do so, use the optional DerivedMetricMode mode parameter of the send method:
- DerivedMetricMode::NONE - no action,
- DerivedMetricMode::RATE - rate between two consecutive metrics,
- DerivedMetricMode::AVERAGE - average value of all metrics stored in the cache.
Derived metrics are generated each time a new value is passed to the module. Their names are suffixed with the derived mode name.
See how it works in the example: examples/4-RateDerivedMetric.cxx.
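The following hypothetical sketch shows how the two derived modes can be computed from a per-name cache. It assumes RATE means the value difference divided by the elapsed time in seconds between two consecutive values, and AVERAGE means the mean of all values seen so far for that name; DerivedSketch is illustrative and not the library's implementation.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical per-name cache used to compute RATE and AVERAGE.
class DerivedSketch {
 public:
  using Clock = std::chrono::system_clock;

  // RATE: (value - previous value) / seconds elapsed since the previous value.
  // Returns false when no rate can be computed yet (first value for this name).
  bool rate(const std::string& name, double value, Clock::time_point timestamp, double& out) {
    bool ready = false;
    const auto it = last_.find(name);
    if (it != last_.end()) {
      const std::chrono::duration<double> elapsed = timestamp - it->second.timestamp;
      if (elapsed.count() > 0) {
        out = (value - it->second.value) / elapsed.count();
        ready = true;
      }
    }
    last_[name] = Sample{value, timestamp};  // cache for the next call
    return ready;
  }

  // AVERAGE: mean of all values received so far for this name.
  double average(const std::string& name, double value) {
    Sum& sum = sums_[name];
    sum.total += value;
    sum.count += 1;
    return sum.total / sum.count;
  }

 private:
  struct Sample { double value; Clock::time_point timestamp; };
  struct Sum { double total = 0; unsigned long count = 0; };
  std::map<std::string, Sample> last_;
  std::map<std::string, Sum> sums_;
};
```

For example, values 100 and 300 received two seconds apart give a rate of 100 per second.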
Global tags
Global tags are tags that are added to each metric. The following tags are set as global by the library itself:
- hostname
- name - process name
You can add your own global tag by calling addGlobalTag(std::string name, std::string value).
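Conceptually, global tags are stored once and appended to the tags of every outgoing metric. The sketch below illustrates that idea with a hypothetical GlobalTagsSketch class and a Tag alias defined here as a key/value pair; only the addGlobalTag name comes from the text above, and its behaviour here is an assumption.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tag type: a key/value pair.
using Tag = std::pair<std::string, std::string>;

// Hypothetical holder of global tags, merged into every outgoing metric.
class GlobalTagsSketch {
 public:
  void addGlobalTag(std::string name, std::string value) {
    globalTags_.emplace_back(std::move(name), std::move(value));
  }

  // Returns the metric's own tags followed by all global tags.
  std::vector<Tag> withGlobalTags(std::vector<Tag> metricTags) const {
    metricTags.insert(metricTags.end(), globalTags_.begin(), globalTags_.end());
    return metricTags;
  }

 private:
  std::vector<Tag> globalTags_;
};
```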
Monitoring process
enableProcessMonitoring([interval in seconds]);
The following metrics are generated every interval:
- cpuUsedPercentage - percentage of a core used over the time interval
- involuntaryContextSwitches - number of involuntary context switches over the time interval
- memoryUsagePercentage - ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage (Linux only)
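The two percentage metrics reduce to simple ratios once the raw values are sampled. This hypothetical sketch assumes cpuUsedPercentage is the CPU time consumed by the process divided by the wall-clock length of the interval, and memoryUsagePercentage is resident set size over physical memory; how the raw values are obtained (for example getrusage on POSIX systems) is outside the sketch.

```cpp
#include <cassert>

// CPU time used by the process during the interval, as a percentage of one core.
// Values above 100 mean more than one core was busy on average.
double cpuUsedPercentage(double cpuSecondsDelta, double wallSecondsDelta) {
  assert(wallSecondsDelta > 0);
  return 100.0 * cpuSecondsDelta / wallSecondsDelta;
}

// Resident set size as a percentage of the machine's physical memory.
double memoryUsagePercentage(double residentBytes, double physicalBytes) {
  assert(physicalBytes > 0);
  return 100.0 * residentBytes / physicalBytes;
}
```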
Automatic metric updates
When global, higher-level metrics are created, it's necessary to provide values at every time interval (even if the values do not change). This can be done using AutoPushMetric:
Metric& metric = monitoring->getAutoPushMetric("exampleMetric");
metric = 10;
See how it works in the example: examples/11-AutoUpdate.cxx.
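The auto-push idea is that the user keeps a reference to a value and a periodic tick re-sends every registered value, changed or not. In this hypothetical sketch, AutoPushSketch, getAutoPushValue and tick are illustrative names, values are plain doubles, and the timer that would call tick once per interval is left out so the behaviour stays deterministic.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: values set by the user are re-sent on every tick,
// whether or not they changed since the previous tick.
class AutoPushSketch {
 public:
  explicit AutoPushSketch(std::function<void(const std::string&, double)> push)
      : push_(std::move(push)) {}

  // Returns a reference the caller can keep and assign to. std::map never
  // invalidates references to existing elements on insertion, so it stays valid.
  double& getAutoPushValue(const std::string& name) { return values_[name]; }

  // Meant to be called once per interval (e.g. by a timer thread).
  void tick() {
    for (const auto& entry : values_) push_(entry.first, entry.second);
  }

 private:
  std::function<void(const std::string&, double)> push_;
  std::map<std::string, double> values_;
};
```

A real implementation would also need synchronisation between the thread assigning values and the thread calling tick; the sketch is single-threaded.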
System monitoring, server-side backends installation and configuration
This guide explains manual installation. For Ansible deployment, see the AliceO2Group/system-configuration GitLab repository.