netdata

New to Netdata? Here is a live demo: http://my-netdata.io

Netdata is a system for distributed real-time performance and health monitoring.

It provides unparalleled real-time insight into everything happening on the systems it runs on (including containers and applications such as web and database servers), using modern interactive web dashboards.

Netdata is fast and efficient, designed to permanently run on all systems (physical & virtual servers, containers, IoT devices), without disrupting their core function.

Netdata currently runs on Linux, FreeBSD, and macOS.

Why use Netdata?

Netdata is a monitoring agent you install on all your systems.

It is:

a metrics collector - for system and application metrics (including web servers, databases, containers, etc)
a time-series database - all stored in memory (does not touch the disks while it runs)
a metrics visualizer - super fast, interactive, modern, optimized for anomaly detection
an alarms notification engine - an advanced watchdog for detecting performance and availability issues

All packaged together in a very flexible, extremely modular, distributed application.

This is how netdata compares to other monitoring solutions:

netdata	others (open-source and commercial)
High resolution metrics (1s granularity)	Low resolution metrics (10s granularity at best)
Monitors everything, thousands of metrics per node	Monitor just a few metrics
UI is super fast, optimized for anomaly detection	UI is good for just an abstract view
Meaningful presentation for all metrics (educational)	You have to know the metrics before you start
Install and get results immediately	A long preparation is required to get any useful results
Use it to troubleshoot performance problems	Use them to get statistics of past performance
Kills the console for tracing performance issues	The console is required for troubleshooting
Requires zero dedicated resources	Require dedicated resources

Netdata is free, super fast, very easy, completely open, flexible and integrate-able. It has been designed by SysAdmins, DevOps and Developers for troubleshooting performance problems, not just visualizing metrics.

Quick Start

WARNING:
People get addicted to netdata!
Once you install it and use it for a few minutes, there is no going back! You have been warned...

You can quickly install netdata on a Linux server with the following:

# make sure you run `bash` for your shell
bash

# install netdata, directly from github source
bash <(curl -Ss https://my-netdata.io/kickstart.sh)

More installation methods can be found at the installation page.

User base

[Usage badges: Docker pulls; counts since May 16th 2016 (the date the global public netdata registry was released) and in the last 24 hours]

News

Nov 6th, 2018 - netdata v1.11.0 released!

New query engine, supporting statistical functions.
Fixed security issues identified by Red4Sec.com and Synacktiv.
New Data Collection Modules: rethinkdbs, proxysql, litespeed, uwsgi, unbound, powerdns, dockerd, puppet, logind, adaptec_raid, megacli, spigotmc, boinc, w1sensor, monit, linux_power_supplies.
Improved Data Collection Modules: statsd.plugin, apps.plugin, freeipmi.plugin, proc.plugin, diskspace.plugin, freebsd.plugin, python.d.plugin, web_log, nginx_plus, ipfs, fail2ban, ceph, elasticsearch, redis, beanstalk, mysql, varnish, couchdb, phpfpm, apache, icecast, mongodb, postgres, mdstat, openvpn_log, snmp, nut.
Added alarms for detecting abnormally high load average, TCP SYN and TCP accept queue overflows, network interfaces congestion and alarms for bcache, mdstat, apcupsd, mysql.
System alarms are now enabled on FreeBSD.
New notification methods: rocket.chat, Microsoft Teams, syslog, fleep.io, Amazon SNS.
and dozens more improvements, enhancements, features and compatibility fixes

Sep 18, 2018 - netdata has its own organization

Netdata used to be a firehol.org project, accessible as firehol/netdata.

Netdata now has its own github organization netdata, so all github URLs are now netdata/netdata. The old github URLs, repo clones, forks, etc redirect automatically to the new repo.

Jun 16, 2018 - netdata in CNCF

Netdata is now part of the Cloud Native Computing Foundation (CNCF) landscape.

Read the netdata presentation we gave at CNCF TOC on Sep 18, 2018.

netdata infographic

This is a high-level overview of the netdata feature set and architecture.
Click it to interact with it (it has direct links to documentation).

Features

Stunning interactive bootstrap dashboards

mouse and touch friendly, in 2 themes: dark, light
Amazingly fast

responds to all queries in less than 0.5 ms per metric,
even on low-end hardware
Highly efficient

collects thousands of metrics per server per second,
with just 1% CPU utilization of a single core, a few MB of RAM and no disk I/O at all
Sophisticated alerting

hundreds of alarms, out of the box!

supports dynamic thresholds, hysteresis, alarm templates,
multiple role-based notification methods (such as email, slack.com, flock.com,
pushover.net, pushbullet.com, telegram.org, twilio.com, messagebird.com, kavenegar.com)
Extensible

you can monitor anything you can get a metric for,
using its Plugin API (anything can be a netdata plugin,
BASH, python, perl, node.js, java, Go, ruby, etc)
Embeddable

it can run anywhere a Linux kernel runs (even IoT)
and its charts can be embedded on your web pages too
Customizable

custom dashboards can be built using simple HTML (no javascript necessary)
Zero configuration

auto-detects everything, it can collect up to 5000 metrics
per server out of the box
Zero dependencies

it is even its own web server, for its static web files and its web API
Zero maintenance

you just run it, it does the rest
Scales to infinity

requiring minimal central resources
Several operating modes

autonomous host monitoring, headless data collector, forwarding proxy, store and forward proxy, central multi-host monitoring, in all possible configurations.
Each node may have different metrics retention policy and run with or without health monitoring.
Time-series back-ends supported

can archive its metrics on graphite, opentsdb, prometheus, json document DBs, in the same or lower detail
(lower: to prevent it from congesting these servers due to the amount of data collected)
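
To make the "Extensible" item above more concrete, here is a minimal sketch of an external plugin written in bash. It assumes the line-oriented text protocol that external plugins print to standard output (CHART and DIMENSION to define a chart, then BEGIN / SET / END for every collected sample). The chart id `example.random`, its titles, the priority `90000` and the function names are hypothetical, chosen only for illustration; check the plugin documentation before relying on the exact fields.

```shell
#!/usr/bin/env bash
# Sketch of an external netdata plugin (illustrative names, not an official module).

# Print the one-time chart definition.
# $1 = update frequency in seconds (assumed to be passed in by the caller)
define_chart() {
    echo "CHART example.random '' 'A random number' 'value' random random line 90000 $1"
    echo "DIMENSION value '' absolute 1 1"
}

# Print one collected sample for the chart.
# $1 = the value to report
emit_sample() {
    echo "BEGIN example.random"
    echo "SET value = $1"
    echo "END"
}

# Define the chart once, then report a new sample every $1 seconds, forever.
run_plugin() {
    local update_every="${1:-1}"
    define_chart "$update_every"
    while true; do
        emit_sample "$RANDOM"
        sleep "$update_every"
    done
}
```

To try it, add `run_plugin "$1"` as the last line of the script and run it in a terminal: it should print the chart definition once and then one BEGIN / SET / END group per interval.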

What does it monitor?

netdata collects several thousands of metrics per device.
All these metrics are collected and visualized in real-time.

Almost all metrics are auto-detected, without any configuration.
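
One way to see what was auto-detected on a node is to ask the agent's web API. The sketch below assumes the agent listens on `localhost:19999` and exposes the `/api/v1/charts` and `/api/v1/data` endpoints; the `NETDATA_HOST` and `NETDATA_PORT` variables and the function names are hypothetical helpers, not part of netdata.

```shell
# Build a URL for the agent's web API.
# Host and port default to assumed values; override them via the
# (hypothetical) NETDATA_HOST / NETDATA_PORT environment variables.
netdata_api_url() {
    local host="${NETDATA_HOST:-localhost}" port="${NETDATA_PORT:-19999}"
    printf 'http://%s:%s/api/v1/%s' "$host" "$port" "$1"
}

# List every chart the agent currently maintains (JSON).
list_charts() {
    curl -Ss "$(netdata_api_url charts)"
}

# Fetch the most recent samples of one chart as JSON.
# $1 = chart id (for example system.cpu), $2 = how many seconds back (default 60)
chart_data() {
    local chart="$1" seconds="${2:-60}"
    curl -Ss "$(netdata_api_url "data?chart=${chart}&after=-${seconds}&format=json")"
}
```

For example, `chart_data system.cpu 30` would request the last 30 seconds of the `system.cpu` chart, provided that chart exists on the node.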

This is a list of what it currently monitors:

CPU

usage, interrupts, softirqs, frequency, total and per core, CPU states
Memory

RAM, swap and kernel memory usage, KSM (Kernel Samepage Merging), NUMA
Disks

per disk: I/O, operations, backlog, utilization, space, software RAID (md)
Network interfaces

per interface: bandwidth, packets, errors, drops
IPv4 networking

bandwidth, packets, errors, fragments,
tcp: connections, packets, errors, handshake,
udp: packets, errors,
broadcast: bandwidth, packets,
multicast: bandwidth, packets
IPv6 networking

bandwidth, packets, errors, fragments, ECT,
udp: packets, errors,
udplite: packets, errors,
broadcast: bandwidth,
multicast: bandwidth, packets,
icmp: messages, errors, echos, router, neighbor, MLDv2, group membership,
break down by type
Interprocess Communication - IPC

such as semaphores and semaphore arrays
netfilter / iptables Linux firewall

connections, connection tracker events, errors
Linux DDoS protection

SYNPROXY metrics
fping latencies

for any number of hosts, showing latency, packets and packet loss
Processes

running, blocked, forks, active
Entropy

random number pool, used in cryptography
NFS file servers and clients

NFS v2, v3, v4: I/O, cache, read ahead, RPC calls
Network QoS

the only tool that visualizes network tc classes in realtime
Linux Control Groups

containers: systemd, lxc, docker
Applications

by grouping the process tree and reporting CPU, memory, disk reads,
disk writes, swap, threads, pipes, sockets - per group
Users and User Groups resource usage

by summarizing the process tree per user and group,
reporting: CPU, memory, disk reads, disk writes, swap, threads, pipes, sockets
Apache and lighttpd web servers

mod-status (v2.2, v2.4) and cache log statistics, for multiple servers
Nginx web servers

stub-status, for multiple servers
Tomcat

accesses, threads, free memory, volume
web server log files

extracting in real-time, web server performance metrics and applying several health checks
MySQL databases

multiple servers, each showing: bandwidth, queries/s, handlers, locks, issues,
tmp operations, connections, binlog metrics, threads, innodb metrics, and more
Postgres databases

multiple servers, each showing: per database statistics (connections, tuples
read - written - returned, transactions, locks), backend processes, indexes,
tables, write ahead, background writer and more
Redis databases

multiple servers, each showing: operations, hit rate, memory, keys, clients, slaves
couchdb

reads/writes, request methods, status codes, tasks, replication, per-db, etc
mongodb

operations, clients, transactions, cursors, connections, asserts, locks, etc
memcached databases

multiple servers, each showing: bandwidth, connections, items
elasticsearch

search and index performance, latency, timings, cluster statistics, threads statistics, etc
ISC Bind name servers

multiple servers, each showing: clients, requests, queries, updates, failures and several per view metrics
NSD name servers

queries, zones, protocols, query types, transfers, etc.
PowerDNS

queries, answers, cache, latency, etc.
Postfix email servers

message queue (entries, size)
exim email servers

message queue (emails queued)
Dovecot POP3/IMAP servers
ISC dhcpd

pools utilization, leases, etc.
IPFS

bandwidth, peers
Squid proxy servers

multiple servers, each showing: clients bandwidth and requests, servers bandwidth and requests
HAproxy

bandwidth, sessions, backends, etc
varnish

threads, sessions, hits, objects, backends, etc
OpenVPN

status per tunnel
Hardware sensors

lm_sensors and IPMI: temperature, voltage, fans, power, humidity
NUT and APC UPSes

load, charge, battery voltage, temperature, utility metrics, output metrics
PHP-FPM

multiple instances, each reporting connections, requests, performance
hddtemp

disk temperatures
smartd

disk S.M.A.R.T. values
SNMP devices

can be monitored too (although you will need to configure these)
chrony

frequencies, offsets, delays, etc.
beanstalkd

global and per tube monitoring
statsd

netdata is a fully featured statsd server
ceph

OSD usage, Pool usage, number of objects, etc.
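
Because netdata acts as a statsd server, an application can push its own metrics to it using the plain statsd line format `name:value|type`. The sketch below assumes the statsd listener accepts UDP on `localhost` port `8125` (the conventional statsd port) and that the shell is bash, whose `/dev/udp` redirection is used to send the datagram. The metric name `myapp.requests`, the `STATSD_HOST` / `STATSD_PORT` variables and the function names are hypothetical.

```shell
# Format one statsd metric line.
# $1 = metric name, $2 = value, $3 = type (c = counter, g = gauge, ms = timer)
statsd_line() {
    printf '%s:%s|%s' "$1" "$2" "$3"
}

# Send one metric over UDP using bash's /dev/udp pseudo-device.
# Host and port are assumptions; adjust them to match your setup.
statsd_send() {
    local host="${STATSD_HOST:-localhost}" port="${STATSD_PORT:-8125}"
    statsd_line "$@" > "/dev/udp/${host}/${port}"
}
```

For example, `statsd_send myapp.requests 1 c` would increment a counter named `myapp.requests`, and `statsd_send myapp.queue_size 17 g` would set a gauge.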