## Overview

`file.d` is a blazing fast tool for building data pipelines: read, process, and output events. It was primarily developed to read from files, but it also supports numerous input/action/output plugins.

⚠ Although we use it in production, it hasn't reached v1.0.0 yet. Please test your pipelines carefully on dev/stage environments.
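Here is a minimal sketch of what a pipeline config looks like: one input, a chain of actions, and one output. The plugin names (`file`, `json_decode`, `stdout`) come from the list below, but parameter names such as `watching_dir` and `field` are illustrative assumptions; check each plugin's docs for the exact settings.

```yaml
pipelines:
  example:
    input:
      type: file               # read events from log files
      watching_dir: /var/log   # assumed parameter name
    actions:
      - type: json_decode      # process: parse the JSON payload
        field: message         # assumed parameter name
    output:
      type: stdout             # print the processed events
```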
## Motivation

Well, we already have several similar tools: vector, filebeat, logstash, fluentd, fluent-bit, etc.

Performance tests state that the best ones achieve a throughput of roughly 100MB/s. Guys, it's 2020 now. HDDs and NICs can handle a throughput of a few GB/s, and CPUs process dozens of GB/s. Are you sure 100MB/s is what we deserve? Are you sure it is fast?
## Main features

- Fast: more than 10x faster than similar tools
- Predictable: it uses pooling, so memory consumption is limited
- Reliable: doesn't lose data thanks to its commit mechanism
- Container / cloud / Kubernetes native
- Simply configurable with YAML (see the sketch above)
- Prometheus-friendly: transform your events into metrics at any pipeline stage (see the sketch after this list)
- Well-tested and used in production to collect logs from Kubernetes clusters with 3000+ total CPU cores
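As a sketch of the Prometheus feature: the idea is that any action in the chain can emit a metric for the events it handles. The `metric_name`/`metric_labels` settings below are assumptions, not a confirmed API; verify them against the action plugin docs.

```yaml
actions:
  - type: keep_fields
    fields: [level, message]
    # assumed settings for exposing a Prometheus counter at this stage:
    metric_name: events_kept_total
    metric_labels: [level]
```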
## Performance

On a MacBook Pro 2017 with two physical cores, `file.d` can achieve the following throughput:

- 1.7GB/s in the `files > devnull` case
- 1.0GB/s in the `files > json decode > devnull` case
TBD: throughput on production servers.
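For reference, the `files > devnull` case corresponds to a pipeline like the sketch below; adding a `json_decode` action in between gives the second case. The `watching_dir` parameter name is an assumption.

```yaml
pipelines:
  bench:
    input:
      type: file
      watching_dir: /tmp/bench-logs   # assumed parameter name
    output:
      type: devnull                   # discard events to measure raw read throughput
```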
## Plugins

**Input**: `dmesg`, `fake`, `file`, `http`, `journalctl`, `k8s`, `kafka`

**Action**: `add_host`, `convert_date`, `debug`, `discard`, `flatten`, `join`, `json_decode`, `keep_fields`, `modify`, `parse_es`, `remove_fields`, `rename`, `throttle`

**Output**: `devnull`, `elasticsearch`, `gelf`, `kafka`, `stdout`
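A hedged sketch of how these plugins combine, e.g. tailing Kubernetes pod logs, dropping debug-level noise, and shipping the rest to Elasticsearch. The `match_fields` condition syntax and the `endpoints` setting are assumptions; consult each plugin's docs.

```yaml
pipelines:
  k8s_logs:
    input:
      type: k8s                # tail pod logs on the node
    actions:
      - type: discard          # drop events matching the condition below
        match_fields:          # assumed condition syntax
          level: debug
      - type: add_host         # enrich events with the hostname
    output:
      type: elasticsearch
      endpoints:               # assumed setting name
        - http://elasticsearch:9200
```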
## What's next
Generated using insane-doc