o/~ I'm a lumberjack and I'm ok! I sleep when idle, then I ship logs all day! I parse your logs, I eat the JVM agent for lunch! o/~
If you have questions and cannot find answers, please join the #logstash IRC channel on Freenode or ask on the logstash-users@googlegroups.com mailing list.
A tool to collect logs locally in preparation for processing elsewhere!
Problem: logstash jar releases are too fat for constrained systems.
Solution: lumberjack
- Minimize resource usage where possible (CPU, memory, network).
- Secure transmission of logs.
- Configurable event data.
- Easy to deploy with minimal moving parts.
- Simple inputs only:
  - Follows files and respects rename/truncation conditions.
  - Accepts STDIN, useful for things like varnishlog | lumberjack ...
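To make "respects rename/truncation conditions" concrete, a follower can compare the file's inode and size with what it saw last time. The following is a minimal Python sketch of that idea, not lumberjack's code. FollowState, check_file and read_new_lines are hypothetical names. A real follower would also finish reading the old file handle before switching to the renamed one; this sketch skips that step.

```python
import os


class FollowState:
    """Hypothetical per-file state: last seen inode and bytes consumed so far."""

    def __init__(self, path):
        self.path = path
        self.inode = None
        self.offset = 0


def check_file(state):
    """Classify the file at state.path as 'renamed', 'truncated' or 'ok'."""
    st = os.stat(state.path)
    if state.inode is not None and st.st_ino != state.inode:
        # A different file now sits at this path (e.g. after log rotation).
        state.inode = st.st_ino
        state.offset = 0
        return "renamed"
    state.inode = st.st_ino
    if st.st_size < state.offset:
        # The file shrank below our read position, so it was truncated.
        state.offset = 0
        return "truncated"
    return "ok"


def read_new_lines(state):
    """Return (status, complete lines appended since the last call)."""
    status = check_file(state)
    with open(state.path, "rb") as f:
        f.seek(state.offset)
        data = f.read()
    # Consume only whole lines; a partial last line is read again next time.
    end = data.rfind(b"\n") + 1
    state.offset += end
    lines = data[:end].decode("utf-8", "replace").splitlines()
    return status, lines
```

Keeping a partial trailing line unread means an event is never shipped half-written just because the writer had not finished the line yet.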
Lumberjack is designed to guarantee that every event will be sent. To do this, it sometimes sends an event more than once. Lumberjack spools up a few hundred events before sending them as one payload. If that payload is not acknowledged in full, the whole payload is resent. Suppose logstash receives 1000 events in a lumberjack payload and processes 500 of them before lumberjack decides the request has timed out. Lumberjack will then reconnect and resend all 1000, so the first 500 are duplicated.
If you want to avoid duplicates, you can try setting '--window-size 1' on lumberjack so that each payload will only contain 1 event and lumberjack will wait for that event to be acknowledged before it sends another. You can still get duplicates in this situation, but the number of duplicates will be much reduced.
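The duplication described above can be reproduced with a toy model. This is a Python sketch, not lumberjack's code:
- deliver and duplicates are hypothetical names.
- window_size plays the role of --window-size.
- processed_before_timeout is an assumed point at which the sender's ack timeout fires (once).

```python
def deliver(events, window_size, processed_before_timeout):
    """Toy model of spool-and-resend delivery.

    Events go out in payloads of `window_size`. The receiver processes them one
    at a time; once it has processed `processed_before_timeout` events, the
    sender's ack timeout fires (once) and the whole in-flight payload is resent.
    Returns every event the receiver processed, duplicates included.
    """
    received = []
    timeout_pending = True
    i = 0
    while i < len(events):
        payload = events[i:i + window_size]
        acked = True
        for event in payload:
            received.append(event)
            if timeout_pending and len(received) == processed_before_timeout:
                # Sender gives up waiting: this payload counts as unacknowledged.
                timeout_pending = False
                acked = False
                break
        if acked:
            i += len(payload)  # advance only once the whole payload is acknowledged
        # otherwise loop again and resend the same payload from its first event
    return received


def duplicates(received):
    """Number of extra copies the receiver saw."""
    return len(received) - len(set(received))


big = deliver(list(range(1000)), window_size=1000, processed_before_timeout=500)
one = deliver(list(range(1000)), window_size=1, processed_before_timeout=500)
print(duplicates(big), duplicates(one))  # 500 1
```

With one large window, the timeout replays everything already processed in the payload. With a window of 1, only the single in-flight event can be repeated.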
Another way to prevent duplicate events is to set a 'document_id' in the elasticsearch output. If the ID is derived from the event itself, a resent event gets the same ID as its first copy. Duplicates then overwrite the existing document in elasticsearch instead of creating new ones.
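For the document_id approach to work, the ID must be identical every time the same event is resent. The sketch below is minimal and makes two assumptions:
- Each event carries host, file and offset fields. This field set is a hypothetical choice.
- The resulting string would be what you pass to the elasticsearch output's document_id setting.

document_id is a hypothetical helper, not part of lumberjack or logstash.

```python
import hashlib


def document_id(event):
    """Stable ID for an event, built from where the line came from.

    Assumes (hypothetically) that each event has 'host', 'file' and 'offset'
    fields; a resent copy of the same line carries the same values, so it maps
    to the same ID and overwrites rather than duplicates.
    """
    key = "\0".join([event["host"], event["file"], str(event["offset"])])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Hashing only the message text would also be deterministic. However, it would merge legitimately repeated lines (two identical "connection reset" messages, say) into one document, which is why the sketch keys on position. Offsets restart after truncation or rotation, so this key is only as unique as the fields that go into it.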
- Install FPM:
  sudo gem install fpm
- Ensure you have outgoing FTP access to download OpenSSL from ftp.openssl.org.
- Compile lumberjack:
  git clone git://github.com/jordansissel/lumberjack.git
  cd lumberjack
  make
- Make packages, either:
  make rpm
  or:
  make deb
Packages install to /opt/lumberjack. Lumberjack builds all necessary dependencies itself, so there should be no run-time dependencies you need to install.
Generally:
lumberjack.sh --host somehost --port 12345 /var/log/messages
See lumberjack.sh --help for all the flags.
- You'll need an SSL CA to verify the server (host) with.
- You can specify custom fields with the --field foo=bar flag. Any number of these may be specified. I use them to set fields like type and other custom attributes relevant to each log.
- Any non-flag argument after the flags is considered a file path. You can watch any number of files.
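The command-line conventions above can be sketched as a small parser:
- --field name=value flags can be repeated.
- Every other non-flag argument is treated as a file path.

parse_args is hypothetical. It assumes every other flag takes exactly one value, as --host and --port do in the usage line. Lumberjack's real option handling may differ.

```python
def parse_args(args):
    """Split a lumberjack-style command line into (fields, paths).

    Hypothetical parser: every flag other than --field is assumed to take
    exactly one value, and every non-flag argument is a file path to watch.
    """
    fields = {}
    paths = []
    it = iter(args)
    for arg in it:
        if arg == "--field":
            pair = next(it, None)
            # Split on the first '=' only, so values may themselves contain '='.
            name, sep, value = (pair or "").partition("=")
            if not sep or not name:
                raise ValueError("expected --field name=value, got %r" % pair)
            fields[name] = value  # a later --field with the same name wins
        elif arg.startswith("--"):
            next(it, None)  # skip this flag's value, e.g. somehost or 12345
        else:
            paths.append(arg)
    return fields, paths
```

The resulting fields dict is exactly the kind of string:string map that gets attached to every event from those files.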
In logstash, you'll want to use the lumberjack input, something like:
input {
  lumberjack {
    # The port to listen on
    port => 12345

    # The paths to your ssl cert and key
    ssl_certificate => "path/to/ssl.crt"
    ssl_key => "path/to/ssl.key"

    # Set this to whatever you want.
    type => "somelogs"
  }
}
Below is valid as of 2012/09/19.

Minimize resource usage:
- Sets small resource limits (memory, open files) at startup based on the number of files being watched.
- CPU: sleeps when there is nothing to do.
- Network/CPU: sleeps if there is a network failure.
- Network: uses zlib for compression.
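The network item above says lumberjack compresses with zlib. This small Python sketch shows why compressing a whole spool pays off for log traffic. The function name compress_spool and the JSON-lines encoding are assumptions for illustration; this is not lumberjack's wire format.

```python
import json
import zlib


def compress_spool(events, level=6):
    """Serialize a spool of events and deflate it as a single zlib stream.

    JSON lines are a stand-in encoding chosen for readability; this is not
    lumberjack's wire format. Returns (raw_bytes, compressed_bytes).
    """
    raw = b"".join(
        json.dumps(event, sort_keys=True).encode("utf-8") + b"\n" for event in events
    )
    return raw, zlib.compress(raw, level)


# Example: 200 similar syslog-style lines, as a busy host might produce.
spool = [
    {"file": "/var/log/messages", "line": "sshd[%d]: Accepted publickey for deploy" % (1000 + i)}
    for i in range(200)
]
raw, packed = compress_spool(spool)
print(len(raw), len(packed))  # the compressed spool is a small fraction of the raw size
```

Compressing many events together gives zlib more repetition to exploit than compressing each event on its own would, and log lines from one host tend to be highly repetitive.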
Secure transmission:
- Uses OpenSSL to verify the server certificates (so you know who you are sending to).
- Uses OpenSSL to transport logs.
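Python's ssl module is built on OpenSSL, so it can show what "verify the server certificates" means on the sending side. This is a minimal sketch with several assumptions:
- The server certificate is checked against the CA you supply.
- Its hostname is also checked. The README does not say whether lumberjack checks hostnames.
- make_client_context and open_channel are hypothetical names, and the 15-second timeout is arbitrary.

```python
import socket
import ssl


def make_client_context(ca_path=None):
    """TLS client context that only talks to servers signed by a trusted CA.

    ca_path is the CA certificate used to verify the server (host); with no
    path, the system's default CAs are loaded instead.
    """
    ctx = ssl.create_default_context(cafile=ca_path)
    ctx.verify_mode = ssl.CERT_REQUIRED  # already the default; made explicit
    ctx.check_hostname = True  # the certificate must also match the name we dial
    return ctx


def open_channel(host, port, ca_path):
    """Connect and complete the TLS handshake; raises if verification fails."""
    ctx = make_client_context(ca_path)
    raw = socket.create_connection((host, port), timeout=15)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # don't leak the TCP socket when the handshake fails
        raise
```

Verification happens during the handshake, so no log data is written until the server has proven it holds a certificate signed by that CA.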
Configurable event data:
- The protocol lumberjack uses supports sending a string:string map.
- The lumberjack tool lets you specify arbitrary extra data with --field name=value.
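The README says only that the protocol carries a string:string map. One common way to frame such a map on a stream is to length-prefix every key and value. The layout below is an illustrative assumption, not a specification of lumberjack's protocol:
- a big-endian uint32 sequence number and a uint32 pair count;
- then each key and value as a uint32 length followed by its bytes.

encode_map and decode_map are hypothetical names.

```python
import struct


def encode_map(seq, fields):
    """Encode a string:string map as a length-prefixed binary record.

    Assumed layout: >I sequence, >I pair count, then for each pair a >I key
    length, the key bytes, a >I value length and the value bytes.
    """
    out = [struct.pack(">II", seq, len(fields))]
    for key, value in fields.items():
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        out.append(struct.pack(">I", len(k)) + k)
        out.append(struct.pack(">I", len(v)) + v)
    return b"".join(out)


def decode_map(buf):
    """Inverse of encode_map; returns (seq, fields)."""
    seq, count = struct.unpack_from(">II", buf, 0)
    pos = 8
    fields = {}
    for _ in range(count):
        (klen,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        key = buf[pos:pos + klen].decode("utf-8")
        pos += klen
        (vlen,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        fields[key] = buf[pos:pos + vlen].decode("utf-8")
        pos += vlen
    return seq, fields
```

Because lengths are used instead of delimiters, keys and values can contain any bytes, including newlines and colons, without escaping.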
Easy deployment:
- All dependencies are built at compile time (OpenSSL, jemalloc, etc.) because many OS distributions lack these dependencies.
- The make deb or make rpm commands will package everything into a single DEB or RPM.
- The bin/lumberjack.sh script makes sure the dependencies are found when run in production.
Future functional features:
- Re-evaluate globs periodically to look for new log files.
- Track position in the log.
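The last two items, re-evaluating globs and tracking position, could work roughly as in the Python sketch below. It rests on these assumptions:
- A main loop calls rescan every few seconds.
- Offsets are kept in a JSON "registry" file.

rescan, save_positions, load_positions and the registry format are all hypothetical.

```python
import glob
import json
import os


def rescan(patterns, known):
    """Expand the glob patterns again; return (and remember) paths not seen before."""
    found = set()
    for pattern in patterns:
        found.update(glob.glob(pattern))
    new_paths = sorted(found - known)
    known.update(new_paths)
    return new_paths


def save_positions(registry_path, positions):
    """Write {file path: byte offset} so a restart can resume where it left off."""
    tmp = registry_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(positions, f)
    # os.replace swaps the file in with one rename (atomic on POSIX filesystems),
    # so a reader never sees a partially written registry.
    os.replace(tmp, registry_path)


def load_positions(registry_path):
    """Read saved offsets; a missing registry means start from scratch."""
    try:
        with open(registry_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

Saving offsets only after the server has acknowledged a payload would keep the "every event is sent" guarantee across restarts, at the cost of possibly resending the last unacknowledged spool.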
I would love to not have a custom protocol, but nothing I've found implements what I need, which is: encrypted, trusted, compressed, latency-resilient, and reliable transport of events.
- Redis development refuses to accept encryption support and would likely reject compression as well.
- ZeroMQ lacks authentication, encryption, and compression.
- Thrift also lacks authentication, encryption, and compression, and also is an RPC framework, not a streaming system.
- Websockets don't do authentication or compression, but support encrypted channels with SSL. Websockets also require XORing the entire payload of all messages - wasted energy.
- SPDY is still changing too frequently and is also RPC. Streaming requires custom framing.
- HTTP is RPC and has very high overhead for small events (headers that cannot be compressed, etc.). Streaming requires custom framing.
See LICENSE file.