LogJuicer Extracts Anomalies From Log Files

Based on baseline logs, LogJuicer highlights useful text in target logs. The goal is to save time when looking for the root causes of failures.

How it works

LogJuicer implements a custom diffing process to compare logs:

A tokenizer removes random words.
Lines are converted into feature vectors using the hashing trick.
The logs are compared using cosine similarity.
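
The three steps above can be sketched with the Rust standard library alone. This is a minimal illustration, not LogJuicer's actual implementation: the digit-dropping tokenizer, the `DIM` bucket count, and every function name below are assumptions made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of buckets in the hashed feature vector (arbitrary choice for this sketch).
const DIM: usize = 1024;

/// Hypothetical tokenizer: split on whitespace and drop words containing digits,
/// a crude stand-in for removing random words such as ids and timestamps.
fn tokenize(line: &str) -> Vec<&str> {
    line.split_whitespace()
        .filter(|word| !word.chars().any(|c| c.is_ascii_digit()))
        .collect()
}

/// Hashing trick: each token increments the bucket selected by its hash,
/// so no vocabulary needs to be stored.
fn vectorize(line: &str) -> Vec<f32> {
    let mut vector = vec![0.0; DIM];
    for token in tokenize(line) {
        let mut hasher = DefaultHasher::new();
        token.hash(&mut hasher);
        vector[(hasher.finish() as usize) % DIM] += 1.0;
    }
    vector
}

/// Cosine similarity between two vectors; 0.0 when either one is empty.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Distance from a target line to its most similar baseline line:
/// close to 0.0 when the line looks like the baseline, larger when it is novel.
fn anomaly_score(baseline: &[Vec<f32>], target: &str) -> f32 {
    let target_vector = vectorize(target);
    let best = baseline
        .iter()
        .map(|b| cosine_similarity(b, &target_vector))
        .fold(0.0_f32, f32::max);
    1.0 - best
}

fn main() {
    // Made-up log lines, used only to exercise the sketch.
    let baseline: Vec<Vec<f32>> = [
        "2024-01-01 12:00:01 job 1234 started",
        "2024-01-01 12:00:09 job 1234 finished",
    ]
    .iter()
    .map(|line| vectorize(line))
    .collect();

    for line in [
        "2024-02-02 08:30:00 job 9876 started",
        "2024-02-02 08:30:05 Traceback: connection refused",
    ] {
        println!("{:.2} {}", anomaly_score(&baseline, line), line);
    }
}
```

Because the timestamps and job ids are dropped before hashing, the first target line maps to the same vector as a baseline line, while the traceback line shares no tokens with the baseline and scores as an anomaly.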

LogJuicer features a discovery mechanism to automatically find the source of the diff (called the baseline) for some targets:

A service.log file will be compared with the last service.log-YYYYDDMM.
Baselines for CI builds may be found through the external service API.

When the baseline discovery fails, the diff's source must be provided.
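
The rotated-file case can be illustrated as follows. This is only a sketch under stated assumptions: `discover_baseline` is a hypothetical helper, the suffix is assumed to be eight digits, and "last" is interpreted as the greatest name in lexicographic order, which matches chronological order only when the date suffix sorts that way. LogJuicer's real discovery logic may differ.

```rust
/// Hypothetical helper: among `candidates`, pick the rotated sibling of `target`
/// (a name of the form `<target>-<8 digits>`) that sorts last.
/// Returns None when no sibling matches, in which case the baseline
/// has to be provided explicitly.
fn discover_baseline<'a>(target: &str, candidates: &[&'a str]) -> Option<&'a str> {
    let prefix = format!("{}-", target);
    candidates
        .iter()
        .copied()
        .filter(|name| {
            name.strip_prefix(prefix.as_str()).map_or(false, |suffix| {
                suffix.len() == 8 && suffix.chars().all(|c| c.is_ascii_digit())
            })
        })
        .max()
}

fn main() {
    // Made-up directory listing, used only to exercise the sketch.
    let files = [
        "service.log",
        "service.log-20240101",
        "service.log-20240102",
        "other.log-20250101",
    ];
    match discover_baseline("service.log", &files) {
        Some(baseline) => println!("baseline: {}", baseline),
        None => println!("no baseline found, provide one explicitly"),
    }
}
```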

Install

Install the logjuicer command line by running:

$ cargo install --locked --git https://github.com/logjuicer/logjuicer logjuicer-cli

If you don't have cargo, see the Rust installation documentation.

Or grab the latest release asset logjuicer-x86_64-linux.tar.bz2 from https://github.com/logjuicer/logjuicer/releases

Use

Analyze a local file:

$ logjuicer path /var/log/zuul/scheduler.log

Analyze a remote URL:

$ logjuicer url https://zuul/build/uuid

Compare two inputs (when baseline discovery doesn't work):

$ logjuicer diff https://zuul/build/success-build https://zuul/build/failed-build

Save and reuse a trained model using the --model file-path argument.

Report

LogJuicer can create a static report for archival purposes using the --report argument. The output depends on the file extension:

.bin or .gz files are created along with a .html viewer to be displayed in a web browser. Add the --open argument to load the report with xdg-open.
.json files are regular JSON exports.

For example, run the following command to visualize the differences between two directories:

$ logjuicer --open --report report-case-01.bin.gz diff sosreport-success/ sosreport-failed/

Configure

LogJuicer supports an Ant-style fileset configuration to filter the processed files:

includes: list of file regexes that must be included. Defaults to all files.
excludes: list of file regexes that must be excluded. Defaults to the default excludes, or to none if default_excludes is false.
default_excludes: indicates whether the default excludes should be used.

LogJuicer supports custom ignore patterns to silence noisy lines:

ignore_patterns: list of log line regexes to be ignored.

LogJuicer supports custom extra baselines, for example to include files that are skipped in successful build artifacts:

extra_baselines: list of file paths or remote URLs.

The configuration can be defined per target, for example:

- job_matcher: "^my-job[0-9]+$"
  config:
    excludes: [big_file]
    ignore_patterns:
      - get logs
      - fetch debug

Use the logjuicer debug-config JOB FILE LINE command to validate the ignore_patterns config.

Learn

To read more about the project:

Initial presentation blog post
The command line specification: ./doc/adr/0001-architecture-cli.md
How the tokenizer works: Improving LogJuicer Tokenizer
How the nearest neighbor works: Implementing LogJuicer Nearest Neighbors
How the log file iterator works: Introducing the BytesLines Iterator
Completing the first release of LogJuicer
How the web interface works: WASM based web interface
The report file format: Leveraging Cap'n Proto For LogJuicer Reports

Contribute

Clone the project and run tests:

git clone https://github.com/logjuicer/logjuicer && cd logjuicer
cargo test && cargo fmt && cargo clippy

Run the project:

cargo run -p logjuicer-cli -- --help

Activate tracing debug:

export LOGJUICER_LOG="logjuicer_model=debug,logjuicer_cli=debug"
# Create a chrome trace that can be viewed in web browser with `chrome://tracing`
export LOGJUICER_TRACE=./chrome.trace

Check out the web crate to develop the web interface.

Join the project Matrix room: #logjuicer:matrix.org.

Roadmap

Detect Jenkins URLs
Reports minification
Web service deployment