/petit3

Log analysis program for use from the command line

Primary LanguageGLSLOtherNOASSERTION

petit3

Python

Python3 port of the classic petit utility for inspecting log files.

❗ Disclaimer:

Note that this tool is not based exact science, it employs various heuristics to make a large log more approachable for system administrators and investigators. However, in case of doubt, a precise analysis cannot be avoided by employing this tool.

Installation

Use Python’s package manager to install the tool:

pip3 install .

Afterwards, you will find the main script petit3 in /home/$USER/.local/bin/. If this directory is in you $PATH, just run:

petit3 -V

Usage

“Hash” a syslog, removing reboots and all standard filters. “Hashing” means to replace variable components of log lines with a #-char and counting the occurence of each “normalized” entry. By default petit will show a sample for all entries which are found three or less times.

petit3 --hash --fingerprint /var/log/messages

“Hash” an Apache log, which means to replace variable components of log lines with a #-char and counting the occurence of each “normalized” entry:

petit3 --hash /var/log/httpd/access_log

Get a report of all daemons found in log file:

petit3 --daemon /var/log/messages

Get a host report:

petit3 --host /var/log/messages

Find qualitatively important words in your log. This is especially useful to help determine what should be monitored in swatch:

petit3 --wordcount /var/log/messages

Graph last 60 seconds starting from now or the last entry in a syslog:

petit3 --sgraph /var/log/messages --end last

Track a special word you are interested in by minute starting from now or the last entry in a syslog:

cat /var/log/messages | grep error | petit3 --mgraph

Show samples for each entry:

petit3 --hash --allsample /var/log/messages

ℹ️ Background

Motivation

Log analysis is something that all systems administrators know they need to do. Many of us come to this point, either because there is a problem, there is a security requirement from the organization, or it keeps you up all night wanting to know what is going on in all of that data.

Looking for best practices for log analysis on this Internet is difficult at best. Many years ago, Scott McCarty discovered a script that hashed log files by removing all of their numbers and replacing them with “#” characters. The results of this simple algorithm were phenomenal, logs could be reduced by a factor of ten. This was much more readable, yet left much of the quality data that Scott McCarty needed to determine if there was a problem.

In the years since Scott McCarty discovered that simple algorithm, Scott McCarty have come to discover many techniques on text analysis which are commonly used in linguistics and anthropology to analyze natural languages. This has led me to develop very simple best practices for analyzing logs.

The Basics

  1. Logs are made up of output which are programmed by human beings. There are no real restraints on what is output, other than, some cultural rules on being professional. This makes the output from programs very much a natural language. This also makes the output of someones program an approximation of the reality of what is happening inside a program. This is important to remember, logs are not perfect.
  2. When a systems administrator analyzes logs by changing them, he is creating an approximation of an approximation of reality in side a working program. This is not necessarily a bad thing, especially, when the programmer never gives you better than his approximation of reality anyway.
  3. In practice logs are made up of certainty and uncertainty. For example, Scott McCarty knows what OpenSSH puts in the log during a login, because it is common. On the other hand, Scott McCarty does not now what a Compaq DL380 G3 will put in the log when it has a disk controller error. This is important to remember.
  4. The basic log analysis algorithm in Petit works to remove certainty, while leaving uncertainty. Stated another way, Petit quantitatively removes certainty, thereby leaving uncertainty, which by necessity requires qualitative analysis from a systems administrator
  5. After the algorithm has been applied, the output must be read by a systems administrator to determine if it is a normal or abnormal. Then abnormal entries can be acted on, hopefully before there is noticeable impact to your system.

Special Operations

Create an on the fly driver for a nonstandard file format, then pipe it to Petit. Petit can hash files of non-standard types ok, but graphing requires the time values to be in the correct columns.

cat /var/log/httpd/error_log | awk '{$1="";$5="";print}' | petit3 --sgraph

♻️ Changes due to or in the course of porting

This is just a port to Python3 with some minor changes. It is neither a rewrite nor an upgrade, just a port with a bit of restructuring and code style updates.

Please note that there are some slight deviations compared to the original implementation

  • precedence of directories to place filter and fingerprint files (now /var/lib/petit3/ takes precedence, otherwise the files shipped with the package will be used.)
  • fixed a bug in --wordcount
  • different rounding of floats in graphing
  • changes of graphing
    • introduce CLI-parameter --end to specify either to use current point in time or last entry of log file
    • does not graph “into the future” anymore

Those subtle changes should make petit at least as or even more helpful for any system administrator or first responder.

Contribute

All contributions are welcome, for feature requests or bug reports, use Github Issues. Pull requests are welcome to help fix or add features.

Code contributions

This repository uses the black code style. To automatically format the code, it is recommended to setup your editor to auto format on save.

License

GNU Lesser General Public License Version 3 (LGPL 3), see COPYING for details.