dispatch-log

Previous setup

Our previous centralized log for apache vhosts used logger(1).

From apache conf:

  CustomLog "|/usr/bin/logger -t www_vhost_tld_access -p local6.info" combined
  ErrorLog  "|/usr/bin/logger -t www_vhost_tld_error -p local6.info"

The php logs were not centralized.

New setup on the rsyslog source nodes

The new centralized log setup uses the rsyslog imfile module to convert our standard text apache and php log files (and possibly others) into syslog messages.

Apache access rsyslog.conf setup (similar setups exist for the apache and php error logs):

  module(load="imfile")
  input (type="imfile"
  File="/space/log2/*access.log"
  Tag="apache-access"
  stateFile="apache-access"
  Severity="info"
  Facility="local7"
  addMetadata="on")

A dedicated rsyslog.conf template then adds the originating file name to each syslog record:

  $template meta,"%TIMESTAMP% %HOSTNAME% %APP-NAME% %$!metadata!filename% %MSG%\n"
  *.* @@127.0.0.1:514;meta

The imfile module cannot use wildcards in folder names. The many dispersed log files must therefore first be hard linked into one central folder. This is done by a simple script invoked via logrotate: after the apache reload, every log file found is linked into a single folder. To keep file names unique, the target name is the concatenation of the last folder path component (the vhost name) and the file name.

Here is our gatherer script
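
As a rough illustration of the linking step described above (a sketch, not the actual gatherer script), the function below hard links every log file found one level below a source root into a single folder. The function name gather_logs, the layout <root>/<vhost>/<name>.log and the "-" separator between vhost name and file name are all assumptions made for this example.

```shell
#!/bin/sh
# gather_logs SRC_ROOT DEST
# Sketch only: hard link each SRC_ROOT/<vhost>/<name>.log to
# DEST/<vhost>-<name>.log so one imfile wildcard can match them all.
# The layout and the "-" separator are assumptions, not the real script.
gather_logs() {
  src_root=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for f in "$src_root"/*/*.log; do
    [ -f "$f" ] || continue                  # the glob matched nothing
    vhost=$(basename "$(dirname "$f")")      # last folder path component
    name=$(basename "$f")
    # -f replaces a stale link left over from before the rotation.
    # Hard links only work when source and target share a filesystem.
    ln -f "$f" "$dest/$vhost-$name"
  done
}

# Possible call from a logrotate postrotate hook (paths are examples):
#   gather_logs /var/log/vhosts /space/log2
```

Because the targets are hard links, lines appended to the original file are visible through the linked name without any copying.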

New setup on the syslog-ng central node

The central node runs syslog-ng. All collected vhost log files arrive there as a single log stream per tag (apache-access, apache-error, php-error).

To split these streams back into per-vhost files, we use a specific syslog-ng setup that adds all the needed information to each syslog record, so that a dispatch script can dynamically recreate the original vhost files.

A typical syslog-ng setup looks like:

destination d_dispatch {
  program("/usr/local/bin/dispatch-log"
  template("/space/remote_logs/$HOST/$YEAR/$MONTH/$DAY/${PROGRAM}.d $MSG\n")); };
filter f_host_front { host("front1") or host("front2"); };
filter f_program_apache_access { program("apache-access"); };
log { source(src_lan); filter(f_host_front); filter(f_program_apache_access); destination(d_dispatch); };

Because the source node already put the vhost file name at the start of the syslog message, the dispatch script only needs to build a destination path from the first and second fields of each record. It does so by wrapping a simple awk script:

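# prefix and erase are awk variables set outside this script.
BEGIN { path = "^/([[:alnum:]_.-]+/?)+$" }
# Stop with status 2 if the destination folder or the source file name is not a plain absolute path.
($1 !~ path) || ($2 !~ path) { exit(2) }
# Create the destination folder whenever it changes from one line to the next.
NR == 1 || previous != $1 { system("mkdir -p " $1); previous = $1 }
{
  # The original log line starts at the third field.
  cmd = substr($0, index($0, $3));
  # Destination folder plus the source file name with its prefix stripped.
  file = $1 substr($2, length(prefix) + 1);
  # With erase set, the first write to a file truncates it; otherwise append.
  if (erase) print cmd > file; else print cmd >> file; fflush()
}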

Here prefix is passed in by the (Makefile) wrapper; it is the base path of the gathered remote log files.
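
To try the awk script outside syslog-ng, it can be fed a record by hand. The sketch below is not the repository's Makefile wrapper: the function name dispatch is made up for illustration, the erase branch is left out, and prefix is passed with awk -v. The prefix is given without a trailing slash, so the slash in front of the file name is kept.

```shell
#!/bin/sh
# dispatch PREFIX
# Sketch only: read "<destination folder> <source file> <log line>" records
# on stdin and append each log line to <destination folder>/<file name>.
dispatch() {
  awk -v prefix="$1" '
    BEGIN { path = "^/([[:alnum:]_.-]+/?)+$" }
    # Stop with status 2 on anything that is not a plain absolute path.
    ($1 !~ path) || ($2 !~ path) { exit(2) }
    # Create the destination folder when it changes between lines.
    NR == 1 || previous != $1 { system("mkdir -p " $1); previous = $1 }
    {
      cmd = substr($0, index($0, $3))           # log line from field 3 on
      file = $1 substr($2, length(prefix) + 1)  # folder + "/" + file name
      print cmd >> file; fflush()
    }'
}

# Example with made-up paths and a made-up log line:
#   printf "%s\n" "/tmp/remote_logs/front1/2020/01/02/apache-access.d /space/log2/www.example.org-access.log 192.0.2.1 - - GET /" \
#     | dispatch /space/log2
# appends "192.0.2.1 - - GET /" to
#   /tmp/remote_logs/front1/2020/01/02/apache-access.d/www.example.org-access.log
```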

Notes

The script ensures the requested folder is created whenever the path changes from one line to the next.

This is convenient, but it may require keeping one process per base folder to avoid unnecessary fork(2)/exec(3) calls for mkdir -p.

Here are the choices we face:

  • Keep a single dispatch process, using multiple filters for a single log entry in the syslog-ng conf, and don't worry about the possible waste of resources until it begins to hurt.

  • Use as many dispatch processes as needed to limit mkdir -p calls to day changes, with one log entry for each (node, channel) combination.

  • Write a new dispatch script that calls stat(2) before write(2), but that would probably require perl or python instead of awk.
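
The third option could also be prototyped in plain shell, since the test builtin of common shells checks a folder without forking. The sketch below is only an illustration under that assumption: the function name dispatch_stat is hypothetical, the path check is simpler than the awk regular expression, and the throughput of a shell read loop has not been measured against the awk version.

```shell
#!/bin/sh
# dispatch_stat PREFIX
# Sketch only: same record format as the awk script, but the folder is
# checked with [ -d ] and mkdir -p is only run when it is missing.
dispatch_stat() {
  prefix=$1
  while IFS= read -r line; do
    dir=${line%% *}          # first field: destination folder
    rest=${line#* }
    src=${rest%% *}          # second field: source file name
    msg=${rest#* }           # remainder: original log line
    # Accept only absolute paths made of letters, digits, "_", ".", "/", "-".
    case $dir in *[!A-Za-z0-9_./-]*) return 2 ;; /*) ;; *) return 2 ;; esac
    case $src in *[!A-Za-z0-9_./-]*) return 2 ;; "$prefix"/*) ;; *) return 2 ;; esac
    # Fork mkdir only when the folder does not exist yet.
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
    printf '%s\n' "$msg" >> "$dir/${src#"$prefix"/}"
  done
}
```

With this shape, records for different base folders can interleave on one stream without triggering a mkdir -p each time the folder changes.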