wallix/pylogsparser

Apache Normalizer

gvasold opened this issue · 2 comments

First of all: I like the idea of pylogsparser and started playing with it, but got stuck with the apache normalizer.
Taking a closer look, I found something strange.
In the apache.xml file you give this as an example:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" Mozilla/4.08 [en](Win98; I ;Nav)

But calling normalize() on this example log did not change the dictionary. I suppose that no suitable normalizer was found.
(BTW: I think it would be useful to get a warning or exception in this case).

Then I looked at the unit tests and found these log lines in test_log_samples.py:

Oct 22 01:27:16 pluto apache: 127.0.0.1 - - [20/Jul/2009:00:29:39 +0300] "GET /index/helper/test HTTP/1.1" 200 889
Oct 22 01:27:16 pluto apache: 10.10.4.4 - - [04/Dec/2009:16:23:13 +0100] "GET /tulipe.core.persistent.persistent-module.html HTTP/1.1" 200 2937 "http://10.10.4.86/toc.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090910 Ubuntu/9.04 (jaunty) Shiretoko/3.5.3"

When I used these log lines in my code, the normalizer worked. But this is not a default Apache log format; it is what syslog would write if Apache is configured to log through syslog. I think this is confusing because the description in apache.xml says "This parser supports log formats defined in apache's documentation". So maybe you should adjust the documentation and example in apache.xml, or add support for standalone Apache log formats to the normalizer.

Hi gvasold,

Thanks for using pylogsparser.

In the apache normalizer you can see the appliedTo attribute, which tells pylogsparser to evaluate the body key
of the dictionary. If you pass a dictionary with only the raw key set to a standard log line from Apache's access.log,
the normalizer won't match. You must add a body key, set to the same value as the raw key, to the dictionary
you pass to normalize().
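
A minimal sketch of the dictionary described above, for a standalone (non-syslog) access log line. The helper name `make_log` is hypothetical and not part of pylogsparser; the final call is left as a comment because it assumes an already-constructed normalizer object, here called `ln`.

```python
def make_log(raw_line):
    """Build the input dictionary for a log line that has no syslog header.

    The same text is stored under both keys, so a normalizer whose
    appliedTo attribute points at 'body' has something to match against.
    """
    return {"raw": raw_line, "body": raw_line}


line = ('127.0.0.1 - - [20/Jul/2009:00:29:39 +0300] '
        '"GET /index/helper/test HTTP/1.1" 200 889')
log = make_log(line)

# With pylogsparser available, the dictionary would then be normalized in place:
# ln.normalize(log)  # `ln` is an assumption: a normalizer object created elsewhere
```

If only the raw key were set, the apache normalizer would have no body value to evaluate, which matches the behaviour reported in the issue.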

I have updated the apache log lines in the test_log_samples.py test; those logs were just unusual. Look at the aS
method: I have updated it to set the body key.

The appliedTo attribute in apache.xml is not set to raw in order to handle the special case where access logs are redirected
to syslog. In that case the syslog normalizer handles the syslog header and puts the rest of the log line in the body key; then
the apache normalizer matches on body and updates the dictionary as you expect.
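
The two-stage flow can be illustrated with a rough sketch. This is not pylogsparser's implementation: the regular expression and the function name `split_syslog` are assumptions made only to show how a first stage might strip the syslog header and leave the Apache part under body.

```python
import re

# Illustrative pattern for a BSD-style syslog header: date, host, program, optional PID.
SYSLOG_HEADER = re.compile(
    r'^(?P<date>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) '
    r'(?P<source>\S+) '
    r'(?P<program>[^:\[\s]+)(?:\[\d+\])?: '
    r'(?P<body>.*)$'
)


def split_syslog(log):
    """First stage: parse the header from log['raw'] and store the rest in log['body']."""
    match = SYSLOG_HEADER.match(log["raw"])
    if match:
        log.update(match.groupdict())
    return log


log = split_syslog({
    "raw": 'Oct 22 01:27:16 pluto apache: 127.0.0.1 - - '
           '[20/Jul/2009:00:29:39 +0300] "GET /index/helper/test HTTP/1.1" 200 889'
})
# log["body"] now holds only the Apache access log part, which is what a
# second-stage normalizer applied to 'body' would be matched against.
```

Because the second stage only ever looks at body, the same Apache pattern can serve both syslog-wrapped lines and standalone lines whose body was copied from raw.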

Hi morucci,

thanks for pushing me in the right direction. Now it works like a charm!