PHP Web Server Log Parser Library

Primary language: PHP. License: Apache License 2.0 (Apache-2.0).

Web server access Log Parser

Install

Using composer:

php composer.phar require kassner/log-parser:~1.0

Usage

Instantiate the class:

$parser = new \Kassner\LogParser\LogParser();

Then parse the lines of your access log file:

$lines = file('/var/log/apache2/access.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
    $entry = $parser->parse($line);
}

The $entry object holds all the parsed data:

stdClass Object
(
    [host] => 193.191.216.76
    [logname] => -
    [user] => www-data
    [stamp] => 1390794676
    [time] => 27/Jan/2014:04:51:16 +0100
    [request] => GET /wp-content/uploads/2013/11/whatever.jpg HTTP/1.1
    [status] => 200
    [responseBytes] => 58678
)
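As a small illustration of working with such an entry, the sketch below splits the request line and converts the timestamp. The describeEntry() helper is hypothetical and not part of the library, and the object is built by hand to mirror the dump above, so the block runs without the parser installed.

```php
<?php
// Hypothetical helper, not part of the library.
function describeEntry(stdClass $entry): array
{
    // The request line is "METHOD PATH PROTOCOL"; pad with nulls in case a
    // malformed request has fewer than three parts.
    [$method, $path, $protocol] = array_pad(explode(' ', $entry->request, 3), 3, null);

    return [
        'method'   => $method,
        'path'     => $path,
        'protocol' => $protocol,
        // Render the Unix timestamp as an ISO 8601 string in UTC.
        'when'     => gmdate('c', (int) $entry->stamp),
        // Treat any 4xx or 5xx status as an error.
        'isError'  => (int) $entry->status >= 400,
    ];
}

// Hand-built stand-in for the object dumped above (an assumption for the demo).
$entry = new stdClass();
$entry->host = '193.191.216.76';
$entry->stamp = 1390794676;
$entry->request = 'GET /wp-content/uploads/2013/11/whatever.jpg HTTP/1.1';
$entry->status = '200';
$entry->responseBytes = '58678';

$summary = describeEntry($entry);
echo $summary['method'], ' ', $summary['path'], ' at ', $summary['when'], PHP_EOL;
```

The casts are deliberate: the sketch does not assume whether the parser returns numeric fields as strings or integers.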

You may customize the log format. By default it matches the Apache common log format.

# default Nginx format:
$parser->setFormat('%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"');
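To make the role of the format string more concrete, here is a minimal sketch of how a format could be translated into a regular expression with named groups. It is not the library's implementation: formatToRegex() and its pattern table are hypothetical, and only a handful of directives are covered.

```php
<?php
// Hypothetical helper: NOT the library's code, only an illustration of the idea.
function formatToRegex(string $format): string
{
    // Directive => named capture group. Only a few directives are covered.
    $patterns = [
        '%h'  => '(?P<host>\S+)',
        '%l'  => '(?P<logname>\S+)',
        '%u'  => '(?P<user>\S+)',
        '%t'  => '\[(?P<time>[^\]]+)\]',
        '%r'  => '(?P<request>[^"]*)',
        '%>s' => '(?P<status>\d{3})',
        '%b'  => '(?P<responseBytes>\d+|-)',
    ];

    // Escape regex metacharacters in the literal parts of the format,
    // then swap each (equally escaped) directive for its capture group.
    $regex = preg_quote($format, '#');
    foreach ($patterns as $directive => $group) {
        $regex = str_replace(preg_quote($directive, '#'), $group, $regex);
    }

    return '#^' . $regex . '$#';
}

// Apache common log format and a line that matches it.
$regex = formatToRegex('%h %l %u %t "%r" %>s %b');
$line = '193.191.216.76 - www-data [27/Jan/2014:04:51:16 +0100] '
      . '"GET /wp-content/uploads/2013/11/whatever.jpg HTTP/1.1" 200 58678';

$matched = preg_match($regex, $line, $fields);
echo $matched === 1 ? $fields['host'] . ' ' . $fields['status'] : 'no match', PHP_EOL;
```

A line that does not fit the pattern simply fails to match, which is why the format passed to setFormat() has to mirror the server's log configuration exactly.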

Supported format strings

Here is the full list of log format strings supported by Apache, and whether the library supports each of them:

| Supported? | Format String | Property name | Description |
|---|---|---|---|
| Y | %% | percent | The percent sign |
| Y | %>s | status | Status of the final request (after any internal redirect) |
| Y | %A | localIp | Local IP-address |
| Y | %a | remoteIp | Remote IP-address |
| N | %B | - | Size of response in bytes, excluding HTTP headers. |
| Y | %b | responseBytes | Size of response in bytes, excluding HTTP headers. In CLF format, i.e. a '-' rather than a 0 when no bytes are sent. |
| Y | %D | - | The time taken to serve the request, in microseconds. |
| N | %f | - | Filename |
| Y | %h | host | Remote host |
| N | %H | - | The request protocol |
| Y | %I | receivedBytes | Bytes received, including request and headers, cannot be zero. You need to enable mod_logio to use this. |
| N | %k | - | Number of keepalive requests handled on this connection. Interesting if KeepAlive is being used, so that, for example, a '1' means the first keepalive request after the initial one, '2' the second, etc.; otherwise this is always 0 (indicating the initial request). Available in versions 2.2.11 and later. |
| Y | %l | logname | Remote logname (from identd, if supplied). This will return a dash unless mod_ident is present and IdentityCheck is set On. |
| Y | %m | requestMethod | The request method |
| Y | %O | sentBytes | Bytes sent, including headers, cannot be zero. You need to enable mod_logio to use this. |
| Y | %p | port | The canonical port of the server serving the request |
| N | %P | - | The process ID of the child that serviced the request. |
| N | %q | - | The query string (prepended with a ? if a query string exists, otherwise an empty string) |
| Y | %r | request | First line of request |
| N | %R | - | The handler generating the response (if any). |
| N | %s | - | Status. For requests that got internally redirected, this is the status of the original request; use %>s for the last. |
| X | %T | requestTime | The time taken to serve the request, in seconds. This option is not consistent: Apache does not report the millisecond part. |
| Y | %t | time | Time the request was received (standard English format) |
| Y | %u | user | Remote user (from auth; may be bogus if return status (%s) is 401) |
| Y | %U | URL | The URL path requested, not including any query string. |
| Y | %v | serverName | The canonical ServerName of the server serving the request. |
| Y | %V | canonicalServerName | The server name according to the UseCanonicalName setting. |
| N | %X | - | Connection status when response is completed: X = connection aborted before the response completed. + = connection may be kept alive after the response is sent. - = connection will be closed after the response is sent. |
| N | %{Foobar}C | - | The contents of cookie Foobar in the request sent to the server. Only version 0 cookies are fully supported. |
| N | %{Foobar}e | - | The contents of the environment variable FOOBAR |
| Y | %{Foobar}i | *Header | The contents of Foobar: header line(s) in the request sent to the server. Changes made by other modules (e.g. mod_headers) affect this. If you're interested in what the request header was prior to when most modules would have modified it, use mod_setenvif to copy the header into an internal environment variable and log that value with the %{VARNAME}e described above. |
| N | %{Foobar}n | - | The contents of note Foobar from another module. |
| N | %{Foobar}o | - | The contents of Foobar: header line(s) in the reply. |
| N | %{format}p | - | The canonical port of the server serving the request or the server's actual port or the client's actual port. Valid formats are canonical, local, or remote. |
| N | %{format}P | - | The process ID or thread id of the child that serviced the request. Valid formats are pid, tid, and hextid. hextid requires APR 1.2.0 or higher. |
| N | %{format}t | - | The time, in the form given by format, which should be in strftime(3) format (potentially localized). This directive was %c in late versions of Apache 1.3, but this conflicted with the historical ssl %{var}c syntax. |

Beware: read the notes in the description before using an option marked with an X in the Supported? column.
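Before calling setFormat(), it can help to check a format string against the list above. The following is a rough sketch only: unsupportedDirectives() and the SUPPORTED_DIRECTIVES constant are hypothetical names, and the list was transcribed by hand from the rows marked Y or X, so it may drift from what a given version of the library accepts.

```php
<?php
// Directives marked Y or X in the list above (hand-transcribed assumption).
const SUPPORTED_DIRECTIVES = [
    '%%', '%>s', '%A', '%a', '%b', '%D', '%h', '%I', '%l', '%m',
    '%O', '%p', '%r', '%T', '%t', '%u', '%U', '%v', '%V',
];

// Hypothetical helper: returns the directives in $format that the list
// above does not mark as supported.
function unsupportedDirectives(string $format): array
{
    // Match "%%", or "%" + optional "{name}" + optional ">" + one letter.
    preg_match_all('/%%|%(?:\{[^}]+\})?>?[A-Za-z]/', $format, $matches);

    $unsupported = [];
    foreach ($matches[0] as $directive) {
        // Request headers, %{Name}i, are the only braced form marked as supported.
        $isHeader = preg_match('/^%\{[^}]+\}i$/', $directive) === 1;
        if (!$isHeader && !in_array($directive, SUPPORTED_DIRECTIVES, true)) {
            $unsupported[] = $directive;
        }
    }

    return array_values(array_unique($unsupported));
}

$problems = unsupportedDirectives('%h %P "%r" %>s %{Foobar}C %q');
echo $problems === [] ? 'format looks fine' : 'unsupported: ' . implode(' ', $problems), PHP_EOL;
```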

Exceptions

If a line does not match the defined format, a \Kassner\LogParser\FormatException is thrown.
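In practice a real access log often contains a few malformed lines, so it is usually worth catching the exception per line instead of aborting the whole run. A minimal sketch, assuming a hypothetical parseLines() helper that receives the parsing step as a callable (so the block itself runs without the library installed):

```php
<?php
// Hypothetical helper, not part of the library. $parse is any callable that
// takes one log line and returns the parsed entry, e.g. [$parser, 'parse'].
function parseLines(iterable $lines, callable $parse): array
{
    $entries = [];
    $rejected = [];

    foreach ($lines as $number => $line) {
        try {
            $entries[] = $parse($line);
        } catch (\Kassner\LogParser\FormatException $e) {
            // Keep the offending line, keyed by its position, for later inspection.
            $rejected[$number] = $line;
        }
    }

    return [$entries, $rejected];
}
```

With the parser from the Usage section this would be called as `[$entries, $rejected] = parseLines($lines, [$parser, 'parse']);`. Any other exception type is deliberately left uncaught so that genuine bugs still surface.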