datalust/seq-input-syslog

Structured data extraction with regular expressions

nblumhardt opened this issue · 1 comment

Syslog messages frequently contain interesting property data encoded into the message text.

For example, a message (ignoring syslog payload details) might begin with the user who performed some action:

[user: 15] deleted file 'example.csv'

There are a variety of ad-hoc encodings that attach structured data to events this way. Since Seq has richer structured data support, a simple way to extract these elements into property values would be useful.

One option is to accept a list of regular expressions in the configuration for an instance of Seq.Input.Syslog:

^\[user: (?<userId>\d+)\]
^\[service: (?<serviceId>\d+)\] \((?<threadId>\d+)\)
(?<endpoint>http[s]?://[\w\-\.\/]+)
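As a sketch of how such a setting might be read, assuming (hypothetically) that the expressions are supplied one per line, the Python below compiles each non-blank line. Python's `re` module spells named groups `(?P<name>...)`, so the sketch rewrites the `(?<name>...)` form used above while leaving the lookbehinds `(?<=` and `(?<!` alone. `parse_patterns` is an illustrative name, not part of Seq.Input.Syslog.

```python
import re

# Matches "(?<" when it opens a named group rather than a lookbehind.
_NAMED_GROUP_OPEN = re.compile(r"\(\?<(?![=!])")


def parse_patterns(setting: str) -> list[re.Pattern[str]]:
    """Hypothetical helper: compile each non-blank line of the setting as a regex."""
    patterns = []
    for line in setting.splitlines():
        if not line.strip():
            continue  # ignore blank lines, keep significant whitespace otherwise
        # Rewrite (?<name>...) to Python's (?P<name>...) before compiling.
        patterns.append(re.compile(_NAMED_GROUP_OPEN.sub("(?P<", line)))
    return patterns


CONFIG = "\n".join([
    r"^\[user: (?<userId>\d+)\]",
    r"^\[service: (?<serviceId>\d+)\] \((?<threadId>\d+)\)",
    r"(?<endpoint>http[s]?://[\w\-\.\/]+)",
])

for pattern in parse_patterns(CONFIG):
    print(sorted(pattern.groupindex))
# ['userId']
# ['serviceId', 'threadId']
# ['endpoint']
```

Compiling everything up front means a malformed expression fails when the input starts, rather than on the first message that reaches it.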

When a textual syslog message is accepted by the input, the regular expressions would be tried against the message text in order. The first expression that matches any part of the message wins: the values of its named capture groups would be extracted and added to the event as properties, and the remaining expressions would be skipped.

E.g. for the example message, the property userId = 15 would be extracted and attached to the resulting event.
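The first-match-wins rule can be sketched in Python as follows. `extract_properties` is a hypothetical helper, and the patterns are the three expressions above rewritten with Python's `(?P<name>...)` syntax. Captured values are kept as strings here; whether a value such as `15` should become a number is left open.

```python
import re

# The expressions from the issue, rewritten with Python's (?P<name>...) syntax.
PATTERNS = [
    re.compile(r"^\[user: (?P<userId>\d+)\]"),
    re.compile(r"^\[service: (?P<serviceId>\d+)\] \((?P<threadId>\d+)\)"),
    re.compile(r"(?P<endpoint>http[s]?://[\w\-\.\/]+)"),
]


def extract_properties(message: str, patterns: list[re.Pattern[str]]) -> dict[str, str]:
    """Return the named captures of the first pattern that matches anywhere in message."""
    for pattern in patterns:
        match = pattern.search(message)  # search() finds a match anywhere unless anchored
        if match is not None:
            # Omit optional groups that did not take part in the match.
            return {name: value for name, value in match.groupdict().items() if value is not None}
    return {}  # no expression matched; the event gets no extra properties


print(extract_properties("[user: 15] deleted file 'example.csv'", PATTERNS))
# {'userId': '15'}
print(extract_properties("[user: 15] fetched https://example.com/a", PATTERNS))
# {'userId': '15'}
```

The second call shows the skipping rule: the message also contains a URL, but because the `userId` expression matched first, the `endpoint` expression is never tried.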

If the regex is not anchored with ^ and/or $, then the first match anywhere in the message would be considered a hit; the final example in the list of expressions would extract an endpoint property for the first URL appearing in any message.
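A quick Python check of the anchoring behaviour, using the first and last expressions (again in Python's `(?P<name>...)` syntax). `search` scans the whole string, so only `^` restricts where a match may begin, and an unanchored expression captures the leftmost occurrence.

```python
import re

USER = re.compile(r"^\[user: (?P<userId>\d+)\]")
ENDPOINT = re.compile(r"(?P<endpoint>http[s]?://[\w\-\.\/]+)")

# Anchored: the prefix must sit at the very start of the message.
print(USER.search("[user: 15] deleted file") is not None)         # True
print(USER.search("audit: [user: 15] deleted file") is not None)  # False

# Unanchored: the first URL anywhere in the message is captured.
match = ENDPOINT.search("GET https://a.example/x then https://b.example/y failed")
print(match.group("endpoint"))  # https://a.example/x
```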

The message could additionally be converted into a message template by escaping any literal { and } (as {{ and }}) and replacing each captured span with its property name in braces:

[user: {userId}] deleted file 'example.csv'

This would produce more pleasing event rendering in Seq, but nested capture groups would need to be detected so that only the outermost one is converted into a template hole.
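A minimal Python sketch of the template conversion, assuming the match object from the extraction step is still available; `to_message_template` is a hypothetical name. It treats a named group as nested when its span lies inside a span already turned into a hole, which is a simplification (lookarounds can produce overlaps this does not model). Captures that are empty or did not participate are not turned into holes.

```python
import re


def to_message_template(match: re.Match[str]) -> str:
    """Escape literal braces and replace each outermost named capture with {name}."""
    message = match.string
    spans = []
    for name, index in match.re.groupindex.items():
        start, end = match.span(index)
        # Skip groups that did not take part in the match (-1, -1) or captured nothing.
        if start != end:
            # Sort key: earliest start, then longest span, then lowest group
            # number, so an outer group precedes any group nested inside it.
            spans.append((start, -end, index, name))
    spans.sort()

    def escape(text: str) -> str:
        # Literal braces are doubled so they are not read as holes.
        return text.replace("{", "{{").replace("}", "}}")

    parts = []
    position = 0
    for start, negative_end, _index, name in spans:
        if start < position:
            continue  # nested inside a hole that has already been emitted
        parts.append(escape(message[position:start]))
        parts.append("{" + name + "}")
        position = -negative_end
    parts.append(escape(message[position:]))
    return "".join(parts)


message = "[user: 15] deleted file 'example.csv'"
print(to_message_template(re.search(r"^\[user: (?P<userId>\d+)\]", message)))
# [user: {userId}] deleted file 'example.csv'

url = re.search(r"(?P<endpoint>https?://(?P<host>[\w.\-]+)[\w\-./]*)", "GET https://example.com/a {x}")
print(to_message_template(url))
# GET {endpoint} {{x}}
```

In the second example `host` is nested inside `endpoint`, so only `{endpoint}` becomes a hole; the `host` value could still be attached as a property even though it has no place in the template.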