criblpacks/cribl-palo-alto-networks

date/time/hostname parsing issue

FusionFC opened this issue · 2 comments

The eval in the various pipelines uses

_raw.match(/[A-Z][a-z]{2}\s\d+\s\d{2}:\d{2}:\d{2}\s([^\s]+)\s/)[1]

for finding the hostname. This fails for days 1-9 of a month: single-digit days are space-padded, so there are two spaces after the month name, and \s\d+ allows only one.

Nov  2 08:15:57 HOSTNAME

vs. 10 days later

Nov 12 08:15:57 HOSTNAME
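
To make the failure concrete, here is a minimal sketch in plain JavaScript, run outside Cribl. The regex is the one quoted above. The text after HOSTNAME in the sample lines is a made-up placeholder, added only because the pattern requires whitespace after the hostname.

```javascript
// Regex copied from the eval expression quoted above.
const trafficRe = /[A-Z][a-z]{2}\s\d+\s\d{2}:\d{2}:\d{2}\s([^\s]+)\s/;

// Sample lines; everything after HOSTNAME is a hypothetical placeholder.
const paddedDay = "Nov  2 08:15:57 HOSTNAME 1,2023/11/02 08:15:57";
const twoDigitDay = "Nov 12 08:15:57 HOSTNAME 1,2023/11/12 08:15:57";

// Two spaces after the month: \s consumes one, then \d+ meets a space.
const paddedMatch = paddedDay.match(trafficRe);     // null
// One space after the month: the pattern matches and captures the hostname.
const twoDigitMatch = twoDigitDay.match(trafficRe); // twoDigitMatch[1] is "HOSTNAME"

// Indexing a null match result throws a TypeError in plain JavaScript.
let threw = false;
try {
  paddedDay.match(trafficRe)[1];
} catch (e) {
  threw = e instanceof TypeError;
}

console.log({ paddedMatch, host: twoDigitMatch && twoDigitMatch[1], threw });
```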

OK, I was wrong: that expression is what the TRAFFIC pipeline uses, but the THREAT pipeline's version is quite different:

_raw.match(/[A-Z][a-z]{2}\s{1,2}\d{1,2}\s\d{2}:\d{2}:\d{2}\s([^\s]+)\s/)[1] || host

That should in fact handle it better. I guess I need to check all the pipelines first.
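
A quick way to check that claim is to run the THREAT regex against both date forms. This is a sketch in plain JavaScript, not a Cribl pipeline test. The trailing "rest" token in the sample lines is a placeholder I added so the final \s has something to match.

```javascript
// Regex copied from the THREAT pipeline expression quoted above.
const threatRe = /[A-Z][a-z]{2}\s{1,2}\d{1,2}\s\d{2}:\d{2}:\d{2}\s([^\s]+)\s/;

// Hypothetical sample lines covering both day formats.
const samples = {
  paddedDay: "Nov  2 08:15:57 HOSTNAME rest",
  twoDigitDay: "Nov 12 08:15:57 HOSTNAME rest",
};

// Collect the captured hostname (or null) for each sample.
const hosts = {};
for (const [name, line] of Object.entries(samples)) {
  const m = line.match(threatRe);
  hosts[name] = m ? m[1] : null;
}

console.log(hosts);
```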

SYSTEM

_raw.match(/[A-Z][a-z]{2}\s\d+\s\d{2}:\d{2}:\d{2}\s([^\s]+)\s/)[1] || host

OK, so three pipelines, three different expressions in the eval. That was somewhat (bad) luck of the draw: I checked the remaining pipelines (CONFIG, DECRYPTION, GLOBALPROTECT, HIPMATCH, USERID) and they all match TRAFFIC.

I think the || host is probably a good addition, and the regex in TRAFFIC should handle this case correctly, even though that means also accepting the non-standard form of two spaces followed by a two-digit day. I don't think it is the job of the pipeline to enforce the RFC and throw out "bad" data.
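
As a rough sketch of what that could look like, the hypothetical helper below (extractHost is my name, not something in the pack) combines the permissive regex with the host fallback. One caveat from plain JavaScript semantics: in `_raw.match(...)[1] || host`, a failed match returns null and indexing it throws before `|| host` is evaluated, so the sketch checks the match result first. I have not verified how a Cribl eval surfaces that error.

```javascript
// Hypothetical helper, not part of the pack: permissive hostname
// extraction with a fallback value when the pattern does not match.
function extractHost(raw, fallbackHost) {
  // Same pattern as the THREAT pipeline: 1-2 spaces, then a 1-2 digit day.
  const re = /[A-Z][a-z]{2}\s{1,2}\d{1,2}\s\d{2}:\d{2}:\d{2}\s([^\s]+)\s/;
  const m = raw.match(re);
  // Guard against a null match before indexing, then fall back.
  return (m && m[1]) || fallbackHost;
}

// Standard padded day, non-standard "two spaces + two digits", and no match.
console.log(extractHost("Nov  2 08:15:57 HOSTNAME rest", "fallback"));
console.log(extractHost("Nov  12 08:15:57 HOSTNAME rest", "fallback"));
console.log(extractHost("no timestamp here", "fallback"));
```

The second call shows the permissive side effect mentioned above: the non-standard form is accepted rather than rejected.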