PaloAltoNetworks/Splunk_TA_paloalto

Question: Why isn't "generated_time" used as _time?

Closed this issue · 6 comments

I've deployed Splunk and the Palo app a number of times, and I've always used the timestamp from when the event was generated at the dataplane (the sixth field in the default Palo log format). Occasionally that timestamp differs from the managementplane timestamp. Is there a reason for not using the dataplane timestamp?

Thx,
JD

Hi JD,

Would using "generated_time" make a significant difference to you in your workflow or how you interpret events? If so, can you help us understand the difference?

Thanks,
-Brian

Brian -

Thx for the quick response. Depending on how sensitive you are to the accuracy of when the actual event occurred, I could see it being significant. Stitching events together in their proper order could be affected.

I could be wrong, but I've always made the assumption that the difference between the two timestamps is due to processing and network lag/issues. For example, if there was a five-minute interrupt between the fw and Pano, you'd have a five-minute difference between the dataplane and managementplane timestamps. Is that correct? Kind of like the difference between _time and _indextime.

There should never be a 5 minute difference. Less than a second difference in 99.9...% of cases, since the dataplane and management plane are always on the same device. If you were to see a significant discrepancy between these times then you have a much bigger problem than syslog timestamp accuracy. If you want you can modify props and transforms to use generated_time, but we won't change this in the Add-on. It would cause confusion and support tickets when timestamps don't match up, and the difference between the times is so minimal. We'll consider a document to instruct people how to switch to generated_time if they want, depending on demand for such a doc.
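For anyone who does want to make that change locally, a minimal sketch of such a props.conf override might look like the following. This is an assumption-laden example, not something shipped with the Add-on: it assumes the default PAN-OS CSV layout, where generated_time is the field following the sixth comma of the raw event, and the stanza would need to be repeated for each affected sourcetype (pan:traffic, pan:threat, and so on).

```
# Hypothetical local props.conf override (not part of the Add-on):
# point Splunk's timestamp extraction at generated_time instead of
# receive_time by skipping the first six comma-separated fields.
[pan:traffic]
TIME_PREFIX = ^(?:[^,]*,){6}
TIME_FORMAT = %Y/%m/%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 32
```

TIME_PREFIX, TIME_FORMAT, and MAX_TIMESTAMP_LOOKAHEAD are standard Splunk props.conf settings; the exact regex and lookahead value here are illustrative.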

Thanks for opening this issue. I'll close it out at this point but let me know if you have further questions.

Since you will consider a document based on demand...we are also having this issue. The timestamp of the actual traffic is always going to be more relevant (especially for security monitoring and correlation) than the Panorama timestamp. We will be editing the props and transforms in order to use generated_time, and also create alerting to notify us when there is more than a 5 minute time difference between the dataplane and managementplane times. Any assistance would be appreciated.
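A sketch of the alerting search described above, written in SPL. The generated_time and receive_time field names come from the Add-on's extractions as discussed in this thread; the index name, the strptime format, and the 300-second (5-minute) threshold are assumptions to adapt to your environment.

```
index=pan_logs sourcetype=pan:traffic
| eval gen_epoch = strptime(generated_time, "%Y/%m/%d %H:%M:%S"),
       rcv_epoch = strptime(receive_time, "%Y/%m/%d %H:%M:%S"),
       lag = abs(rcv_epoch - gen_epoch)
| where lag > 300
| table host, generated_time, receive_time, lag
```

Saved as an alert, this would fire whenever the dataplane and managementplane timestamps drift apart by more than five minutes.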

In commit e03f4d8 from last year, which switches props.conf to use generated_time instead of receive_time, the change also adds a portion of the src_ip as the milliseconds of _time.

This occurs for pan:traffic, pan:threat, and probably others...
_time - _raw
1/23/19 3:04:58.195 PM - Jan 23 15:05:21 Panorama2 1,2019/01/23 15:05:21,001234567890,TRAFFIC,drop,2049,2019/01/23 15:04:58,195.161.41.50,
1/23/19 3:04:58.176 PM - Jan 23 15:05:21 Panorama2 1,2019/01/23 15:05:21,001234567890,TRAFFIC,drop,2049,2019/01/23 15:04:58,176.119.4.18,
1/23/19 3:04:58.164 PM - Jan 23 15:05:21 Panorama2 1,2019/01/23 15:05:21,001234567890,TRAFFIC,drop,2049,2019/01/23 15:04:58,164.52.24.162,

In the above examples... 195, 176, and 164 are all the first octet of the source IP address and added as milliseconds to _time.

One note: my firewalls forward their logs to Palo Alto Panorama, which sends them to Splunk, but I don't believe that is the issue. It's possibly caused by the new TIME_PREFIX = ^(?:[^,]*,){6} added to each sourcetype in props.conf.
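The behavior above can be illustrated with a toy Python sketch. This is not Splunk's actual parser, only a mimic of an extractor that, after TIME_PREFIX skips six comma-separated fields, accepts either "." or "," as a subsecond separator; the comma before the source IP then donates the IP's first octet as milliseconds.

```python
import re

# One of the raw events quoted above (serial number anonymized as in the post).
RAW = ("Jan 23 15:05:21 Panorama2 1,2019/01/23 15:05:21,001234567890,"
       "TRAFFIC,drop,2049,2019/01/23 15:04:58,195.161.41.50,")

# Same regex as the Add-on's TIME_PREFIX: skip six comma-separated fields.
TIME_PREFIX = re.compile(r"^(?:[^,]*,){6}")

# A timestamp matcher that, like a lenient parser, treats either "." or ","
# as introducing subseconds.
TIMESTAMP = re.compile(
    r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})(?:[.,](\d{1,3}))?")

rest = TIME_PREFIX.sub("", RAW, count=1)   # "2019/01/23 15:04:58,195.161.41.50,"
m = TIMESTAMP.match(rest)
print(m.group(1))   # 2019/01/23 15:04:58  <- generated_time
print(m.group(2))   # 195  <- first octet of 195.161.41.50, read as milliseconds
```

Because the event's own timestamp carries no subseconds, the comma immediately after it is swallowed and the next run of digits, the IP's first octet, lands in the milliseconds slot of _time.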

I'm very happy to see the switch to generated_time, as Panorama causes the receive_time for us to deviate by 20+ seconds, which is bothersome when stitching and doing correlation with other logs.

I hope that is helpful and thank you for your time and assistance,
Scott

Hi Scott, you are completely correct. Unfortunately there isn't much we can do to resolve it, because future versions of PANOS may include milliseconds in the generated time. There's no way to tell Splunk to reject a comma as part of the timestamp while still accepting a period. Let me know if you find a way. We figured that, since the logs don't include milliseconds, having a spurious millisecond value applied wouldn't be too big a deal, because customers can safely ignore the milliseconds.

We are open to other possible solutions. This seemed like the least of several evils and won’t be an issue if/when the logs start to include milliseconds.
