facebookarchive/hblog

Handle bursty logs

Opened this issue · 2 comments

I have a monitoring system that runs hblog every minute. If the hblog command takes more than 40 seconds, the monitoring tool times out and reports no data. Unfortunately, bursts are when the most interesting events happen, and my monitoring system missed them.

On a single host, one logfile normally gets 1,000 lines per minute; hblog handles that, as well as bursts of 10,000 lines. However, a burst of 750,000 log lines in a 30-second interval caused hblog to run past the 40-second timeout. Unfortunately, the most interesting and important log information for the day was in those 30 seconds.

I'd like to enhance the hblog server to accept a timeout parameter from the client and still report whatever log summary it was able to extract. If the timeout is hit, a special status field tells the client that a timeout occurred, but the client (such as my monitoring system) can still act on the partial data.
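
To illustrate the server-side behavior I have in mind, here is a minimal Python sketch. It is not hblog's actual code: `summarize_with_deadline`, the `Summary` fields, and the "first word is the log level" parsing are all hypothetical names and assumptions made up for this example. The idea is only that the server checks a deadline while scanning and returns what it has, with a flag, instead of failing.

```python
import time
from collections import Counter
from typing import Callable, Iterable, NamedTuple


class Summary(NamedTuple):
    counts: Counter        # hypothetical summary: number of lines per log level
    lines_processed: int   # how many lines were summarized before stopping
    timed_out: bool        # the proposed "special status field"


def summarize_with_deadline(
    lines: Iterable[str],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Summary:
    """Summarize log lines, stopping early once timeout_s has elapsed."""
    deadline = clock() + timeout_s
    counts: Counter = Counter()
    processed = 0
    timed_out = False
    for line in lines:
        # Check the deadline before each line so a burst cannot overrun it.
        if clock() >= deadline:
            timed_out = True
            break
        # Assumption for this sketch: the first word of a line is its level.
        level = line.split(" ", 1)[0]
        counts[level] += 1
        processed += 1
    return Summary(counts, processed, timed_out)
```

With this shape, a caller that passes a timeout slightly below its own limit (for example, under the 40 seconds my monitoring tool allows) would always get a summary back, and `timed_out` tells it whether the summary is partial.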

When this type of timeout occurs, the CLI client should print a warning message saying that not all log lines were processed. Also, the client should not blacklist the host for --follow, because the cause of the timeout is a burst in the logfile and not a host or network issue.
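
The client-side handling could look roughly like the following Python sketch. Again, this is an assumption-laden illustration rather than hblog's real client code: the `Status` values, `handle_host_result`, and the blacklist being a plain set are all hypothetical. It only shows the distinction I am asking for between a partial-result timeout and a real host or network failure.

```python
import sys
from enum import Enum
from typing import Set, TextIO


class Status(Enum):
    # Hypothetical status values; the real field name and values are undecided.
    OK = "ok"
    PARTIAL_TIMEOUT = "partial_timeout"   # server hit the timeout, data is partial
    UNREACHABLE = "unreachable"           # host or network failure


def handle_host_result(
    host: str,
    status: Status,
    blacklist: Set[str],
    err: TextIO = sys.stderr,
) -> bool:
    """Return True if the host's (possibly partial) data should be used."""
    if status is Status.UNREACHABLE:
        # Only genuine host/network problems get the host blacklisted.
        blacklist.add(host)
        return False
    if status is Status.PARTIAL_TIMEOUT:
        # Warn, but keep the partial data and keep following the host.
        print(
            f"warning: {host}: timed out; not all log lines were processed",
            file=err,
        )
    return True
```

The key design point is that a partial-result timeout leaves the blacklist untouched, so the host is still polled on the next --follow iteration.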

Thank you for reporting this issue; we appreciate your patience. We've notified the core team for an update on this issue. We're looking for a response within the next 30 days or the issue may be closed.

This still needs to be fixed.