tls_logger backoff
getvictor opened this issue · 1 comments
Feature request
What new feature do you want?
When the TLS logging endpoint is down or having issues, I would like osquery to automatically back off from sending more logs.
How is this new feature useful?
If the endpoint server goes down for some time, it might not be able to handle the surge of backlogged logs once it comes back up, and could go down again. Backing off gives the server time to recover.
How can this be implemented?
- Add a `--logger_tls_backoff=true` switch.
- With the above switch, assuming `--logger_tls_period=3` and unsuccessful requests, the next request will happen in 3^1=3 seconds, the next request in 3^2=9 seconds, the next request in 3^3=27 seconds, and so forth until a fixed maximum.
- The fixed maximum will be 3 hours, but this is up for discussion. The maximum can also be a switch.
- If the user wants to force log sending to restart, they can set `--logger_tls_backoff=false`.
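
The schedule proposed above can be sketched in a few lines. This is only an illustration of the arithmetic, not osquery code: the function name `backoff_delay` and its parameters are hypothetical, and the 3 hour cap is the tentative value from this proposal.

```python
# Illustrative sketch only; not osquery's implementation.
# `backoff_delay` and its parameter names are hypothetical.

THREE_HOURS_S = 3 * 60 * 60  # proposed fixed maximum, still up for discussion


def backoff_delay(period_s: int, failures: int, max_delay_s: int = THREE_HOURS_S) -> int:
    """Seconds to wait before the next request.

    period_s    -- value of --logger_tls_period
    failures    -- number of consecutive unsuccessful requests so far
    max_delay_s -- upper bound on the delay
    """
    if failures <= 0:
        # No failures: keep the normal logging period.
        return period_s
    # period^failures, capped at the maximum.
    return min(period_s ** failures, max_delay_s)


if __name__ == "__main__":
    for n in range(0, 11):
        print(n, backoff_delay(3, n))
```

With a period of 3 seconds the delay reaches the 3 hour cap on the ninth consecutive failure, since 3^8 = 6561 seconds is still under 10800 and 3^9 = 19683 is over it. One thing to keep in mind with this formula: a period of 1 never grows, because 1^n = 1.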
This was discussed in office hours and there was general agreement about proceeding.
Some things that were brought up:
- Maybe instead of a boolean flag this can be `logger_tls_backoff_max`, where `0` would be "off" and a positive value would turn it on with a configurable maximum? This came from a concern @Smjert had that some users might want a maximum lower than 3 hours.
- Should there be coverage of other TLS endpoints (e.g. distributed read/write, config)?
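
For the first point, a single numeric flag could behave roughly as follows. Again this is a sketch under assumptions, not osquery code: `next_delay` is a hypothetical name, `backoff_max_s` stands in for the suggested `logger_tls_backoff_max` value in seconds, and never going below the normal period is my own assumption rather than something that was discussed.

```python
# Illustrative sketch only; `next_delay` is a hypothetical helper.

def next_delay(period_s: int, failures: int, backoff_max_s: int) -> int:
    """Seconds to wait before the next request.

    backoff_max_s == 0 -> backoff is off, always use the normal period
    backoff_max_s  > 0 -> exponential backoff, capped at backoff_max_s
    """
    if backoff_max_s <= 0 or failures <= 0:
        # Backoff disabled, or the last request succeeded.
        return period_s
    capped = min(period_s ** failures, backoff_max_s)
    # Assumption: never send faster than the normal period,
    # even if the configured maximum is smaller than the period.
    return max(period_s, capped)
```

For example, with a period of 3 and a maximum of 60, two consecutive failures give a 9 second wait and five give the 60 second cap, while a maximum of 0 always gives 3 seconds.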