Never captures really short-lived jobs
tr3mor opened this issue · 3 comments
tr3mor commented
Hello,
In one of the latest releases the following part was changed:
scalyr-agent-2/scalyr_agent/builtin_monitors/kubernetes_monitor.py
Lines 2277 to 2295 in 1245df0
Following this update, any job that completes between two Scalyr checks is not included in log collection: the API returns a 404 error for its pods, so they are excluded. We see this frequently with our Argo workflows.
Instead of always discarding such pods, I would suggest falling back to the global config (env SCALYR_K8S_INCLUDE_ALL_CONTAINERS) to determine whether the pod's logs should be collected.
I think that if you include all pods by default, you expect the same behavior for these pods, and vice versa.
If not, I would like to understand what the better way to handle such cases is (we already have container_check_interval set to 2 seconds).
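The suggested fallback could look roughly like the sketch below. It is not the agent's actual code: PodNotFound, lookup_override, should_collect and include_all_default are hypothetical names used only for illustration, and treating an unset SCALYR_K8S_INCLUDE_ALL_CONTAINERS as true is an assumption.

```python
import os


class PodNotFound(Exception):
    """Hypothetical stand-in for the API answering 404 for a pod that is already gone."""


def include_all_default(environ=os.environ):
    # Assumption: the variable holds a boolean-like string and counts as true when unset.
    value = environ.get("SCALYR_K8S_INCLUDE_ALL_CONTAINERS", "true")
    return value.strip().lower() == "true"


def should_collect(pod_name, lookup_override, include_all):
    """Decide whether logs for pod_name should be collected.

    lookup_override is a hypothetical callable that returns True or False when
    the pod carries an explicit include/exclude setting, None when it has no
    such setting, and raises PodNotFound when the pod no longer exists.
    """
    try:
        override = lookup_override(pod_name)
    except PodNotFound:
        # The pod finished between two checks: fall back to the global
        # default instead of always discarding it.
        return include_all
    if override is None:
        return include_all
    return override
```

With this shape, a pod that disappears before it can be queried is collected exactly when the global setting says all containers are included, which matches the "vice versa" expectation above.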
weilliu commented
@tr3mor Apologies for the delayed response. The engineering team has confirmed the issue, and we'll implement a fix in a future agent version.
weilliu commented
The fix will be included in the next agent release.