How to snooze the `[tcp] probe to user container failed: dial tcp 127.0.0.1:8080: connect: connection refused` log?
nellaG opened this issue
nellaG commented
Hello, I'm currently using Cortex 0.39.1.
I have a BatchAPI with a custom configuration.
When I check its logs with AWS CloudWatch Logs Insights, there are so many `[tcp] probe to user container failed: dial tcp 127.0.0.1:8080: connect: connection refused` messages that I can't track my API's job status well.
Is there a good practice for snoozing that log using the `readiness_probe` or `liveness_probe` config?
My API has only two endpoints (`/` and `/healthz`), and here's my API configuration YAML.
Batch API configuration:
name: ***
kind: BatchAPI
pod:
  port: 8080
  containers:
    - name: ***
      image: ***
      env:
        [***]
      command: [./run_app.sh]
      readiness_probe:
        http_get:
          path: /healthz
          port: 8080
        initial_delay_seconds: 180
        timeout_seconds: 1
        period_seconds: 10
        success_threshold: 1
        failure_threshold: 3
      liveness_probe:
        http_get:
          path: /
          port: 8080
        initial_delay_seconds: 0
        timeout_seconds: 1
        period_seconds: 10
        success_threshold: 1
        failure_threshold: 3
      compute:
        cpu: 200m
        gpu: 1
        mem: 2G
        shm: 1Gi
networking:
  endpoint: /tracker-dev
node_groups: [gpu-spot, gpu-on-demand]
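The two probes in this config differ mainly in timing. Assuming Cortex passes these fields through to standard Kubernetes probes (an assumption, not something stated in this issue), the minimal Python sketch below estimates the earliest time each probe could reach a verdict. It assumes every attempt returns immediately (as a "connection refused" does) and ignores kubelet scheduling jitter. The `Probe` class and `seconds_until_first_verdict` function are hypothetical helpers, not part of Cortex. The sketch models timing only; it does not show where the `[tcp] probe ...` message is emitted.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical model of the timing keys used by readiness_probe / liveness_probe."""
    initial_delay_seconds: int
    period_seconds: int
    success_threshold: int
    failure_threshold: int


def seconds_until_first_verdict(probe: Probe, succeed: bool) -> int:
    """Earliest time (seconds after container start) at which the probe can
    reach a verdict, assuming each attempt returns instantly and ignoring jitter.

    A verdict needs `threshold` consecutive results: the first attempt runs at
    initial_delay_seconds, and each later attempt follows period_seconds later.
    """
    threshold = probe.success_threshold if succeed else probe.failure_threshold
    return probe.initial_delay_seconds + (threshold - 1) * probe.period_seconds


# Values copied from the configuration above.
readiness = Probe(initial_delay_seconds=180, period_seconds=10,
                  success_threshold=1, failure_threshold=3)
liveness = Probe(initial_delay_seconds=0, period_seconds=10,
                 success_threshold=1, failure_threshold=3)

print("earliest ready:", seconds_until_first_verdict(readiness, succeed=True))
print("earliest liveness failure:", seconds_until_first_verdict(liveness, succeed=False))
```

Under this model, the liveness probe could report a failure as early as about 20 s after start, while readiness is first checked only at 180 s.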
I'm always thankful for your support and for Cortex. 😃
nellaG commented
I may just use a query in AWS CloudWatch Logs Insights that excludes that message. Thanks!
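In CloudWatch Logs Insights, that kind of exclusion can be written as a `filter ... not like` clause, for example `fields @timestamp, @message | filter @message not like /probe to user container failed/ | sort @timestamp desc`. For logs that have already been exported, the same idea can be applied locally. Here is a minimal Python sketch: `without_probe_noise` is a hypothetical helper, and the two `job 1234` lines are made-up application output used only for illustration.

```python
import re
from typing import Iterable, Iterator

# Matches the noisy probe message quoted in this issue.
PROBE_NOISE = re.compile(r"\[tcp\] probe to user container failed")


def without_probe_noise(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that are not TCP probe failures."""
    for line in lines:
        if not PROBE_NOISE.search(line):
            yield line


sample = [
    "[tcp] probe to user container failed: dial tcp 127.0.0.1:8080: connect: connection refused",
    "job 1234 started",    # hypothetical application log line
    "[tcp] probe to user container failed: dial tcp 127.0.0.1:8080: connect: connection refused",
    "job 1234 succeeded",  # hypothetical application log line
]

for line in without_probe_noise(sample):
    print(line)
```

Matching on the fixed prefix of the message, rather than the full text, keeps the filter working if the port or address in the error changes.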