cortexlabs/cortex

Suppress `[tcp] probe to user container failed: dial tcp 127.0.0.1:8080: connect: connection refused` log

nellaG opened this issue · 1 comment

Hello, I'm currently using Cortex 0.39.1.

I have a BatchAPI with the configuration shown below.
When I check the logs with AWS CloudWatch Logs Insights, there are so many `[tcp] probe to user container failed: dial tcp 127.0.0.1:8080: connect: connection refused` entries that I can't easily check the status of my API jobs.

Is there a recommended way to suppress that log using the readiness_probe or liveness_probe config?

My API has only two endpoints (/ and /healthz), and here is my API configuration YAML.

BatchAPI configuration:
name: ***
kind: BatchAPI
pod:
  port: 8080
  containers:
    - name: ***
      image: ***
      env:
       [***]
      command: [./run_app.sh]
      readiness_probe:
        http_get:
          path: /healthz
          port: 8080
        initial_delay_seconds: 180
        timeout_seconds: 1
        period_seconds: 10
        success_threshold: 1
        failure_threshold: 3
      liveness_probe:
        http_get:
          path: /
          port: 8080
        initial_delay_seconds: 0
        timeout_seconds: 1
        period_seconds: 10
        success_threshold: 1
        failure_threshold: 3
      compute:
        cpu: 200m
        gpu: 1
        mem: 2G
        shm: 1Gi
networking:
  endpoint: /tracker-dev
node_groups: [gpu-spot, gpu-on-demand]

I'm always thankful for your support and for Cortex. 😃

In the meantime, I may use a query that excludes that message in AWS CloudWatch Logs Insights. Thanks.
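
For reference, here is a minimal sketch of that exclusion query, built in Python. It is a workaround on the log-viewing side only and does not change what Cortex emits. The helper names (`build_insights_query`, `run_query`) and the log group name are hypothetical, and `run_query` assumes boto3 is installed and AWS credentials are configured.

```python
import time

# Substring of the noisy probe message to hide from query results.
PROBE_MESSAGE = "probe to user container failed"


def build_insights_query(excluded: str = PROBE_MESSAGE, limit: int = 200) -> str:
    """Return a Logs Insights query that hides messages containing `excluded`."""
    if '"' in excluded:
        # Keep the sketch simple: no escaping of quotes inside the pattern.
        raise ValueError("excluded text must not contain double quotes")
    return (
        "fields @timestamp, @message\n"
        f'| filter @message not like "{excluded}"\n'
        "| sort @timestamp desc\n"
        f"| limit {limit}"
    )


def run_query(log_group: str, minutes: int = 60) -> list:
    """Run the query over the last `minutes` minutes (assumes boto3 + credentials)."""
    import boto3  # imported lazily so build_insights_query works without boto3

    logs = boto3.client("logs")
    end = int(time.time())
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=end - minutes * 60,
        endTime=end,
        queryString=build_insights_query(),
    )["queryId"]
    # Poll until the query leaves the Scheduled/Running states.
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response["results"]
        time.sleep(1)


if __name__ == "__main__":
    # Print the query so it can be pasted into the Logs Insights console.
    print(build_insights_query())
```

The printed query can be pasted directly into the Logs Insights console; calling `run_query("<your-log-group>")` with the cluster's actual log group would run it programmatically instead.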