keylimetoolbox/resque-kubernetes

Job isn't marked as 'Completed'

Closed this issue · 4 comments

After a Resque job completes, the Kubernetes job isn't marked as completed.
My manifest looks like this:

  def self.job_manifest
    YAML.safe_load(
        <<~MANIFEST
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: resque-job
      spec:
        template:
          metadata:
            name: resque-job
          spec:
            containers:
            - name: resque-job
              image: image
              command: ["bundle", "exec", "rake", "environment", "resque:work"]
              volumeMounts:
                - name: google-cloud-credentials
                  mountPath: /configs
                  readOnly: true
              env:
                - name: RAILS_ENV
                  value: #{ENV['RAILS_ENV']}
                - name: RACK_ENV
                  value: #{ENV['RAILS_ENV']}
                - name: REDIS_URL
                  value: "redis_url"
                - name: DB_POOL
                  value: "50"
                - name: QUEUE
                  value: "queue_name"
              resources:
                requests:
                  cpu: "200m"
                  memory: "1024Mi"
                limits:
                  cpu: "500m"
                  memory: "2048Mi"
    MANIFEST
    )
  end

version 0.6.0

For clarity, the Kubernetes "job" is the Resque worker. The worker will continue to run until the queue is empty, at which time it should terminate and Kubernetes will mark its job as complete. If there are still Resque jobs in the queue, then the worker will continue to run.

Is the kubernetes job not marked as completed when the queue is empty? Is the pod still running?

Can you inspect the Kubernetes job and tell me whether the TERM_ON_EMPTY environment variable is set? Also, does the image that you are loading for the worker include the resque-kubernetes gem? Both of those need to be true for the worker to terminate properly and the job to complete.
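One way to check the first point is to dump the job (for example with `kubectl get job <name> -o yaml`) and look through each container's `env` list. The snippet below is only a rough sketch of that check: `containers_missing_env` and `SAMPLE_JOB` are hypothetical names made up for illustration, the sample manifest is invented, and the value `"1"` for TERM_ON_EMPTY is an assumption rather than what the gem necessarily writes.

```ruby
require "yaml"

# Hypothetical helper (not part of resque-kubernetes): returns the names of
# containers in a parsed Job manifest that do NOT define the given env variable.
def containers_missing_env(manifest, var_name)
  containers = manifest.dig("spec", "template", "spec", "containers") || []
  missing = containers.reject do |container|
    (container["env"] || []).any? { |entry| entry["name"] == var_name }
  end
  missing.map { |container| container["name"] }
end

# Invented sample manifest; the TERM_ON_EMPTY value "1" is an assumption.
SAMPLE_JOB = YAML.safe_load(<<~MANIFEST)
  spec:
    template:
      spec:
        containers:
        - name: resque-job
          env:
          - name: QUEUE
            value: "queue_name"
          - name: TERM_ON_EMPTY
            value: "1"
        - name: sidecar
          env:
          - name: OTHER
            value: "x"
MANIFEST

puts containers_missing_env(SAMPLE_JOB, "TERM_ON_EMPTY").inspect
# => ["sidecar"]
```

An empty result for the worker container would mean the variable is present there; it says nothing about whether the gem is installed in the image, which has to be checked separately.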

@jeremywadsack
The queue is empty after the job is completed, the pod is still running, TERM_ON_EMPTY is set (it is set automatically by the gem), and resque-kubernetes is installed (I use the same image as for the app).

Thanks for double checking all that. For some reason the resque worker isn't terminating when the queue is empty.

This gem monkey-patches Resque::Worker to shut down the worker when the TERM_ON_EMPTY environment variable is truthy and the queues are empty. All the queues that the worker is following must be empty for it to terminate. Is your worker following multiple queues that might still have items in them?
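The termination rule described above can be illustrated with a small stand-alone sketch. This is not the gem's actual code: `TRUTHY_VALUES`, `term_on_empty?`, and `worker_should_terminate?` are hypothetical names, the set of values treated as truthy is an assumption, and queue sizes are passed in as a plain Hash instead of being read from Redis. It also assumes the worker's queues are given as a comma-separated `QUEUE` value and ignores wildcard queues such as `*`.

```ruby
# Assumption for illustration: which strings count as "truthy".
TRUTHY_VALUES = %w[1 true yes on].freeze

# True when TERM_ON_EMPTY is set to one of the assumed truthy values.
def term_on_empty?(env)
  TRUTHY_VALUES.include?(env.fetch("TERM_ON_EMPTY", "").to_s.downcase)
end

# queue_sizes maps queue name => number of pending jobs.
# The worker should stop only if TERM_ON_EMPTY is truthy AND every
# queue it follows is empty.
def worker_should_terminate?(env, queue_sizes)
  queues = env.fetch("QUEUE", "").split(",").map(&:strip).reject(&:empty?)
  term_on_empty?(env) && queues.all? { |name| queue_sizes.fetch(name, 0).zero? }
end

env = { "TERM_ON_EMPTY" => "1", "QUEUE" => "high,low" }
puts worker_should_terminate?(env, { "high" => 0, "low" => 2 })  # => false
puts worker_should_terminate?(env, { "high" => 0, "low" => 0 })  # => true
```

The first call shows the multi-queue case asked about: one non-empty queue is enough to keep the worker, and therefore the Kubernetes job, running.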

I guess it's because of my attempts to kill the sidecar container... This feature is working fine.
BTW, it would be cool to add a method to kill the sidecar; I will create another issue.
Sorry for the false alarm.