project-codeflare/multi-cluster-app-dispatcher

[core] Behavior PreemptQueuejobs thread

Closed this issue · 1 comments

asm582 commented

Describe the Bug

The PreemptQueueJobs thread runs every 60 seconds. For an AW whose pods are slow to start because of a long image pull, this can cause the AW to be preempted prematurely.
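To make the timing concrete, here is a minimal toy model of a periodic preemption check. It is only a sketch, not MCAD's implementation: the function name `preempted_at`, the `grace_s` wait window, and the rule "preempt if pods are not ready at check time" are all assumptions made for illustration. Only the 60-second interval comes from this report.

```python
from typing import Optional

CHECK_INTERVAL_S = 60  # interval stated in this issue


def preempted_at(pull_time_s: int,
                 check_interval_s: int = CHECK_INTERVAL_S,
                 grace_s: int = 0) -> Optional[int]:
    """Return the time (seconds after dispatch) of the first periodic check
    that would preempt the AW, or None if its pods become ready first.

    Hypothetical model: checks fire at check_interval_s, 2*check_interval_s, ...
    A check preempts when the pods are still pulling images (t < pull_time_s)
    and the assumed wait window has elapsed (t >= grace_s).
    """
    t = check_interval_s
    while t < pull_time_s:      # pods are not ready yet at this check
        if t >= grace_s:        # wait window over, so the AW is preempted
            return t
        t += check_interval_s   # still inside the wait window, try next check
    return None                 # pods became ready before any preempting check


# A 5-minute image pull is cut off by the very first check.
print(preempted_at(300))               # 60
# A 45-second pull finishes before the first check.
print(preempted_at(45))                # None
# With a wait window longer than the pull, the AW survives.
print(preempted_at(300, grace_s=600))  # None
```

In this model, any pull time longer than one check interval leads to preemption unless the wait window covers the pull.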

Codeflare Stack Component Versions


Codeflare SDK:
MCAD: v1.33 and above

Steps to Reproduce the Bug

NA

What Have You Already Tried to Debug the Issue?

NA

Expected Behavior

MCAD should give an AW enough time to pull very large images before preempting it.

Screenshots, Console Output, Logs, etc.

NA

Affected Releases

R1.33 and above

Additional Context

NA

asm582 commented

Closing. This issue can be fixed by adding a requeuing stanza to the AW's schedulingSpec stanza.