NVIDIA/nv-ingest

[BUG]: Paddle Version checking can get into an infinite loop when failing or PADDLE_HTTP_ENDPOINT is not specified

Opened this issue · 0 comments

Version

24.10

Which installation method(s) does this occur on?

Docker

Describe the bug.

The paddle version check requires the HTTP endpoint, if it is not specified (or doesn't exist), the progress engine will wait/retry forever.

Proposed solution

  • Make PADDLE_HTTP_ENDPOINT a required parameter in the paddle schema
  • Set some kind of maximum threshold on the @backoff decorator (maybe configurable?) so that if we fail to get the version after 5-10 minutes we fail the job.

Minimum reproducible example

No response

Relevant log output

No response

Other/Misc.

No response