replicate/replicate-python

model container failed to boot and complete setup within 600 seconds

Closed this issue · 10 comments

Hi,

We are currently experiencing boot errors even though our code has not changed.

We don't have much information beyond the message "model container failed to boot and complete setup within 600 seconds".

What could be the issue? The built Docker image runs well locally. The issue is intermittent.

Please contact support for help with your specific model: https://replicate.com/support

Support was contacted earlier this morning, and the issue is still not solved. And what about such an error message, with no further clues and no logs?

More explicit error messages should be planned ASAP.

Thanks for the update with the setup logs! That's very good news.

Your communication on GitHub should be improved; I only discovered it this morning.

mattt commented

Hi @christopher5106. Sorry we didn't do a great job helping you out here. It sounds like the setup logs on replicate.com gave you the information you needed to solve your problem. Was there anything else we can help you with?

@mattt After discussing with my team, we are not sure the setup logs were there before, because we only paid attention to them once machines stopped booting at all:

model container failed to boot and complete setup within 600 seconds

As you can see, while your status page was all green yesterday, we were unable to start any prediction for the whole day:

image
image

We don't know what we should do when we see the message

model container failed to boot and complete setup within 600 seconds

because it's not explicit enough for us to take action. Today, although the Docker image has not changed, the problem is gone, but boot times are still a bit slow.

We experienced the boot failures on multiple model endpoints, and also on deployment endpoints.

mattt commented

@christopher5106 Sorry, I'm having trouble finding a support ticket with more details about your case. Could you please share a link to the model that's failing to boot?

I sent you, by PM on X, half a dozen versions we published, for which there were a few dozen boot failures yesterday.

mattt commented

@christopher5106 Thanks for your patience. I escalated internally, and our customer support engineer has responded to your ticket. I'll let them take it from there.

Thanks!