replicate/replicate-python

Extremely long cold boot times

UmarRamzan opened this issue · 2 comments

Is there any way to store large models in some kind of network storage to avoid long cold boot times?

Hi @UmarRamzan. I hear you — large models can take a while to set up from a cold boot. We do what we can to optimize network storage and caches, but at a certain point you're limited by the physical limits of the hardware when transferring hundreds of gigabytes of weights and loading them into GPU VRAM.

We have some docs about cold boots here: https://replicate.com/docs/how-does-replicate-work#cold-boots

If your application is sensitive to long cold starts, you can try creating a deployment and configuring a minimum number of instances to always be running, so requests never hit a cold boot.
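As a rough sketch of what calling such a deployment looks like with the Python client (the deployment name `your-username/whisper-deployment` and the input fields are hypothetical placeholders — check your model's schema for the actual input names):

```python
# Hedged sketch: routing predictions through a Replicate deployment that
# has a minimum number of always-on instances, so requests avoid cold boots.
# Assumes REPLICATE_API_TOKEN is set in the environment.

def build_whisper_input(audio_url: str, language: str = "en") -> dict:
    """Assemble a prediction input for a Whisper-style model.
    Field names here are illustrative; consult the model's schema."""
    return {"audio": audio_url, "language": language}

def transcribe(audio_url: str):
    import replicate  # pip install replicate

    # "your-username/whisper-deployment" is a hypothetical deployment name.
    deployment = replicate.deployments.get("your-username/whisper-deployment")
    prediction = deployment.predictions.create(
        input=build_whisper_input(audio_url)
    )
    prediction.wait()  # block until the prediction finishes
    return prediction.output

if __name__ == "__main__":
    print(transcribe("https://example.com/speech.mp3"))
```

The trade-off is cost: always-on instances bill continuously, which is what makes them unattractive for low-traffic workloads.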

What model are you seeing this for?

Hi, creating a deployment is not financially feasible for us. We're currently using Whisper large-v3, which is around 8 GB and takes a minute or two to load. I had this in mind when originally asking the question: https://modal.com/docs/guide/checkpointing