Extremely long cold boot times
UmarRamzan opened this issue · 2 comments
Is there any way to store large models in some kind of network storage to avoid long cold boot times?
Hi @UmarRamzan. I hear you — large models can take a while to set up from a cold boot. We do what we can to optimize network storage and caches, but at a certain point you're limited by the physical limits of the hardware when transferring on the order of 10² GB of weights and loading them into GPU VRAM.
We have some docs about cold boots here: https://replicate.com/docs/how-does-replicate-work#cold-boots
If your application is sensitive to long cold starts, you can try creating a deployment and configuring a minimum number of instances to always be running.
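For reference, deployments can be created through Replicate's HTTP API (`POST /v1/deployments`). A minimal sketch of the request body is below — the deployment name, version ID, hardware SKU, and instance counts are all placeholders, so check the deployments API docs for the values that apply to your account:

```python
import json

# Sketch of a request body for POST https://api.replicate.com/v1/deployments
# (field names follow Replicate's deployments API; all values are placeholders).
payload = {
    "name": "my-whisper",         # hypothetical deployment name
    "model": "openai/whisper",    # the model to deploy
    "version": "<version-id>",    # a pinned model version (placeholder)
    "hardware": "gpu-a40-small",  # example hardware SKU
    "min_instances": 1,           # keep one instance warm to avoid cold boots
    "max_instances": 3,           # scale up under load
}
print(json.dumps(payload, indent=2))
```

Setting `min_instances` to 1 or more is what keeps an instance warm; you pay for that idle capacity, which is the trade-off against cold-start latency.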
What model are you seeing this for?
Hi, creating a deployment is not financially feasible for us. We're currently using Whisper Large v3, which is around 8 GB and takes a minute or two to load. I had this in mind when originally asking the question: https://modal.com/docs/guide/checkpointing