kuprel/min-dalle

CUDA error: device-side assert triggered

kuprel opened this issue · 4 comments

When running under high load on the A100, this error comes up after about 10 minutes. It happens on Replicate and on the Discord bot. The best fix so far is just to restart the server every 10 minutes. If anyone has a better fix for it, please post it here. Thanks

Is this happening on your own server or in that buggy Google Colab that likes to give weird errors?

Hah. I think I found a hacky fix that is working for now. Basically just a try-except where the except branch deletes the model and reinitializes it.

jk, that didn't work

The fix was to clamp the image and text tokens to within their bounds. Either the text or the image tokens must sometimes exceed their vocab count; I'm not sure which.
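
For anyone hitting the same thing, here is a minimal sketch of what clamping to the vocab bounds means. It is not the actual min-dalle code: `clamp_tokens` and the vocab size are made up for illustration, and it uses plain Python lists so it runs anywhere. On real tensors the analogous call would presumably be `torch.clamp(tokens, 0, vocab_count - 1)` just before the embedding lookup, since an out-of-range index into an embedding table is a common cause of a device-side assert.

```python
def clamp_tokens(tokens, vocab_count):
    """Force every token id into the valid index range [0, vocab_count - 1]."""
    return [min(max(int(t), 0), vocab_count - 1) for t in tokens]


# Hypothetical values for illustration only, not min-dalle's real vocab size.
image_vocab_count = 8
raw_image_tokens = [3, 7, 8, -1, 5]  # 8 and -1 fall outside [0, 7]

safe_image_tokens = clamp_tokens(raw_image_tokens, image_vocab_count)
print(safe_image_tokens)  # [3, 7, 7, 0, 5]
```

Note that clamping hides the bad ids rather than explaining them, so it may still be worth logging which tokens were out of range to find out whether the text or the image side is responsible.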