replicate/replicate-javascript

Replicate randomly fails with "Error: Prediction failed"

Levelleor opened this issue

I am currently working on an AI feature, and while testing my code with Replicate I noticed that the same simple prompt would often just fail when calling replicate.run:

  // `replicate` is an initialized Replicate client
  async askAI(input) {
    try {
      // run() resolves with the model's full output; for this model, an array of strings
      const output = await replicate.run("meta/meta-llama-3-70b-instruct", { input: { prompt: input } });
      return output.join('');
    } catch (error) {
      // rethrow so callers can handle or retry
      throw error;
    }
  }

I haven't observed such issues with the stream and predictions methods yet. I have error handling here, yet I still see an unhandledRejection, which I assume is coming from the replicate package itself.
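For comparison, the streaming variant that has been stable for me looks roughly like this (simplified):

  async askAIStreaming(input) {
    let text = '';
    // stream() yields server-sent events; each event stringifies to its token text
    for await (const event of replicate.stream("meta/meta-llama-3-70b-instruct", { input: { prompt: input } })) {
      text += event;
    }
    return text;
  }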

The prompt is pretty much the same every time; it's actually something I generate in a prior step and feed back to the AI for revalidation. This makes development a lot more difficult, and so far I can't figure out why it's happening.

Is there any reason this is happening? Is it due to instability in Replicate's API? Is the run method not meant for stable production use?

Error: Prediction failed:
at Replicate.wait (webpack-internal:///(rsc)/./node_modules/replicate/index.js:394:13)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Replicate.run (webpack-internal:///(rsc)/./node_modules/replicate/index.js:158:18)
at async Object.askAI (webpack-internal:///(rsc)/./app/api/ai/workflow.js:320:28)
at async Object.parsePrompt (webpack-internal:///(rsc)/./app/api/ai/workflow.js:286:31)
at async eval (webpack-internal:///(rsc)/./app/api/ai/workflow.js:204:47)
⨯ unhandledRejection: Error: Prediction failed:

I actually experience the exact same problem with the predictions API: it sometimes just gets stuck, then after a timeout fails with "Error: Prediction failed".
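Roughly what I'm doing with the predictions API (simplified):

  const prediction = await replicate.predictions.create({
    model: "meta/meta-llama-3-70b-instruct",
    input: { prompt: input },
  });
  // wait() polls until the prediction reaches a terminal state
  const completed = await replicate.wait(prediction);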

mattt commented

Hi @Levelleor. replicate.run throws an error if the prediction fails. Same for predictions.create + .wait. You can wrap your calls in a try/catch to add automatic retry logic.
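For example, a minimal retry wrapper might look something like this (the attempt count and backoff are placeholders, not recommendations):

  // Retry sketch; assumes `replicate` is an initialized client
  async function askAIWithRetry(input, maxAttempts = 3) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        const output = await replicate.run("meta/meta-llama-3-70b-instruct", {
          input: { prompt: input },
        });
        return output.join('');
      } catch (error) {
        if (attempt === maxAttempts) throw error; // out of retries, surface the failure
        // simple linear backoff before retrying
        await new Promise((resolve) => setTimeout(resolve, 1000 * attempt));
      }
    }
  }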

Thank you, I forgot there was that option. Is there any way to minimize the downtime, though? It takes about 20 seconds for a call to actually fail and reach the catch clause; if there were a way to tell earlier that a request had failed, that would be great. At times roughly 65% of all my run calls get stuck like this, possibly due to high load on Replicate's servers, though I'm not sure. With a plain retry mechanism, 4 AI calls that would normally take up to 30 seconds end up taking 2-5 minutes, since the same call may fail numerous times in a row.

mattt commented

Hi @Levelleor. Sorry you didn't have a great experience with our deployment of Llama 3. We've been working to improve reliability as we've scaled to larger workloads, so please take another look and let me know what your experience is.

By the way, you can pass an AbortController signal to replicate.run to cancel a request after a timeout or on user-initiated cancellation. Give this a try if you find yourself waiting too long for predictions to complete, for example with a community model that isn't consistently warm.
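For example (the 30-second budget here is arbitrary):

  async askAI(input) {
    const controller = new AbortController();
    // abort the run if it hasn't finished within 30 seconds
    const timer = setTimeout(() => controller.abort(), 30_000);
    try {
      const output = await replicate.run("meta/meta-llama-3-70b-instruct", {
        input: { prompt: input },
        signal: controller.signal,
      });
      return output.join('');
    } finally {
      clearTimeout(timer);
    }
  }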