replicate/replicate-javascript

Sync mode generation


Hi, I have a question regarding the replicate.run API call. How can I get the response time under 2 seconds? I am using the official Flux Schnell model with the settings below, but the output still takes 2 seconds to arrive even though the generation time reported by Replicate is less than 1 second. I have also tried the HTTP API with Prefer: wait in the headers, and the output still does not arrive in under 2 seconds.

const input = {
    prompt: "an astronaut on the moon",
    num_inference_steps: 1,
    go_fast: true,
    num_outputs: 1,
    megapixels: "0.25"
};

const output = await replicate.run("black-forest-labs/flux-schnell", { input });

aron commented

Hi @rafationgsonwc, can you share more about your setup? Where is the server (or machine) located that is running the prediction?

rafationgsonwc commented

Hi @aron, our server runs on Amazon ECS in the ap-southeast-1 region. I have also tried it on my local machine (same region) with the same results. Our replicate package version is ^0.12.1, and we cannot upgrade our Node.js version for now.

Is there a significant performance difference between the latest Replicate version and ^0.12.1?

aron commented

Hmm, so all of our servers are hosted in the US, and flux-schnell is primarily served from Richmond, Virginia. So a large part of the delay you're seeing here is going to be network latency.
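To see how much of the 2 seconds is network overhead rather than model time, a rough sketch like the following may help. `timed` and `latencyBreakdown` are hypothetical helpers, not part of the replicate client. The model time is assumed to be available in seconds, for example from the prediction page or from the prediction's `metrics.predict_time` field.

```javascript
// Hypothetical helper: run any async call and report how long it took.
// Date.now() is used so this also works on older Node versions.
async function timed(fn) {
  const start = Date.now();
  const result = await fn();
  return { result, elapsedMs: Date.now() - start };
}

// Hypothetical helper: split the wall-clock time into model time and
// everything else (network round trips, queueing, file transfer).
// predictTimeSeconds is assumed to be the generation time in seconds.
function latencyBreakdown(elapsedMs, predictTimeSeconds) {
  const modelMs = predictTimeSeconds * 1000;
  return { modelMs, overheadMs: Math.max(0, elapsedMs - modelMs) };
}
```

Wrapping the existing call as `timed(() => replicate.run("black-forest-labs/flux-schnell", { input }))` and passing the resulting `elapsedMs` to `latencyBreakdown` gives a rough idea of how much time is spent outside the model.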

All models that serve files support event-source streams that push the file over the stream as a base64-encoded string. This is not fully supported in our client libraries yet, but you could test it by writing a vanilla JS function that reads the stream.

Depending on the version of Node you're using, you may need an EventSource implementation like eventsource and a fetch polyfill like node-fetch.

Something like:

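const Replicate = require("replicate");
const fetch = require("node-fetch");
const EventSource = require("eventsource");

// Assumes the API token is available in the REPLICATE_API_TOKEN environment variable.
const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// CommonJS has no top-level await, so the calls are wrapped in an async function.
async function main() {
  const prediction = await replicate.predictions.create({
    model: "black-forest-labs/flux-schnell",
    input: {
      prompt: "an astronaut on the moon",
      num_inference_steps: 1,
      go_fast: true,
      num_outputs: 1,
      megapixels: "0.25"
    }
  });

  const stream = new EventSource(prediction.urls.stream, {
    fetch,
  });

  // "output" is sent on each file output.
  stream.addEventListener("output", (event) => {
    console.log(event.data); // the file as a base64-encoded string
    // Close the stream after receiving the data if you don't care about the prediction completing.
    stream.close();
  });

  // "done" is sent when the prediction is complete.
  stream.addEventListener("done", () => {
    stream.close(); // close the stream.
  });
}

main().catch(console.error);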
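If you want the image bytes rather than the raw string, the `output` payload can be decoded along these lines. This is only a sketch: `decodeFileOutput` is a hypothetical helper, and it assumes `event.data` is either a bare base64 string or a `data:` URI with a `;base64,` prefix, so check what your stream actually sends.

```javascript
// Hypothetical helper: turn the "output" event payload into a Buffer.
// Assumption: data is plain base64 or a data URI such as
// "data:image/webp;base64,<payload>".
function decodeFileOutput(data) {
  const marker = ";base64,";
  const idx = data.startsWith("data:") ? data.indexOf(marker) : -1;
  const payload = idx === -1 ? data : data.slice(idx + marker.length);
  return Buffer.from(payload, "base64");
}
```

Inside the `output` listener you could then write the result to disk, for example with `fs.writeFileSync("out.webp", decodeFileOutput(event.data))`, where the file name and extension are placeholders.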

You can test the event source with cURL in your terminal by running:

curl -H 'Accept: text/event-stream' <stream_url>

aron commented

I'm going to close this for the moment. If you have further questions please re-open.