Sync mode generation
Closed this issue · 5 comments
Hi, I have a question regarding the replicate.run API call. How can I decrease the response time to under 2 seconds? I am using the official Flux Schnell model with the settings below, but I am still receiving the output in about 2 seconds even though the generation time on Replicate is less than 1 second. I have also tried the raw HTTP call with Prefer: wait in the headers and still don't receive the output in under 2 seconds.
const input = {
  prompt: "an astronaut on the moon",
  num_inference_steps: 1,
  go_fast: true,
  num_outputs: 1,
  megapixels: "0.25"
};
const output = await replicate.run("black-forest-labs/flux-schnell", { input });
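For reference, the raw HTTP variant I tried looks roughly like this (a minimal sketch; the 60-second wait duration is just an example, and it assumes REPLICATE_API_TOKEN is set and a fetch implementation is available, built into Node 18+ or via node-fetch):

```javascript
// Illustrative sketch of the sync-mode HTTP call with the Prefer: wait header.
const SYNC_HEADERS = {
  Authorization: `Bearer ${process.env.REPLICATE_API_TOKEN}`,
  "Content-Type": "application/json",
  // Ask the API to hold the connection open (here up to 60s) until output is ready.
  Prefer: "wait=60",
};

async function createPredictionSync(input) {
  const response = await fetch(
    "https://api.replicate.com/v1/models/black-forest-labs/flux-schnell/predictions",
    { method: "POST", headers: SYNC_HEADERS, body: JSON.stringify({ input }) }
  );
  return response.json();
}
```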
Hi @rafationgsonwc, can you share more about your setup? Where is the server (or machine) located that is running the prediction?
Hi @aron, our server is running on Amazon ECS in the ap-southeast-1 region, and I have also tried it on my local machine in the same region with the same results. Our Replicate client version is ^0.12.1, and we cannot upgrade our Node.js version for now.
Is there a significant performance difference between the latest Replicate client version and ^0.12.1?
Hmm, so all of our servers are hosted in the US; flux-schnell is primarily served from Richmond, Virginia. So a large part of what you're seeing here is network latency.
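You can get a rough feel for that network floor by timing a round trip from your region yourself; a minimal sketch (the helper name is illustrative, and it assumes a fetch implementation is available, built into Node 18+ or via node-fetch):

```javascript
// Illustrative helper: time any async operation, in milliseconds.
async function timeIt(fn) {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}

// Example: measure the round trip to the API from your server's region.
// timeIt(() => fetch("https://api.replicate.com", { method: "HEAD" }))
//   .then((ms) => console.log(`round trip: ${ms}ms`));
```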
All models that serve files support event-source streams that push the file as a base64-encoded string over the stream. This isn't fully supported in our client libraries yet, but you could test it by writing a vanilla JS function that reads the stream.
Depending on the version of Node you're using, you may need an EventSource implementation like eventsource and a fetch polyfill like node-fetch.
Something like:
const fetch = require("node-fetch");
const EventSource = require("eventsource");
const prediction = await replicate.predictions.create({
  model: "black-forest-labs/flux-schnell",
  input: {
    prompt: "an astronaut on the moon",
    num_inference_steps: 1,
    go_fast: true,
    num_outputs: 1,
    megapixels: "0.25"
  }
});
const stream = new EventSource(prediction.urls.stream, {
  fetch,
});
// "output" is sent on each file output.
stream.addEventListener('output', (event) => {
  console.log(event.data) // data:image/png;base64,xyz===
  // Close the stream after receiving the data if you don't care about the prediction completing.
  stream.close();
})
// "done" is sent when the prediction is complete.
stream.addEventListener('done', (event) => {
  stream.close(); // close the stream.
})

You can test the event source using cURL in your terminal by running:
curl -H 'Accept: text/event-stream' <stream_url>
I'm going to close this for the moment. If you have further questions please re-open.