akshata29/entaoai

Long Response Time / high latency

MediGenie opened this issue · 3 comments

So I am based in Seoul, South Korea. I know Azure OpenAI/OpenAI resources are only available in the US/Europe and not yet in Asia, but for people in US East/US Central: are you experiencing high latency or long response times?

For my part, I provisioned what Azure resources I could, along with Pinecone, in Korea Central, but I am wondering @akshata29, what other things or ideas can you think of that I could do to speed things up on my end? Cost is not a factor. I just want chatpdf to run as fast as https://chatpdf.com/

Thank you!!

At times I do see some latency issues in the US, but most of the time I have not run into them. Depending on your use case, you can implement a cache mechanism (that is on my list to implement), and/or build your own knowledge base (KB) from your documents/data so you don't have to call OpenAI every time.
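To make the cache idea concrete, here is a minimal in-memory sketch. Everything in it (the `make_cached_answerer` name, the dict-backed cache, the `call_llm` parameter) is hypothetical illustration, not the repo's actual implementation, which would back the cache with a durable store:

```python
import hashlib


def make_cached_answerer(call_llm, cache=None):
    """Wrap any question -> answer function with an exact-match cache.

    Repeated (normalized) questions are served from the cache, skipping
    the OpenAI round-trip entirely. `cache` is a plain dict here; a real
    deployment would use a shared store so all app instances benefit.
    """
    if cache is None:
        cache = {}

    def answer(question):
        # Normalize so trivial variations ("  What is X?" vs "what is x?")
        # map to the same cache key.
        key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
        if key in cache:
            return cache[key]        # cache hit: no LLM call, no latency
        result = call_llm(question)  # cache miss: pay the latency once
        cache[key] = result
        return result

    return answer
```

Note this only catches exact repeats after normalization; a semantic cache (matching questions by embedding similarity) would catch paraphrases too, at the cost of an extra vector lookup.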

Thank you SO much for your response. I have referenced the nice Azure architecture diagram, but could you be more specific about what you mean by a cache mechanism? Are you saying that when I ask a question in chat, Cognitive Search (US East) queries Pinecone (Korea Central), the findings are returned to US East where the search results are combined with the question, and then the results come back to my computer in Korea? Are you saying it's good to cache the documents in US East? It would be wonderful if you could explain in detail. :)

Implemented the Cache pattern (using Cognitive Search) in the repo. Moreover, you can now also implement the pattern from the reference architecture we created at https://learn.microsoft.com/en-us/azure/architecture/example-scenario/ai/log-monitor-azure-openai to load-balance across multiple AOAI instances. Lastly, for enterprise customers, Azure offers the PTU (Provisioned Throughput) model to get predictable latency and performance.
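The load-balancing idea can be sketched client-side as a simple round-robin rotation over several AOAI deployments. This is an illustrative sketch only; the endpoint URLs are placeholders, and the linked reference architecture does this at the gateway layer (e.g. with API Management) rather than in application code:

```python
import itertools


class RoundRobinEndpoints:
    """Rotate requests across multiple Azure OpenAI endpoints.

    Spreading traffic over several AOAI instances (in different
    regions or subscriptions) raises aggregate throughput and reduces
    the chance any one instance is throttled, which shows up to the
    user as latency.
    """

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        # itertools.cycle yields the endpoints forever, in order.
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self):
        """Return the endpoint the next request should target."""
        return next(self._cycle)
```

A fuller version would also track per-endpoint failures and temporarily skip throttled instances, which is exactly what a gateway-based setup gives you for free.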