Model latency numbers include time taken to retry
saikatmitra91 opened this issue · 2 comments
saikatmitra91 commented
Description
Model calls are wrapped by the SDK, which retries internally. As a result, the reported latency covers the total time taken to get a response, including any failed attempts, rather than the time taken by the final request to resolve.
Screenshots / Code Snippets
Proposed Solution
- reset the timer when doing a retry
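
A minimal sketch of the idea, not the project's actual code. It assumes the retry loop is under the caller's control (for example, the SDK's internal retries are disabled or can be hooked). The names `call_with_retries`, `request`, `max_attempts`, and `clock` are hypothetical. The timer is reset at the start of every attempt, so the reported latency reflects only the final request, while the total elapsed time is still returned for comparison.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[T, float, float]:
    """Return (result, final_attempt_latency, total_latency), both in seconds.

    Hypothetical helper: `request` stands in for a single model call
    made without any internal SDK retries.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    first_start = clock()  # start of the whole operation, retries included
    for attempt in range(1, max_attempts + 1):
        attempt_start = clock()  # reset the timer on every attempt
        try:
            result = request()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            continue  # retry; the next iteration resets the timer
        end = clock()
        return result, end - attempt_start, end - first_start

    raise RuntimeError("unreachable")
```

Returning both numbers keeps the distinction visible: the first is the latency of the request that actually resolved, the second is what the issue describes as being reported today.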
sumitd94 commented
@saikatmitra91 I would like to work on this, can you please assign this to me?
arjunattam commented
Fixed with #216. Closing.