Comparatively high initial prediction time for first predict() hit
I am using a MiniLM model with language 'en_core_web_sm'.
When I compare prediction times for predictor.predict(text), the first call always takes a bit longer than the calls that follow.
Suppose after creating a predictor object, I call predict as follows:
predictor.predict(text)  # first call
predictor.predict(text)  # second call
predictor.predict(text)  # third call
The first call takes noticeably longer (about 0.2 s) than the subsequent calls (about 0.05 s each).
Could you please help me understand why the first call takes longer?
@nemeer this is expected behavior in PyTorch, and probably in neural network frameworks in general. The first call sets up a lot of things within the network, such as caches, memory on the GPU, and graph optimization, so it pays a one-time cost that later calls do not.
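A minimal sketch of how the effect can be measured, assuming nothing about the library's internals: `LazyPredictor` is a hypothetical stand-in (not the real predictor class) that simulates one-time setup with a short sleep, and `time_calls` is an illustrative helper that times consecutive calls with `time.perf_counter`.

```python
import time


class LazyPredictor:
    """Hypothetical stand-in for a model wrapper; not the library's class."""

    def __init__(self):
        self._ready = False

    def predict(self, text):
        if not self._ready:
            # Simulates one-time setup work (allocations, caches, and so on).
            time.sleep(0.05)
            self._ready = True
        return {"length": len(text)}


def time_calls(predict, text, n=3):
    """Return the wall-clock duration in seconds of n consecutive calls."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        predict(text)
        durations.append(time.perf_counter() - start)
    return durations


predictor = LazyPredictor()
timings = time_calls(predictor.predict, "some example text")
for i, seconds in enumerate(timings, start=1):
    print(f"call {i}: {seconds:.4f} s")
```

If the first-call latency matters, for example in a web service, one common approach is to call predict once on a dummy input at startup and discard the result, so the setup cost is paid before real requests arrive. When timing a real PyTorch model on a GPU, calling `torch.cuda.synchronize()` before reading the clock is usually needed, because CUDA kernels run asynchronously.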
Thanks for clarifying @davidberenstein1957.