davidberenstein1957/crosslingual-coreference

Comparatively high initial prediction time for first predict() hit


I am using the minilm model with the language 'en_core_web_sm'.
When I compare prediction times for predictor.predict(text), the first call always takes a bit longer than the following ones.
Suppose that, after creating a predictor object, I call predict as follows:

predictor.predict(text) ---> first call
predictor.predict(text) ---> second call
predictor.predict(text) ---> third call

The first call takes noticeably longer (about 0.2 sec) than the subsequent calls (about 0.05 sec each).
Could you please help me understand why the first call takes longer?

@nemeer this is a PyTorch (and probably general NN) design choice. The first call sets up a lot of things within the network, such as caches, memory on the GPU, and graph optimization, so it is slower than the calls that follow.

https://datascience.stackexchange.com/questions/63476/why-the-first-prediction-of-neural-network-in-pytorch-is-slower-than-following-p
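
To see the effect in isolation, here is a minimal timing sketch. It does not use the real model: `LazyPredictor` and `time_calls` are hypothetical names made up for illustration, and the one-time setup is simulated with a short sleep. It only shows the pattern of a slow first call followed by fast ones.

```python
import time


def time_calls(fn, arg, n=3):
    """Return the wall-clock duration in seconds of n consecutive calls to fn(arg)."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn(arg)
        durations.append(time.perf_counter() - start)
    return durations


class LazyPredictor:
    """Hypothetical stand-in for a model whose first call does one-time setup."""

    def __init__(self, setup_seconds=0.05):
        self._setup_seconds = setup_seconds
        self._ready = False

    def predict(self, text):
        if not self._ready:
            # Simulates one-time work such as memory allocation or cache setup.
            time.sleep(self._setup_seconds)
            self._ready = True
        return text.upper()


predictor = LazyPredictor()
predictor_durations = time_calls(predictor.predict, "some text")
# The first entry includes the simulated setup; the others do not.
print([f"{d:.4f}s" for d in predictor_durations])
```

With the real predictor, the same helper could be called as `time_calls(predictor.predict, text)`. If the first-call latency matters, for example in a service, one common workaround is to make a single throwaway `predictor.predict(text)` call right after creating the predictor, so that the setup cost is paid before real requests arrive.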

Thanks for clarifying @davidberenstein1957.