Inference latency

zen-d opened this issue 2 years ago · 1 comment

@Fangyi-Chen Thanks for your great work. I would like to ask how much the inference latency increases compared to the basic pathway, since the later decoding stages have to process many more queries.

Answer 1 · 2023-02-16T21:40:40.000Z

Hi,

SQR is a training strategy, so it is only applied during the training phase. The inference pipeline is unchanged, so there is no inference latency overhead.
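To make the distinction concrete, here is a toy count of how many query sets each decoder stage handles. It is a sketch under an assumed recollection rule, not the authors' implementation: during training, each stage refines the outputs of the previous stage plus the recollected outputs of the stage before that. At inference, only the basic pathway runs, with one query set per stage. The function name `query_sets_per_stage` is hypothetical.

```python
def query_sets_per_stage(num_stages: int, training: bool) -> list[int]:
    """Toy count of the query sets each decoder stage processes."""
    if not training:
        # Basic pathway (inference): every stage refines exactly one query set.
        return [1] * num_stages
    # Hypothetical training-time recollection rule (an assumption):
    # stage i refines the outputs of stage i-1 plus the recollected
    # outputs of stage i-2, so the per-stage load grows Fibonacci-like.
    counts = [1]
    for i in range(1, num_stages):
        recollected = counts[i - 2] if i >= 2 else 0
        counts.append(counts[i - 1] + recollected)
    return counts


if __name__ == "__main__":
    print(query_sets_per_stage(6, training=True))   # [1, 1, 2, 3, 5, 8]
    print(query_sets_per_stage(6, training=False))  # [1, 1, 1, 1, 1, 1]
```

Under this toy rule, a 6-stage decoder handles 20 query sets per image in training but only 6 at inference. The heavier later stages therefore cost extra training compute only, and the inference cost matches the basic pathway.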