Inference latency
zen-d opened this issue · 1 comment
zen-d commented
@Fangyi-Chen Thanks for your great work. How much does inference latency increase compared to the basic pathway, given that the queries become much heavier in the later decoding stages?
Fangyi-Chen commented
Hi,
SQR is a training strategy and is applied only during training. The inference pipeline is unchanged, so there is no inference latency overhead.
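
To illustrate the point, here is a toy sketch in plain Python. It is not the repository's code. It assumes a simplified recollection rule in which each training stage also receives the query groups that entered the previous stage. The names `stage` and `run_decoder` are hypothetical. It only counts how many query groups each decoder stage processes in the two modes.

```python
def stage(query_groups, stage_idx):
    """Stand-in for one decoder stage (hypothetical).

    Each query group is a tuple recording which stages it has passed through.
    """
    return [group + (stage_idx,) for group in query_groups]


def run_decoder(num_stages, training):
    """Return (groups processed per stage, final outputs).

    Assumption: during training, a stage receives the outputs of the previous
    stage plus the groups that were fed into the previous stage. During
    inference, only the basic pathway (previous stage's output) is used.
    """
    prev_out = [()]       # the single group of initial queries
    prev_in = []          # groups that entered the previous stage
    groups_per_stage = []
    for s in range(1, num_stages + 1):
        inputs = list(prev_out)
        if training:
            # Training only: recollect earlier queries as extra inputs.
            inputs += prev_in
        groups_per_stage.append(len(inputs))
        prev_in, prev_out = prev_out, stage(inputs, s)
    return groups_per_stage, prev_out


train_counts, _ = run_decoder(6, training=True)
infer_counts, infer_out = run_decoder(6, training=False)
print(train_counts)  # [1, 2, 3, 5, 8, 13]
print(infer_counts)  # [1, 1, 1, 1, 1, 1]
print(infer_out)     # [(1, 2, 3, 4, 5, 6)]
```

Under these assumptions the extra query groups appear only when `training=True`. With `training=False` every stage processes a single group, the same work as a decoder trained without recollection.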