Fangyi-Chen/SQR

Inference latency

zen-d opened this issue · 1 comments

zen-d commented

@Fangyi-Chen Thanks for your great work. I would like to ask how much the inference latency increases compared with the basic pathway, since the queries are much heavier in the later decoding stages.

Hi,

As a training strategy, SQR is applied only during the training phase. The inference pipeline is unchanged, so there is no inference latency overhead.
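
To illustrate the point, here is a toy sketch in plain Python. It is not the code from this repository. The names `make_stage` and `run_decoder`, the six stages, and the recollection rule (each stage also receives the outputs of the stage two steps back) are all assumptions chosen for illustration. It only shows that extra query groups appear when `training=True`, while the inference path processes a single group per stage.

```python
from typing import Callable, List, Tuple

Queries = List[float]
Stage = Callable[[Queries], Queries]


def make_stage(shift: float) -> Stage:
    # Hypothetical stand-in for one decoder stage: "refines" each query by a fixed shift.
    return lambda queries: [q + shift for q in queries]


def run_decoder(
    stages: List[Stage], init_queries: Queries, training: bool
) -> Tuple[Queries, List[int]]:
    """Return the basic-pathway output and the number of query groups per stage."""
    inputs = [init_queries]            # query groups fed to the current stage
    older_outputs: List[Queries] = []  # outputs of the stage before the previous one
    group_counts: List[int] = []
    for stage in stages:
        outputs = [stage(group) for group in inputs]
        group_counts.append(len(outputs))
        if training:
            # Assumed recollection rule: also feed the older outputs to the next stage.
            next_inputs = outputs + older_outputs
        else:
            # Inference: only the basic pathway (the first group) moves forward.
            next_inputs = outputs[:1]
        older_outputs = outputs
        inputs = next_inputs
    # The first group is always the basic pathway.
    return inputs[0], group_counts


stages = [make_stage(1.0) for _ in range(6)]
train_out, train_counts = run_decoder(stages, [0.0, 1.0], training=True)
infer_out, infer_counts = run_decoder(stages, [0.0, 1.0], training=False)
print("groups per stage (training): ", train_counts)
print("groups per stage (inference):", infer_counts)
print("same basic-pathway output:", train_out == infer_out)
```

In this sketch the per-stage workload grows only in training mode. With `training=False` every stage sees exactly one query group, which is why the heavier later stages do not translate into extra latency at inference time.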