dvlab-research/LongLoRA

Distributed inference issue

yixliu1 opened this issue · 0 comments

Hi there,

I noticed that the model is loaded in a distributed fashion across the GPUs during inference, but in each iteration only a single data sample is processed. Is there any way to handle multiple data samples at the same time, i.e., batched inference?
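
For reference, something like the following is what I am hoping to do. This is just a rough sketch assuming a standard Hugging Face causal LM sharded with `device_map="auto"`; the checkpoint path, prompts, and generation settings below are placeholders, not the exact LongLoRA setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/longlora-checkpoint"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.padding_side = "left"             # left-pad so generation continues right after each prompt
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",                      # shard the model across the available GPUs
)
model.eval()

prompts = [
    "Summarize the following document: ...",
    "Answer the question based on the passage: ...",
]

# Tokenize the whole batch at once instead of looping one sample at a time.
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

for text in tokenizer.batch_decode(output_ids, skip_special_tokens=True):
    print(text)
```

Is batching like this supported with the released inference script, or does it require changes on my side?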