bytedance/lightseq

How to ensemble lightseq models? & memory usage is too high when generating

baoguo1995 opened this issue · 3 comments

I ran into the following two problems when using lightseq 3.0.

  1. I pass --path model1:model2 to ensemble model1 and model2 for generation, just like fairseq-generate:
lightseq-generate $DATA_PATH \
    --path part_1/checkpoint_4_267500.pt:part_1/checkpoint_4_265000.pt \
    --batch-size 4 --beam 4 --remove-bpe \
    --gen-subset ${name} \
    --source-lang en \
    --target-lang zh \
    --max-len-a 1 \
    --max-len-b 50 \
    --lenpen 0.6 --fp16

but the operation fails partway through with the following error (the checkpoints are from the same model):
[screenshot: error traceback]

Could you please suggest an example of ensemble?

  2. When I use lightseq-generate, I found that loading a transformer_big model requires about 10GB of memory with lightseq, while the same model only needs about 2GB with fairseq. Is this expected?

This is loading a lightseq transformer_big model:
[screenshot: lightseq memory usage]

This is loading a fairseq transformer_big model:
[screenshot: fairseq memory usage]

Environment

  • Python 3.7
  • pytorch 1.12
  • fairseq 0.10.2
  • lightseq 3.0

And I have another problem. When I specify --share-decoder-input-output-embed to share some matrices, the resulting model size is the same as when the flag is not specified. However, the model sizes do differ in fairseq.
So, how can I confirm that the sharing actually works?
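For reference, this is the kind of check I had in mind, assuming the usual fairseq checkpoint layout (the key names below are my guess and may differ between versions):

import torch

# Hypothetical check: compare the decoder input embedding with the output
# projection stored in the fairseq checkpoint.
state = torch.load("part_1/checkpoint_4_267500.pt", map_location="cpu")["model"]
emb = state.get("decoder.embed_tokens.weight")
out = state.get("decoder.output_projection.weight")

if out is None:
    # Some fairseq versions do not store the tied projection separately.
    print("no separate output projection stored -> weights appear to be shared")
else:
    print("shared" if torch.equal(emb, out) else "not shared: two separate matrices")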

  1. We haven't tested fairseq ensemble, so we are not sure if there are any bugs.
  2. You can change the max_batch_tokens and max_step parameters in the lightseq layers to reduce memory usage (see the sketch after this list).
  3. That parameter may not work; we only implemented the shared version.
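A rough sketch of what I mean, assuming the layer-config API from the LightSeq training examples (LSTransformerEncoderLayer.get_config); the parameter names and values here are only illustrative, so please check them against your version:

from lightseq.training import LSTransformerEncoderLayer

# The pre-allocated buffers scale with max_batch_tokens and the maximum
# sequence length, so lowering them reduces GPU memory usage.
config = LSTransformerEncoderLayer.get_config(
    max_batch_tokens=2048,       # smaller value -> smaller pre-allocated buffers
    max_seq_len=256,             # cap on sequence length (related to max_step)
    hidden_size=1024,            # transformer_big settings
    intermediate_size=4096,
    nhead=16,
    attn_prob_dropout_ratio=0.1,
    activation_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    pre_layer_norm=False,
    activation_fn="relu",
    fp16=True,
    local_rank=0,
)
enc_layer = LSTransformerEncoderLayer(config)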

lightseq does not support model ensemble, I think.