How to ensemble lightseq models? & the memory usage is too big when generating
baoguo1995 opened this issue · 3 comments
baoguo1995 commented
I ran into the following two problems when using lightseq 3.0.
- I pass --path model1:model2 to ensemble model1 and model2 for generation, just like with fairseq-generate:
lightseq-generate $DATA_PATH \
--path part_1/checkpoint_4_267500.pt:part_1/checkpoint_4_265000.pt \
--batch-size 4 --beam 4 --remove-bpe \
--gen-subset ${name} \
--source-lang en \
--target-lang zh \
--max-len-a 1 \
--max-len-b 50 \
--lenpen 0.6 --fp16
but the run fails partway through with the following error (the checkpoints are from the same model).
Could you please suggest an example of ensembling? (A sketch of how the colon-separated path gets loaded is included after this list.)
- When I use lightseq-generate for generation, I found that loading a transformer_big model takes about 10GB of memory with lightseq, while the same model needs only about 2GB with fairseq. Is this expected? (A rough way to reproduce the measurement is sketched at the end of this comment.)
This is loading a lightseq transformer_big model:
This is loading a fairseq transformer_big model:
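Going back to the first question, this is roughly how plain fairseq-generate (0.10.2) turns the colon-separated --path into an ensemble; whether lightseq-generate reuses this code path is exactly what I'm unsure about, so treat it as a sketch:

```python
# Sketch of how fairseq-generate (0.10.2) turns a colon-separated --path
# into an ensemble; whether lightseq-generate follows the same path is an
# open question here.
from fairseq import checkpoint_utils, options, tasks, utils

parser = options.get_generation_parser()
args = options.parse_args_and_arch(parser)
task = tasks.setup_task(args)

# "--path a.pt:b.pt" is split on ':' and each checkpoint becomes one model
# of the ensemble that the generator averages over.
models, _model_args = checkpoint_utils.load_model_ensemble(
    utils.split_paths(args.path),
    task=task,
)
print(f"loaded {len(models)} checkpoints for ensembling")
```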
Environment
- Python 3.7
- pytorch 1.12
- fairseq 0.10.2
- lightseq 3.0
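And for the second question, a rough way to reproduce the measurement (PyTorch 1.12; the snippet measures a plain fairseq-style load, and the checkpoint path is the one from the command above):

```python
# Rough sketch for measuring peak GPU memory right after loading a model
# (PyTorch 1.12 APIs); the checkpoint path is the one used in the command above.
import torch
from fairseq import checkpoint_utils

torch.cuda.reset_peak_memory_stats()

models, _ = checkpoint_utils.load_model_ensemble(
    ["part_1/checkpoint_4_267500.pt"]
)
models = [m.half().cuda() for m in models]

print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```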
baoguo1995 commented
And I have another problem. When I specify --share-decoder-input-output-embed to share some matrices, the resulting model size is the same as when the option is not specified, whereas in fairseq the model sizes differ.
So, how can I confirm that the sharing actually works?
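A minimal check I can think of, assuming the checkpoint keeps the usual fairseq key names (decoder.embed_tokens.weight / decoder.output_projection.weight); a lightseq-trained checkpoint may name them differently:

```python
# Minimal sketch to verify --share-decoder-input-output-embed took effect:
# compare the decoder embedding with the output projection in the saved
# checkpoint. Key names are the usual fairseq ones (an assumption for a
# lightseq-trained model).
import torch

state = torch.load("checkpoint_best.pt", map_location="cpu")  # hypothetical path
sd = state["model"]

emb = sd["decoder.embed_tokens.weight"]
out = sd.get("decoder.output_projection.weight", sd.get("decoder.embed_out"))

if out is None:
    print("no separate output projection stored -> weights are shared")
else:
    print("shared" if torch.equal(emb, out) else "NOT shared")
```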
Taka152 commented
- We haven't tested fairseq ensembling, so we're not sure if there are any bugs.
- You can change the max_batch_tokens and max_step parameters in the lightseq layers to reduce memory usage (see the config sketch below).
- That parameter may not work; we only implement the shared version.
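Roughly where those limits live in the lightseq layer config; the names follow the lightseq 3.x training examples, so treat the exact signature as an assumption and check your installed version:

```python
# Sketch of the lightseq layer config that controls the pre-allocated
# buffers; parameter names follow the lightseq 3.x training examples and
# may differ in your installed version.
from lightseq.training import LSTransformerEncoderLayer

config = LSTransformerEncoderLayer.get_config(
    max_batch_tokens=4096,   # lower this to shrink the pre-allocated workspace
    max_seq_len=256,         # maximum sequence length the buffers allow
    hidden_size=1024,
    intermediate_size=4096,
    nhead=16,
    attn_prob_dropout_ratio=0.1,
    activation_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    pre_layer_norm=True,
    activation_fn="relu",
    fp16=True,
    local_rank=0,
)
layer = LSTransformerEncoderLayer(config)
```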
melody-rain commented
lightseq does not support model ensemble, I think.