IBM/ModuleFormer

Length exploration

Can this model be used on inputs longer than its training sequence length without any fine-tuning, or does it have to be fine-tuned at a longer sequence length first?