Question about merging different task checkpoints

Question

Bo396543018 opened this issue 6 months ago · 3 comments

It seems that the open-source weights come as one checkpoint (ckpt) per task. Is it possible to merge the weights of the different tasks and load only one merged model during inference? Has this been done in practice?

Answer 1 · 2024-06-24T08:34:32.000Z

It is possible to load a single model, but it requires additional implementation. Making this change would make the overall code more complex, which does not suit multitask training, so we haven't added it for now.

The Hulk model has four parts: the input tokenizer, the encoder (backbone), the decoder, and the output de-tokenizer. The encoder and decoder, which contain most of the parameters, are shared across all tasks, while the lightweight (de)-tokenizers are specific to each modality and shared only among tasks that use that modality. Therefore, if you merge the (de)-tokenizer weights of all the different modalities into one checkpoint and add extra conditional logic when loading it (to pick the right (de)-tokenizers for each task), you can use only one checkpoint.
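The merge-then-dispatch idea above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the Hulk codebase's method: it assumes PyTorch-style state dicts (flat mappings from parameter name to weight), made-up key prefixes (`encoder.`, `decoder.`, `input_tokenizer.`, `output_detokenizer.`), and a hand-written task-to-modality table; the real parameter names and modality mapping may differ. Shared encoder/decoder weights are stored once, each (de)-tokenizer is namespaced by its modality, and `load_for_task` is the "extra conditional logic" at load time. For real tensors, pass `equal=torch.equal`.

```python
from typing import Any, Callable, Dict, Mapping, Tuple

StateDict = Dict[str, Any]    # parameter name -> weight (e.g. a torch.Tensor)
Modalities = Tuple[str, str]  # (input modality, output modality) of a task

# Hypothetical parameter-name prefixes; real Hulk checkpoints may use different names.
# Assumes original names never contain "/", which is used here as a namespace separator.
SHARED_PREFIXES = ("encoder.", "decoder.")
TOKENIZER_PREFIX = "input_tokenizer."
DETOKENIZER_PREFIX = "output_detokenizer."


def merged_key(key: str, mods: Modalities) -> str:
    """Map a per-task parameter name to its name in the merged checkpoint."""
    if key.startswith(SHARED_PREFIXES):
        return key                 # encoder/decoder: stored once for all tasks
    if key.startswith(TOKENIZER_PREFIX):
        return f"{mods[0]}/{key}"  # one tokenizer per input modality
    if key.startswith(DETOKENIZER_PREFIX):
        return f"{mods[1]}/{key}"  # one de-tokenizer per output modality
    raise KeyError(f"unexpected parameter {key!r}")


def merge_checkpoints(
    task_ckpts: Mapping[str, StateDict],
    task_mods: Mapping[str, Modalities],
    equal: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> StateDict:
    """Combine per-task state dicts into a single state dict."""
    merged: StateDict = {}
    for task, state in task_ckpts.items():
        for key, value in state.items():
            new_key = merged_key(key, task_mods[task])
            # Weights that should be shared must be identical; otherwise merging is lossy.
            if new_key in merged and not equal(merged[new_key], value):
                raise ValueError(f"conflicting weights for {new_key!r} in task {task!r}")
            merged[new_key] = value
    return merged


def load_for_task(merged: StateDict, task: str, task_mods: Mapping[str, Modalities]) -> StateDict:
    """Rebuild one task's state dict by selecting its (de)-tokenizers."""
    in_mod, out_mod = task_mods[task]
    state: StateDict = {}
    for key, value in merged.items():
        if "/" not in key:  # shared encoder/decoder weight
            state[key] = value
            continue
        mod, name = key.split("/", 1)
        if (mod == in_mod and name.startswith(TOKENIZER_PREFIX)) or (
            mod == out_mod and name.startswith(DETOKENIZER_PREFIX)
        ):
            state[name] = value  # this task's tokenizer or de-tokenizer
    return state


# Toy example with made-up task names, modalities, and scalar "weights".
ckpts = {
    "pose": {"encoder.w": 1, "decoder.w": 2, "input_tokenizer.w": 10, "output_detokenizer.w": 30},
    "caption": {"encoder.w": 1, "decoder.w": 2, "input_tokenizer.w": 10, "output_detokenizer.w": 40},
}
mods = {"pose": ("image", "sparse_label"), "caption": ("image", "text")}
merged = merge_checkpoints(ckpts, mods)
print(sorted(merged))
print(load_for_task(merged, "caption", mods) == ckpts["caption"])  # True
```

The equality check is a deliberate safeguard: the merge is only lossless if the per-task checkpoints really share identical encoder/decoder weights (and identical (de)-tokenizers within a modality). If the released checkpoints were fine-tuned separately per task, the sketch raises instead of silently keeping one task's weights.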

Answer 2 · 2024-06-24T09:22:12.000Z

@Cohesion97
Thank you for your reply. Merging checkpoints is useful for deployment. I'll try your suggestion.