Thanks for your great work! I have some questions

Question

Leon1207 opened this issue 8 months ago · 4 comments

Dear authors, I have some questions about the lightweight caption head you proposed.
How does the lightweight caption head differ from existing captioning models in architecture and computational efficiency, such that it qualifies as a "lightweight design"?
I look forward to your reply.

Leon1207 commented 8 months ago

Thanks!

Answer 1 · 2024-03-19T09:02:57.000Z

Nowadays, researchers often use large language models for image captioning. We describe our caption head as "lightweight" because it is small enough to make set-to-set training feasible.

Answer 2 · 2024-03-19T09:08:32.000Z

Thank you very much for your reply, your explanation makes perfect sense! On the other hand, methods like 3DJCG and D3Net don't use large models either. Would they also count as lightweight?

Answer 3 · 2024-03-19T09:15:05.000Z

As long as they contain a small number of parameters, you can also call them "lightweight".
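Since the criterion above is parameter count, here is a rough way to estimate it. This is a minimal sketch, assuming a hypothetical caption head built from standard Transformer decoder layers plus a token embedding and an untied output projection (positional embeddings omitted). It is not the authors' actual design, and the function names and dimensions are illustrative only.

```python
def decoder_layer_params(d_model: int, d_ff: int) -> int:
    """Parameters in one standard Transformer decoder layer (weights + biases).

    Counts self-attention and cross-attention (Q, K, V and output projections),
    a two-layer feed-forward network, and three LayerNorms.
    """
    attention = 4 * d_model * d_model + 4 * d_model  # Q, K, V, O weights and biases
    ffn = 2 * d_model * d_ff + d_ff + d_model         # two linear layers with biases
    norms = 3 * 2 * d_model                           # three LayerNorms (scale + shift)
    return 2 * attention + ffn + norms                # self-attn + cross-attn + FFN + norms


def caption_head_params(vocab: int, d_model: int, d_ff: int, n_layers: int) -> int:
    """Hypothetical caption head: token embedding, decoder stack, untied output layer."""
    embedding = vocab * d_model
    output = d_model * vocab + vocab  # projection to vocabulary logits, with bias
    return embedding + n_layers * decoder_layer_params(d_model, d_ff) + output


if __name__ == "__main__":
    # Illustrative sizes only: 4,000-word vocabulary, width 256, 2 layers.
    total = caption_head_params(vocab=4000, d_model=256, d_ff=1024, n_layers=2)
    print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

With these illustrative sizes the head has about 4.2M parameters, roughly three orders of magnitude fewer than a 7-billion-parameter language model, which is the kind of gap the word "lightweight" refers to here.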