Any methods to alleviate the Curse of Multi-Modalities?
Hello, thank you very much for your research findings, particularly regarding the two multimodal hallucination issues mentioned in the paper: SPURIOUS INTER-MODALITY CORRELATIONS and OVERRELIANCE ON UNIMODAL PRIORS.
While running SFT on a specific classification task with the Qwen2-VL-7B model, I encountered both of these hallucination problems during inference on the test set. They significantly limit further gains, especially when trying to boost performance from 90 to 95. Are there any methods or references for alleviating these problems? Thank you very much for your response.
Hi, thanks for your interest!
You can try alternative decoding methods such as Visual Contrastive Decoding (VCD).
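For reference, the core idea of contrastive decoding in the VCD style can be sketched roughly as below. This is a minimal illustration, not the official VCD code: the function names are hypothetical, the logits are made-up toy values, and the `alpha` and `beta` parameters are assumed to play the roles of the contrast strength and the plausibility cutoff. In practice the two logit vectors would come from two forward passes of the model, one with the original image and one with a distorted (for example, noised) image.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(logits_clean, logits_distorted, alpha=1.0, beta=0.1):
    """Combine next-token logits from a clean and a distorted visual input.

    Hypothetical helper: tokens whose clean-input probability falls below
    beta * (max clean probability) are masked out; the remaining tokens get
    (1 + alpha) * clean - alpha * distorted.
    """
    probs_clean = softmax(logits_clean)
    cutoff = beta * max(probs_clean)
    adjusted = []
    for lc, ld, pc in zip(logits_clean, logits_distorted, probs_clean):
        if pc < cutoff:
            adjusted.append(float("-inf"))  # implausible token, never selected
        else:
            adjusted.append((1 + alpha) * lc - alpha * ld)
    return adjusted


# Toy values (assumed): token 1 stays likely even when the image is distorted,
# which suggests it is driven by a prior rather than by the visual evidence.
logits_clean = [2.0, 2.2, -3.0]
logits_distorted = [0.5, 2.4, -3.0]

adjusted = contrastive_logits(logits_clean, logits_distorted)
best_plain = max(range(len(logits_clean)), key=logits_clean.__getitem__)
best_contrastive = max(range(len(adjusted)), key=adjusted.__getitem__)
print(best_plain, best_contrastive)
```

In this toy case greedy decoding on the clean logits alone picks token 1, while the contrastive score prefers token 0, whose evidence depends on the undistorted image. For a classification task, the same adjustment could be applied to the logits of the label tokens only.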