Is there a data leakage issue, at least on MOSEI?
stonezhng opened this issue · 0 comments
stonezhng commented
If I understand the pipeline correctly, on the CMU-MOSEI dataset the model uses `T5ForConditionalGeneration` to encode the tokens of the text input. However, in the `evaluate` function the model still passes `t5_label`, an encoded version of the sentiment score, to `T5ForConditionalGeneration` to obtain the latent text feature at validation and test time. Is this a data leakage issue, in that the model already sees the label it is supposed to predict at inference time?
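To make the concern concrete, here is a minimal sketch of the check I have in mind: extract the text feature twice for the same input with two different labels and see whether the feature changes. In Hugging Face's `T5ForConditionalGeneration`, when `labels` is passed without `decoder_input_ids`, the labels are shifted right and fed to the decoder, so decoder hidden states depend on the label while encoder hidden states do not. I have not verified which of the two this repo reads, so the two extractor functions below are hypothetical toy stand-ins in pure Python, not the repo's code or the T5 API.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (text token ids, label token ids) -> feature vector.
Extractor = Callable[[Sequence[int], Sequence[int]], List[float]]


def feature_depends_on_label(
    extract: Extractor,
    text_ids: Sequence[int],
    label_a: Sequence[int],
    label_b: Sequence[int],
    tol: float = 1e-9,
) -> bool:
    """Return True if swapping the label changes the extracted feature."""
    feat_a = extract(text_ids, label_a)
    feat_b = extract(text_ids, label_b)
    if len(feat_a) != len(feat_b):
        return True
    return any(abs(x - y) > tol for x, y in zip(feat_a, feat_b))


def encoder_only_feature(text_ids: Sequence[int], label_ids: Sequence[int]) -> List[float]:
    # Toy stand-in for reading encoder states: the label is never used.
    return [sum(text_ids) / len(text_ids), float(max(text_ids))]


def decoder_feature(text_ids: Sequence[int], label_ids: Sequence[int]) -> List[float]:
    # Toy stand-in for reading decoder states under teacher forcing:
    # the label ids are shifted right (start token 0 assumed) and fed in.
    decoder_inputs = [0] + list(label_ids[:-1])
    context = sum(text_ids) / len(text_ids)
    return [context + tok for tok in decoder_inputs]


text = [3, 4, 5]
label_a = [5, 7, 1]
label_b = [9, 7, 1]  # differs in the first token so the shift does not hide it

print(feature_depends_on_label(encoder_only_feature, text, label_a, label_b))  # False
print(feature_depends_on_label(decoder_feature, text, label_a, label_b))       # True
```

If the same check run against the real `evaluate` path reports that the feature changes with `t5_label`, the label is reaching the feature used for prediction; if it does not change, passing `t5_label` at inference would be harmless, although still unnecessary.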