laisimiao opened this issue 9 months ago · 1 comments
May I know whether UnTrack training needs data from all three modalities simultaneously, or whether any combination of modalities can be used as input?
Hi,
To train the model, we mix samples from the D/T/E (depth/thermal/event) modalities within each minibatch.
This way, a single training step sees several modalities at once, which enables cross-modal interaction.
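
For readers wondering what "mixing in the minibatch" could look like in practice, here is a minimal sketch. It is not taken from the UnTrack code base: the `make_mixed_batches` helper, the placeholder sample names, and the pool-then-shuffle strategy are all assumptions made for illustration. The idea is only that samples from the three modality-specific datasets are pooled and shuffled, so each minibatch can contain more than one modality.

```python
import random
from collections import Counter


def make_mixed_batches(datasets, batch_size, seed=0):
    """Pool samples from every modality, shuffle, and cut into minibatches.

    Hypothetical helper for illustration; not part of the UnTrack code.
    `datasets` maps a modality tag ("D", "T", "E") to a list of samples.
    """
    rng = random.Random(seed)
    # Tag each sample with its modality so the model can tell them apart.
    pool = [
        (modality, sample)
        for modality, samples in datasets.items()
        for sample in samples
    ]
    rng.shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]


# Placeholder sample names standing in for real RGB-D / RGB-T / RGB-E clips.
datasets = {
    "D": [f"rgbd_{i}" for i in range(8)],
    "T": [f"rgbt_{i}" for i in range(8)],
    "E": [f"rgbe_{i}" for i in range(8)],
}

batches = make_mixed_batches(datasets, batch_size=6)
for batch in batches:
    # Show how many samples of each modality landed in this minibatch.
    print(Counter(modality for modality, _ in batch))
```

Note that each sample here still carries a single auxiliary modality; the mixing happens across samples in the batch, not within one sample. Whether the actual training code balances the modalities per batch is not stated in this thread.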