This is the open source code for the paper: Multimodal Reaction: Information Modulation for Cross-modal Representation Learning. We provide an implementation for the task of multimodal sentiment analysis. The main components are as follows:
- `./datasets` contains the datasets used in the experiments.
- `./modules` contains the model definition.
- `./utils` contains the functions for data processing, evaluation metrics, etc. The file `./utils/loss.py` will be updated in May 2024.
- `global_configs.py` defines important constants.
- `train.py` defines the training process.
We have already provided the processed MOSI dataset in `./datasets`.
To download the larger MOSEI dataset, run `datasets/download_datasets.sh`.
To install the required packages, run `pip install -r requirements.txt`.
Before starting training, you should define the global constants in `global_configs.py`. The default configuration is set for the MOSI dataset. Important settings include the GPU, learning rate, feature dimension, number of training epochs, training batch size, dataset choice, and the dimensions of the input features. To run on the MOSEI dataset, remember to change the dimension of the visual modality:
```python
from torch import nn


class DefaultConfigs(object):
    device = '1'                        # GPU setting
    logs = './logs/'
    max_seq_length = 50
    lr = 1e-5                           # learning rate
    d_l = 80                            # feature dimension
    n_epochs = 100                      # training epochs
    train_batch_size = 16               # training batch size
    dev_batch_size = 128
    test_batch_size = 128
    model_choice = 'bert-base-uncased'
    dataset = 'mosi'                    # dataset setting
    TEXT_DIM = 768                      # dimension setting of the input data
    ACOUSTIC_DIM = 74
    VISUAL_DIM = 47

    # For MOSEI, use instead:
    # dataset = 'mosei'
    # ACOUSTIC_DIM = 74
    # VISUAL_DIM = 35
    # TEXT_DIM = 768


config = DefaultConfigs()
```
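The rest of the code reads these values through the `config` object. The snippet below is only a hedged sketch of that usage; the exact way `train.py` consumes each constant may differ.

```python
# Hypothetical usage sketch: reading the constants defined in global_configs.py.
from global_configs import config

print(config.dataset, config.d_l, config.lr)   # e.g. 'mosi' 80 1e-05

# The GPU index is stored as a string (e.g. '1'); one common pattern is to build
# a device string from it. This is an assumption, not necessarily train.py's code.
device_str = f"cuda:{config.device}"
```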
To run the experiments, run the following command:

```bash
python train.py
```
Note that the proposed method is flexible enough to work with various fusion strategies. We have performed experiments with element-wise addition, element-wise multiplication, concatenation, and tensor fusion. If tensor fusion is adopted, the feature dimension `d_l` in `global_configs.py` can be set larger to ensure higher capacity.
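For illustration, the sketch below shows what these four fusion strategies can look like on three modality features of size `d_l`. The function name `fuse` and the plain outer-product form of tensor fusion are assumptions made for this example, not the exact implementation in `./modules`.

```python
import torch

d_l = 80  # feature dimension, as in global_configs.py


def fuse(text_h, acoustic_h, visual_h, strategy="add"):
    """Illustrative fusion of three (batch, d_l) modality features."""
    if strategy == "add":       # element-wise addition -> (batch, d_l)
        return text_h + acoustic_h + visual_h
    if strategy == "mul":       # element-wise multiplication -> (batch, d_l)
        return text_h * acoustic_h * visual_h
    if strategy == "concat":    # concatenation -> (batch, 3 * d_l)
        return torch.cat([text_h, acoustic_h, visual_h], dim=-1)
    if strategy == "tensor":    # outer product of all modalities -> (batch, d_l ** 3)
        b = text_h.size(0)
        ta = torch.einsum('bi,bj->bij', text_h, acoustic_h).reshape(b, -1)
        return torch.einsum('bi,bj->bij', ta, visual_h).reshape(b, -1)
    raise ValueError(f"unknown fusion strategy: {strategy}")


features = [torch.randn(4, d_l) for _ in range(3)]
print(fuse(*features, strategy="concat").shape)  # torch.Size([4, 240])
print(fuse(*features, strategy="tensor").shape)  # torch.Size([4, 512000])
```

Note how concatenation and tensor fusion change the dimensionality of the fused representation (3 * `d_l` and `d_l ** 3` in this sketch), so the layers that consume it must be sized accordingly.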
We would like to express our gratitude to huggingface and MAG-BERT, which were of great help to our work.