declare-lab/contextual-utterance-level-multimodal-sentiment-analysis

creating features

amirim opened this issue · 0 comments

Hello there,

I read your paper and found discrepancies between the feature counts in the paper and those in the repository's data files; please help me figure this out. For example:

text: 101 features, but what about the 300 dimensions of word2vec?

These vectors are the publicly available 300-dimensional word2vec

BTW, can I use GloVe instead and get about the same accuracy?
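For context, here is a sketch of how I would load GloVe-format vectors in place of word2vec (the parser and the tiny dim=3 demo values are my own stand-ins, not anything from your repo; real GloVe files such as glove.6B.300d.txt use 300 dimensions, matching the word2vec vectors the paper mentions):

```python
import io


def load_glove(lines, dim=300):
    """Parse GloVe-format text ("word v1 v2 ... vN") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip headers or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny demo with dim=3 stand-in vectors instead of a real 300-d GloVe file.
demo = io.StringIO("hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vecs = load_glove(demo, dim=3)
print(sorted(vecs), len(vecs["hello"]))  # ['hello', 'world'] 3
```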

audio: 74 features, although the paper says 6373?

Taking into account all functionals of each LLD, we obtained 6373 features.
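To make sure I understand the functionals-over-LLDs idea from that quote, here is a toy sketch I wrote: a handful of illustrative statistical functionals applied to one low-level descriptor contour (openSMILE's actual functional set is far larger, which is how the total reaches 6373; these five are just examples I picked):

```python
import statistics


def functionals(series):
    """A few illustrative statistical functionals over one LLD contour,
    e.g. a pitch track. openSMILE applies many more per LLD."""
    return {
        "mean": statistics.fmean(series),
        "stdev": statistics.pstdev(series),
        "min": min(series),
        "max": max(series),
        "range": max(series) - min(series),
    }


lld = [0.1, 0.3, 0.2, 0.5]  # stand-in values for one LLD contour
feats = functionals(lld)
print(len(feats))  # 5
```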

visual: 101 features. How were these obtained?
Also, OpenFace aligns the frames; did you find that helpful in this case?

We use 3D-CNN (Ji et al., 2013) to obtain visual features from the video.

Anyway, if I use my own feature-extraction tools, for example pyAudioAnalysis, GloVe, etc., and run your code after only changing the path of the CSV file, will I get about the same accuracy?
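In case it helps clarify what I mean, here is the sanity check I would run before swapping my own CSVs in (the expected widths are just the ones I observed above, and the demo data is fabricated; the real files would come from my extraction tools):

```python
import csv
import io

# Per-utterance feature widths I observed in the repo's data files.
EXPECTED_DIMS = {"text": 101, "audio": 74, "visual": 101}


def check_features(csv_file, modality):
    """Read per-utterance feature rows and verify each row's width
    matches what the repo's code appears to expect."""
    rows = [[float(x) for x in row] for row in csv.reader(csv_file) if row]
    expected = EXPECTED_DIMS[modality]
    for i, row in enumerate(rows):
        if len(row) != expected:
            raise ValueError(
                f"{modality}: row {i} has {len(row)} features, expected {expected}"
            )
    return rows


# Demo: two fake audio utterances with the right width (74 zeros each).
demo = io.StringIO("\n".join(",".join(["0.0"] * 74) for _ in range(2)))
rows = check_features(demo, "audio")
print(len(rows), len(rows[0]))  # 2 74
```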