audeering/w2v2-how-to

Range value of arousal, valence, dominance

trfnhle opened this issue · 1 comments

I wonder what the value range of arousal, valence, and dominance is. As far as I know, the model output is a logit vector of size 3 representing those features, and its values appear to fall in the range [0, 1]. I see that you use the MSP-Conversation Corpus for fine-tuning. But when I looked at the MSP-Conversation Corpus paper, they mentioned that
"Notice that the values of the traces are in the range between -100 and 100. The figure shows that extreme values are uncommon. Most of the annotations are concentrated between -40 to 40 for valence, -20 to 50 for arousal, and -20 to 40 for dominance"

Do you guys normalize that feature, or do something related?

Yes, databases tend to use different scales for arousal/valence/dominance, for example 0..5.
We normalize all scales to 0..1 for training. During inference, most of the values returned by the model fall in this range, but you can occasionally get values slightly outside it.
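
To illustrate the idea, here is a minimal sketch of such a rescaling. It assumes a plain linear (min-max) mapping from the corpus scale to 0..1; the exact procedure used for training is not stated in this thread. The helper names `rescale_to_unit` and `clip_to_unit` are hypothetical and not part of this repository. Clipping the predictions is an optional post-processing step you may want in your own code, not something the model does.

```python
def rescale_to_unit(value, low, high):
    """Linearly map a label from the corpus scale [low, high] to [0, 1].

    Assumption: a simple min-max mapping; the thread does not specify
    the exact normalization used for training.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


def clip_to_unit(value):
    """Optionally force a model prediction into [0, 1]."""
    return min(1.0, max(0.0, value))


# Labels on a -100..100 scale, as quoted from the corpus paper above.
labels = [rescale_to_unit(v, low=-100, high=100) for v in (-100, 0, 100)]

# Labels on a 0..5 scale, as in the example scale mentioned in the answer.
mid_label = rescale_to_unit(2.5, low=0, high=5)

# Hypothetical predictions, two of which fall slightly outside 0..1.
predictions = [clip_to_unit(p) for p in (1.03, 0.5, -0.02)]
```

Whether to clip depends on the use case: keeping the raw values preserves the ordering of extreme predictions, while clipping guarantees a bounded range for downstream code.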