batikim09/LIVE_SER

Valence score stays low

Closed this issue · 2 comments

Hi,
Thanks a lot for the great work! When trying the live mode of SER to predict arousal and valence, the arousal part works nicely. However, the valence always stays low (in the 3-level prediction, 90%+ of the probability falls in the lowest level) no matter how cheerfully we try to speak :)
Have you encountered a similar situation before? Could you give us some advice? Thanks!

I have the same issue too: the valence is always low. Do we need to apply some gain normalization?

Hi all, thank you for your interest in this project. Indeed, recognising valence from speech signals is much harder than recognising arousal. Even in controlled lab settings, it unfortunately does not perform very well. In particular, the corpora used for training have severely imbalanced class distributions, and I grouped discrete emotion classes into three levels of valence, which is not good practice if you want good performance. Also, this project is only meant to show how an off-the-shelf classifier works, not to provide a generic emotion recognition model (although we sometimes claim that in papers, I don't think we are there yet). Regarding gain normalisation, I implemented a simple automatic gain normalisation, so it's not really a big issue. To improve valence performance, I recommend training your own models on your own data for your purpose. Rather than grouping discrete emotion classes, using data annotated with arousal and valence dimensions will work much better. Since this is a performance issue rather than a code issue, I'll close this.
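
For anyone retraining on their own data, here is a minimal sketch of the imbalance point above. It groups discrete emotion labels into three valence levels, counts how skewed the result is, and derives inverse-frequency class weights that could be passed to a training loss. The `EMOTION_TO_VALENCE` mapping, the `valence_class_weights` helper, and the example label list are all hypothetical and made up for illustration; they are not the mapping or data used in this project.

```python
from collections import Counter

# Hypothetical mapping from discrete emotions to valence levels
# (0 = low, 1 = mid, 2 = high). An assumption for illustration only,
# not the grouping used in LIVE_SER.
EMOTION_TO_VALENCE = {
    "anger": 0, "sadness": 0, "fear": 0, "disgust": 0,
    "neutral": 1,
    "happiness": 2,
}


def valence_class_weights(emotion_labels):
    """Return (counts, weights) per valence level.

    Weights are inverse-frequency: n_samples / (n_classes * count),
    so rarer valence levels get larger weights.
    """
    levels = [EMOTION_TO_VALENCE[e] for e in emotion_labels]
    counts = Counter(levels)
    n_samples = len(levels)
    n_classes = len(counts)
    weights = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    return dict(counts), weights


# Made-up label list: many negative emotions, few positive ones.
labels = (["anger"] * 4 + ["sadness"] * 3 + ["fear"] * 2
          + ["neutral"] * 2 + ["happiness"] * 1)
counts, weights = valence_class_weights(labels)
print("samples per valence level:", counts)
print("class weights:", weights)
```

With several negative emotions collapsed into the low level, that level dominates the training set, which is one plausible reason a model would keep predicting low valence. Class weighting only softens this; data annotated directly with valence, as suggested above, avoids the grouping step altogether.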