audeering/w2v2-how-to

Convert VAD to Ekman

mirix opened this issue · 3 comments

mirix commented

Hello,

This model provides VAD (valence, arousal, dominance) values in a 3D space.

However, the Ekman model of basic emotions is more intuitive for sharing results with users.

I have found papers with 3D representations hinting at how to perform this conversion.

Are you aware of a straightforward approach to convert between the two models?

Ideally in Python, but any hint on the algorithm would also do.

Best,

Ed

hagenw commented

The VAD model is fine-tuned only on the MSP-Podcast dataset, which has several shortcomings for a full-blown VAD model:

  • Podcast recordings most likely do not contain all possible emotions, e.g. fear
  • The dominance and arousal annotations show a high correlation, which is mimicked by the model; this means we most likely do not cover the 3D VAD space in a meaningful way (see the sketch below)
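
A quick way to check that second point is to correlate the model's arousal and dominance predictions over a batch of recordings. A minimal sketch, assuming you have already collected per-file predictions into an array (the variable names are illustrative, not from this repo):

```python
import numpy as np

# Illustrative placeholder: per-file model outputs collected elsewhere,
# one row per recording, columns = (arousal, dominance, valence).
predictions = np.random.rand(100, 3)  # replace with real model outputs

arousal = predictions[:, 0]
dominance = predictions[:, 1]

# Pearson correlation between the arousal and dominance predictions.
r = np.corrcoef(arousal, dominance)[0, 1]
print(f"arousal/dominance correlation: {r:.2f}")
# A value close to 1 means the model effectively collapses these two
# dimensions and the 3D VAD space is not covered in a meaningful way.
```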

With this in mind, I would advise being very careful when trying to map the VAD values to emotional categories.

Another way might be to further fine-tune the model on a database containing the desired emotional categories, or to use the embeddings of the model to train a simple classifier on such a database, as we do in the notebook: https://github.com/audeering/w2v2-how-to/blob/main/notebook.ipynb
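
For the embedding route, a minimal sketch of the idea (the notebook uses its own model class; here I simply mean-pool the hidden states of the plain transformers Wav2Vec2Model, the regression head of the checkpoint is ignored, and the labelled database is left as a placeholder):

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name)  # head weights are skipped
model.eval()

def embed(signal: np.ndarray, sampling_rate: int = 16000) -> np.ndarray:
    """Mean-pool the last hidden states into one embedding per recording."""
    inputs = feature_extractor(
        signal, sampling_rate=sampling_rate, return_tensors="pt"
    )
    with torch.no_grad():
        hidden = model(inputs.input_values).last_hidden_state  # (1, time, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Placeholder: load your own labelled database here,
# e.g. a list of 16 kHz signals and their emotion categories.
signals, labels = [], []  # fill with real data

X = np.stack([embed(s) for s in signals])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```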

mirix commented

@hagenw

Thanks a million for the clarifications.

In general, the conversion from VAD to Ekman seems to provide useful results:

https://github.com/mirix/approaches-to-diarisation/tree/main/emotions
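
Roughly, the kind of mapping I mean looks like the following: place a reference VAD point for each Ekman category and assign the nearest one. A minimal sketch; the coordinates are illustrative values loosely inspired by the Russell and Mehrabian (1977) tables, rescaled to [0, 1], and not the exact values or method from the repo:

```python
import numpy as np

# Illustrative reference points in (valence, arousal, dominance),
# each in [0, 1]; loosely inspired by Russell & Mehrabian (1977),
# not calibrated to the outputs of this model.
EKMAN_REFERENCES = {
    "anger":     (0.17, 0.86, 0.70),
    "disgust":   (0.20, 0.70, 0.60),
    "fear":      (0.15, 0.80, 0.25),
    "happiness": (0.90, 0.75, 0.70),
    "sadness":   (0.15, 0.30, 0.25),
    "surprise":  (0.70, 0.85, 0.45),
}

def vad_to_ekman(vad):
    """Return the Ekman category whose reference point is closest in VAD space."""
    vad = np.asarray(vad)
    distances = {
        label: np.linalg.norm(vad - np.asarray(ref))
        for label, ref in EKMAN_REFERENCES.items()
    }
    return min(distances, key=distances.get)

print(vad_to_ekman((0.8, 0.7, 0.6)))  # -> "happiness"
```

A soft assignment (e.g. a softmax over negative distances) would preserve ambiguity between neighbouring categories instead of forcing a hard label.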

However, it is true that fear is never detected.

I will see what other models are available and pay more attention to which datasets were used.

mirix commented

Hi @hagenw

I have forked MOSEI for SER (speech emotion recognition):

https://huggingface.co/datasets/mirix/messaih

https://github.com/mirix/messaih

Now I will try to train a model and test it in a real-life scenario.