This repository is dedicated to training machine learning models that recognize emotions (angry, disgust, fear, happy, neutral, sad, or surprise) in voice files taken from videos downloaded from YouTube.
Active members of the team working on this repo include:
- Bineeta Gupta (Arizona State University)
- Luke Lyon (Boulder, CO)
- Anwar Akkari (Yale University)
- Jim Schwoebel (Boston, MA)
- Shivani Reddy (Clemson University)
We plan to post Slack updates every Friday at 8 PM EST. If we need a work session, we will arrange one.
Here are some goals to try to beat with demo projects. The table below lists example models that classify various emotions, along with their accuracies, standard deviations, model types, and feature embeddings. It should give you a good idea of what to brush up on as you think about new embeddings for audio and text features.
Model Name | Feature embedding | Accuracy | Standard Deviation | Model type |
---|---|---|---|---|
disgust.pickle | character, polarity, rhythm, spectral | 0.9775293015 | 0.009225004885 | random forest |
surprise.pickle | character, polarity, onset, spectral, power | 0.8971036205 | 0.008219397678 | knn |
fear.pickle | pos, polarity, spectral | 0.8406798246 | 0.003728070175 | knn |
happy.pickle | character, polarity, spectral, power | 0.68 | 0.03479685397 | hard voting |
angry.pickle | polarity, rhythm, spectral, power | 0.6548830038 | 0.04924646135 | gradient boosting |
happy_sad.pickle | power | 0.6543740573 | 0.01507843069 | logistic regression |
sad.pickle | character, polarity, rhythm, spectral, power | 0.6313155529 | 0.02186253158 | hard voting |
happy_sad_neutral.pickle | pos, spectral | 0.4698875525 | 0.02512849173 | logistic regression |
all_emotions.pickle | character, pos, polarity, onset, rhythm, spectral | 0.2875083655 | 0.0358943377 | knn |
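The table does not say how the accuracy and standard deviation columns were computed; one plausible reading (an assumption) is the mean and spread of per-fold accuracies from k-fold cross-validation. The minimal pure-Python sketch below shows that procedure with a k-nearest-neighbors classifier, one of the model types listed. The function names, the 5-fold split, and `k=3` are illustrative choices, and `statistics.stdev` gives the sample standard deviation, which may differ from whatever the original runs used.

```python
import math
import random
import statistics
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict a label for x by majority vote among its k nearest training points (Euclidean)."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validate_knn(X, y, n_folds=5, k=3, seed=0):
    """Return (mean accuracy, sample standard deviation) of kNN over n_folds folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin into n_folds roughly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return statistics.mean(scores), statistics.stdev(scores)
```

For example, `cross_validate_knn(X, y)` on a list of feature vectors `X` and emotion labels `y` returns a pair like the Accuracy and Standard Deviation columns above. In practice you would likely use a library such as scikit-learn instead; this version only makes the procedure explicit.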
Make sure you have roughly 15 GB of free space on your hard disk.
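If you want to check free space from a script rather than by hand, a small sketch using the standard library's `shutil.disk_usage` follows. The threshold treats "15 GB" as 15 × 10^9 bytes, and the helper name `has_enough_space` is hypothetical.

```python
import shutil

REQUIRED_BYTES = 15 * 10**9  # roughly 15 GB (decimal), per the note above


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 10**9
    print(f"Free space: {free_gb:.1f} GB; enough for the dataset: {has_enough_space()}")
```

Run it from the directory where you plan to save the dataset, since free space is reported per filesystem.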
Once you have confirmed that this much space is free, you can download the dataset by clicking this link. On the page that opens, click the download button (the down-arrow icon) in the top-right corner to start the download. It may take a while depending on your internet connection.
The data is arranged in these folders: angry, disgust, fear, happy, neutral, sad, surprise. Each .wav file has a corresponding .json file containing a transcript and features. The feature array in the .json file contains audio features (such as MFCC coefficients and their deltas) as well as text features such as part-of-speech tags. This is the standard mixed NeuroLex feature embedding.
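A minimal loading sketch for the folder layout described above: it pairs each .wav file with the .json file of the same name and collects the feature arrays with their emotion labels. The exact JSON field name for the feature array is not given here, so `feature_key="features"` is a hypothetical default; open one of the .json files and adjust it before use. The function name `load_dataset` is also illustrative.

```python
import json
from pathlib import Path

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def load_dataset(root, feature_key="features"):
    """Collect feature arrays and labels from <root>/<emotion>/<name>.wav + <name>.json.

    `feature_key` is a hypothetical name for the JSON field holding the
    feature array; check a real .json file and change it if needed.
    Returns (X, y, missing), where `missing` lists .wav files with no .json.
    """
    X, y, missing = [], [], []
    for emotion in EMOTIONS:
        folder = Path(root) / emotion
        if not folder.is_dir():
            continue  # skip emotions that are not present in this copy
        for wav in sorted(folder.glob("*.wav")):
            meta = wav.with_suffix(".json")
            if not meta.exists():
                missing.append(wav)
                continue
            with meta.open() as f:
                record = json.load(f)
            X.append(record[feature_key])
            y.append(emotion)
    return X, y, missing
```

Usage: `X, y, missing = load_dataset("path/to/dataset")`, then pass `X` and `y` to whatever model you are training. Checking `missing` is a quick way to catch incomplete downloads.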
Feel free to make your own feature arrays to model the data; these are just here for guidance in case you don't feel comfortable making your own features and/or if you'd like to test this feature array vs. other feature arrays that you custom engineer.
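As a starting point for custom features, the sketch below computes two simple audio features, RMS energy and zero-crossing rate, using only the standard library `wave` module. It assumes 16-bit PCM WAV input and keeps only the first channel of multi-channel files; these are simplifying assumptions, not properties guaranteed for the dataset. The function name `simple_features` is hypothetical.

```python
import array
import math
import sys
import wave


def simple_features(path):
    """Return [rms_energy, zero_crossing_rate] for a 16-bit PCM WAV file.

    Samples are scaled to [-1, 1); multi-channel audio keeps only channel 0.
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
    samples = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    samples = samples[::channels]  # frames are interleaved; take the first channel
    if len(samples) == 0:
        return [0.0, 0.0]
    scaled = [s / 32768.0 for s in samples]
    rms = math.sqrt(sum(v * v for v in scaled) / len(scaled))
    # Count sign changes between neighboring samples (zero counts as non-negative).
    crossings = sum(1 for a, b in zip(scaled, scaled[1:]) if (a < 0) != (b < 0))
    zcr = crossings / max(len(scaled) - 1, 1)
    return [rms, zcr]
```

You could append these values to the existing feature array from the .json file, or compare a model trained on them against one trained on the provided features.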
If you would like to make a new dataset, please check out the sub-repository called youtube_scrape. You can browse the links we previously used for training videos in the playlist folder of that repo. Instructions for making new playlists are also there.
We can download the videos and extract features for you if you provide us the playlist URLs.
There are a few places we could publish this work. Here are the submission deadlines.
Conference | Location | Deadline to submit | Description |
---|---|---|---|
Interspeech | Austria | TBA | Crossroads of Speech and Language |