
NoobCoders -- Speech Emotion Recognition

Built for Rajasthan IT Day 2023, under the theme: develop solutions that use AI to solve complex mental health problems and offer personalized results to people worldwide. The solution should be scalable, adaptable to various cultures and languages, and incorporate the most recent developments in AI technology.

What is Speech Emotion Recognition (SER)?

  • Speech Emotion Recognition, abbreviated as SER, is the task of recognizing human emotions and affective states from speech. It capitalizes on the fact that the voice often reflects underlying emotion through tone and pitch; this is also the phenomenon that animals such as dogs and horses use to understand human emotion.

Why do we need it?

  1. Emotion recognition is a branch of speech recognition that is gaining popularity, and the need for it is increasing enormously. Although there are methods to recognize emotion using classical machine learning techniques, this project uses deep learning to recognize emotions from audio data.

  2. Mental health - monitor and evaluate the emotional state of patients suffering from mental illnesses

  3. Security and law enforcement - identifying verbal cues that a suspect is lying, acting aggressively, or showing signs of nervousness.

  4. Call centers - SER (Speech Emotion Recognition) is used for classifying calls according to emotion and can serve as a performance parameter for conversational analysis, identifying unsatisfied customers, measuring customer satisfaction, and so on, helping companies improve their services.

  5. Human-computer interaction - monitor and improve student engagement and academic performance.

  6. In-car onboard systems - information about the driver's mental state can be provided to the system so it can initiate safety measures and prevent accidents.

Tech Stack used in this project

  • Python, HTML, CSS, TensorFlow, Keras, OpenCV, Flask, NumPy

Datasets used in this project

  • Crowd-sourced Emotional Multimodal Actors Dataset (Crema-D)
  • Ryerson Audio-Visual Database of Emotional Speech and Song (Ravdess)
  • Surrey Audio-Visual Expressed Emotion (Savee)
  • Toronto emotional speech set (Tess)
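Each corpus encodes its labels differently; RAVDESS, for example, encodes the emotion in the file name itself as the third of seven hyphen-separated fields. A minimal parsing sketch (the code-to-emotion mapping follows RAVDESS's published naming convention; the helper name is illustrative, not from this repo):

```python
# RAVDESS file names look like "03-01-05-01-01-01-01.wav";
# the third field ("05" here) is the emotion code.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def ravdess_label(filename):
    """Return the emotion name encoded in a RAVDESS file name."""
    code = filename.split("-")[2]
    return RAVDESS_EMOTIONS[code]

print(ravdess_label("03-01-05-01-01-01-01.wav"))  # angry
```

Similar per-dataset rules let all four corpora be merged into one labelled dataframe.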

Roles of team members

  1. Raksha Pahariya - Combining all four datasets into a single dataframe, data visualisation and exploration, data augmentation, feature extraction, and data preparation
  2. Arihant Jain - Building a neural network model on the preprocessed data, achieving a best accuracy of 60% across 8 classes after a series of experiments
  3. Ashlesha Gautam and Devans Soni - Deploying the saved model as a web application using Flask and designing the website in HTML
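A network along these lines could implement the modelling step above. The layer sizes and the 162-value feature vector are assumptions for illustration, not taken from the repository; only the 8-class softmax output follows from the description:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_FEATURES = 162  # assumed length of the per-clip feature vector
NUM_CLASSES = 8     # eight emotion classes, as in the project

def build_model(num_features=NUM_FEATURES, num_classes=NUM_CLASSES):
    """A small 1D-CNN classifier over a fixed-length feature vector."""
    model = keras.Sequential([
        layers.Input(shape=(num_features, 1)),
        layers.Conv1D(64, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
```

The model would then be trained with `model.fit(...)` on the extracted features and saved for deployment.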

How to run the project on your system?

  • Download the complete code, create a virtual environment, then navigate to the bda_project/ folder and run the command pip install -r requirements.txt.
  • Once the requirements are installed, run the command python main.py.
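The deployment step can be sketched as a minimal Flask app. The route name, the stub predictor, and the emotion list are assumptions for illustration; main.py in the repository is the authoritative version:

```python
import io
from flask import Flask, request, jsonify

app = Flask(__name__)

# Assumed 8-class label set; the real app would load the saved Keras model.
EMOTIONS = ["angry", "calm", "disgust", "fearful",
            "happy", "neutral", "sad", "surprised"]

def predict_emotion(audio_bytes):
    """Stub standing in for feature extraction + model inference."""
    return EMOTIONS[0]

@app.route("/predict", methods=["POST"])
def predict():
    audio = request.files.get("audio")
    if audio is None:
        return jsonify({"error": "no audio file uploaded"}), 400
    emotion = predict_emotion(audio.read())
    return jsonify({"emotion": emotion})

# To serve locally: app.run(debug=True)
```

The HTML front end would post the recorded audio to /predict and display the returned emotion.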

Project Demonstration : https://drive.google.com/file/d/19RUdz6UtK6Kb2X1-Azk9GMufdtiHy0JW/view?usp=share_link