Voice Pathology Detection System using Deep Learning

Introduction

The voice is a principal tool that allows individuals to communicate and exchange information in their day-to-day activities. Even a slight alteration in the voice production system can affect voice quality. A voice disorder is present when a person's voice quality, pitch, loudness, or vocal flexibility deviates from that of others of the same age, gender, and social group. Teachers are at particularly high risk of developing voice disorders: prolonged speaking periods and the vocal strain of teaching in lecture rooms with poor acoustics or high background noise exacerbate vocal damage. These disorders are also observed in other individuals who use their voice professionally, such as singers, lawyers, and call center professionals. The probability of such a disorder occurring is high in females and older individuals. With advanced technology such as deep learning, it is possible to identify the symptoms of such illnesses at an early stage and prevent severe damage to the vocal organs.

Need for the Project

A pathological voice results from disorders of or damage to the vocal organs by any means, and its causes are not limited to pathogens such as viruses, bacteria, fungi, and parasites. Voice pathology can be caused by tissue infection, systemic changes, mechanical stress, surface irritation, tissue changes, neurological and muscular changes, and other factors. Vocal pathology affects the mobility, functionality, and shape of the vocal folds, resulting in irregular vibrations and increased acoustic noise. Such a voice sounds strained, harsh, weak, or breathy, which contributes significantly to poor overall voice quality. The consequences of such pathological speech disorders are not life-threatening. However, untreated voice dysfunction may have a drastic impact on the social, occupational, and personal aspects of communication.

Scope

The proposed framework focuses on the early diagnosis and treatment of pathological voice disorders with the aid of deep learning models. It also outlines an approach to developing a system that has lower computational complexity and processes data faster than traditional systems.

Deliverables of the project can be summarized as follows:

  • Cost-effective and accurate diagnosis reports for patients
  • A user-friendly, remotely accessible application for both patients and medical professionals

Exclusions and constraints of the proposed project can be elucidated as follows:

  • The system will focus on detecting functional dysphonia caused by physical damage only. Other pathologies are out of the scope of this project.
  • The system will face a cold-start problem when discriminating changes in the voice, because the Comparator network can be used only if a previously recorded audio sample of the patient is available.
  • Exceptional cases might occur in the classification phase, as multiple pathologies (other than the one due to physical damage) may have similar characteristics and symptoms. In such a case, the pathology will be detected but misclassified, and it will need to be investigated further by a medical professional.
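
The cold-start constraint listed above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each recording has already been reduced to a fixed-length feature vector, and it uses a plain cosine distance as a stand-in for the Comparator network's output. The function names `cosine_distance` and `assess`, the status strings, and the threshold value are hypothetical and are not part of the proposed system.

```python
import math
from typing import Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def assess(
    current: Sequence[float],
    baseline: Optional[Sequence[float]] = None,
    threshold: float = 0.2,  # hypothetical value, chosen only for this sketch
) -> str:
    """Compare a new recording against the patient's earlier recording.

    Without a baseline (the cold-start case) no comparison is possible,
    so the caller must rely on other parts of the system instead.
    """
    if baseline is None:
        return "no_baseline"
    distance = cosine_distance(current, baseline)
    return "voice_changed" if distance > threshold else "voice_stable"
```

Calling `assess(features)` for a first-time patient returns `"no_baseline"`, which makes the cold-start case explicit rather than silently skipping the comparison; once an earlier recording exists, it is passed as the `baseline` argument.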

Methodology

Methodology followed

Application

The suggested system is intended for the biomedical field, where it will give individuals easy access to medical assistance as well as cost-effective and accurate diagnosis of voice disorders caused by physical damage.