The average life expectancy is increasing globally due to advances in medical technology, preventive health care, and a growing emphasis on gerontological health. Developing technologies that detect and track aging-associated diseases affecting cognitive function in older adults is therefore imperative. In particular, research on the automatic detection and evaluation of Alzheimer's disease (AD) is critical given the disease's prevalence and the cost of current diagnostic methods. Because AD affects both the acoustics of speech and vocabulary use, natural language processing and machine learning are promising techniques for reliably detecting AD. We compare the performance of ten linear regression models for predicting Mini-Mental State Examination (MMSE) scores on the ADReSS challenge dataset. We extracted more than 13,000 handcrafted and learned features that capture linguistic and acoustic phenomena. Using a subset of 54 top features selected by two methods, (1) recursive feature elimination and (2) correlation scores, we outperform a state-of-the-art baseline on the same task. After scoring each model and evaluating the statistical significance of each selected feature, we find that, for this task, only the handcrafted linguistic features are significant, while the acoustic and learned features are not.
You will need to download the feature extraction tools and install the package requirements.
Get the text feature extraction tools from NLP Tools for the Social Sciences. These include CLA, ARTE, TAASSC, SEANCE, SiNLP, TAACO, and TAALES.
Package Prerequisites
pip install -r requirements.txt
Install DisVoice
git clone https://github.com/jcvasquezc/DisVoice
Audio feature extraction is done with pyAudioAnalysis and DisVoice.
Text feature extraction is done with CLA, ARTE, TAASSC, SEANCE, SiNLP, TAACO, and TAALES.
For audio extraction, run the first code block in the Jupyter notebook labeled "Audio Feature Extraction" to obtain the pyAudioAnalysis features. For DisVoice features, such as articulation, follow the steps in the DisVoice repository; the command looks something like this:
python articulation.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>
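If you prefer to drive DisVoice from Python instead of the shell, the positional arguments above can be assembled programmatically. This is a sketch under assumptions, not part of this repository: `build_articulation_cmd` and `run_articulation` are hypothetical helpers, and `articulation_dir` should point at whichever folder in your DisVoice clone contains `articulation.py`.

```python
import subprocess
import sys
from pathlib import Path

FORMATS = {"csv", "txt", "npy", "kaldi", "torch"}


def build_articulation_cmd(audio_path, features_file, static=True, plots=False, fmt="csv"):
    """Hypothetical helper: build argv for DisVoice's articulation.py.

    Argument order follows the usage line above:
    <file_or_folder_audio> <file_features> <static> <plots> <format>
    """
    if fmt not in FORMATS:
        raise ValueError(f"fmt must be one of {sorted(FORMATS)}, got {fmt!r}")
    return [
        sys.executable,          # run with the current Python interpreter
        "articulation.py",
        str(audio_path),
        str(features_file),
        str(static).lower(),     # True -> "true", False -> "false"
        str(plots).lower(),
        fmt,
    ]


def run_articulation(articulation_dir, audio_path, features_file, **options):
    """Hypothetical helper: run articulation.py from the folder that contains it."""
    # Resolve paths first, because cwd= changes the subprocess's working directory.
    cmd = build_articulation_cmd(
        Path(audio_path).resolve(), Path(features_file).resolve(), **options
    )
    subprocess.run(cmd, cwd=articulation_dir, check=True)
```

`check=True` makes a failed DisVoice run raise `CalledProcessError` instead of silently leaving an empty or missing feature file.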
Then, for text extraction, download all the tools, follow the instructions at each of their links, and use the Text folder as the input folder. Finally, combine all the CSV files produced by text extraction into a single DataFrame together with the audio features.
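The combining step can be sketched with pandas. The sketch assumes every CSV shares a participant key column, here called `ID`; the real tools may name that column differently (for example, by file name), so rename it to a common key first. The tool names, column names, and values below are made-up placeholders, and `StringIO` stands in for reading the actual files with `pd.read_csv(path)`.

```python
import io
from functools import reduce

import pandas as pd

# Placeholder CSV contents; in practice, read each tool's output file instead.
csv_outputs = {
    "taales": "ID,lex_freq\nS001,3.1\nS002,2.8\n",
    "seance": "ID,pos_affect\nS001,0.4\nS002,0.6\n",
    "audio": "ID,mfcc_1_mean\nS001,-12.5\nS002,-10.9\n",
}


def load_prefixed(source, csv_text, key="ID"):
    """Read one CSV and prefix its feature columns so names from different tools cannot collide."""
    df = pd.read_csv(io.StringIO(csv_text))
    return df.rename(columns={c: f"{source}_{c}" for c in df.columns if c != key})


def combine_features(frames, key="ID"):
    """Inner-join all feature tables on the shared key, keeping only participants present in every table."""
    return reduce(lambda left, right: left.merge(right, on=key, how="inner"), frames)


features = combine_features([load_prefixed(name, text) for name, text in csv_outputs.items()])
print(features)
```

An inner join drops any participant missing from one of the tools' outputs, so it is worth comparing `len(features)` against the number of transcripts to catch silent losses.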
Feature selection is done with two methods: SelectKBest scored by correlation, and RFECV. Pass the DataFrame built in the extraction phase to each of the selection methods; each returns its top 100 features.
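The two selection methods can be sketched with scikit-learn. This is an illustration under assumptions, not the notebook's code: `select_by_correlation` and `select_by_rfecv` are hypothetical helper names, scoring correlation with `f_regression` and wrapping `LinearRegression` in `RFECV` are guesses at the setup, and the data are synthetic. Note that `RFECV` picks the number of features by cross-validation rather than returning a fixed count; `RFE(estimator, n_features_to_select=100)` is the fixed-size variant if exactly 100 are required.

```python
import numpy as np
from sklearn.feature_selection import RFECV, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression


def select_by_correlation(X, y, k):
    """Return column indices of the k features most correlated with y.

    f_regression's F-statistic grows monotonically with the squared Pearson
    correlation, so this ranks features by |r|, not by signed r.
    """
    selector = SelectKBest(score_func=f_regression, k=k).fit(X, y)
    return selector.get_support(indices=True)


def select_by_rfecv(X, y, cv=5):
    """Return column indices kept by recursive feature elimination.

    The weakest feature (smallest |coef_|) is dropped at each step, and
    cross-validated R^2 decides how many features to keep.
    """
    selector = RFECV(LinearRegression(), step=1, cv=cv).fit(X, y)
    return np.flatnonzero(selector.support_)


# Synthetic demo: 20 candidate features, of which only 0, 3 and 7 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 0] - 3 * X[:, 3] + 3 * X[:, 7] + rng.normal(scale=0.5, size=200)

corr_idx = select_by_correlation(X, y, k=3)
rfe_idx = select_by_rfecv(X, y)
print(sorted(corr_idx), sorted(rfe_idx))
```

With a numeric feature DataFrame `X_df` and MMSE scores `mmse` (hypothetical names), `X_df.columns[select_by_correlation(X_df, mmse, k=100)]` returns the selected column names. On real features with very different scales, standardize before RFECV, because raw `|coef_|` values are not comparable across units.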
Use the training block in the final Jupyter notebook: pass in the top 100 features from each selection method, and it trains two sets of models, one on the correlation-selected features and one on the RFECV-selected features.
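Training two sets of models can be sketched as follows. The README does not name the ten linear regression models, so `LinearRegression`, `Ridge`, and `Lasso` from scikit-learn are hypothetical stand-ins; `make_models` and `train_model_sets` are illustrative helper names, and the feature-index lists and synthetic data are placeholders for the correlation and RFECV selections.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_models():
    """Hypothetical stand-ins for the repository's ten linear regression models."""
    return {
        "ols": LinearRegression(),
        # Penalized models are scaled first so the penalty treats all features alike.
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    }


def train_model_sets(X_train, y_train, feature_sets):
    """Fit a fresh copy of every model on each selected feature subset."""
    trained = {}
    for set_name, columns in feature_sets.items():
        models = make_models()  # new, unfitted estimators for this subset
        for model in models.values():
            model.fit(X_train[:, columns], y_train)
        trained[set_name] = models
    return trained


rng = np.random.default_rng(1)
X_train = rng.normal(size=(80, 10))
y_train = 2 * X_train[:, 0] - X_train[:, 4] + rng.normal(scale=0.3, size=80)

# Placeholder selections standing in for the correlation and RFECV outputs.
feature_sets = {"correlation": [0, 4, 5], "rfecv": [0, 4]}
trained = train_model_sets(X_train, y_train, feature_sets)
print(trained["rfecv"]["ols"].coef_)
```

Building fresh estimators per subset keeps the two model sets independent. With a pandas DataFrame, index by column names (`X_train[columns]`) instead of positional NumPy indexing.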
Testing is done alongside training: once a model finishes training, it is evaluated on the test set. The following blocks of the notebook let you test the models further and plot the results.
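Scoring on the test set can be sketched with root-mean-squared error (RMSE), the metric the ADReSS challenge uses for MMSE prediction; which metrics the notebook itself reports is not stated here. Clipping predictions to the 0 to 30 MMSE range is an added design choice, not something the README describes, and the scores below are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between true and predicted MMSE scores."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def clip_mmse(predictions, low=0.0, high=30.0):
    """Clamp regression outputs to the valid MMSE range (0 to 30)."""
    return [min(max(p, low), high) for p in predictions]


y_test = [29, 18, 24]          # made-up true MMSE scores
y_pred = [31.2, 20.0, 22.0]    # made-up model predictions
print(round(rmse(y_test, y_pred), 3))             # 2.069 (unclipped)
print(round(rmse(y_test, clip_mmse(y_pred)), 3))  # 1.732 (clipped)
```

Because a linear model can predict outside the scale's range, clipping can only move an out-of-range prediction closer to any valid true score, so it never increases RMSE.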