Interspeech 2020 - Alzheimer's Dementia Recognition through Spontaneous Speech Challenge

Repo for code / analysis for my submission to the Interspeech 2020 ADReSS Challenge

To re-run the analysis presented (assuming you already have Anaconda / Miniconda installed):

$ git clone https://github.com/tomolopolis/ADReSS_Challenge.git
$ cd ADReSS_Challenge

$ conda create -n adress_chlng python=3.7

$ conda activate adress_chlng

$ pip install -r requirements

Return to the base environment and install nb_conda_kernels to expose conda environments to Jupyter:

$ conda install nb_conda_kernels

$ jupyter lab

RoBERTa / BERT Language Model Fine Tuning

First, get access to the BNC corpus.
Place the unzipped corpus in ../data/bnc2014spoken-xml
Open and run the notebook Fine-Tune-LanguageModel.ipynb. It cleans the BNC corpus, applies a 4-sentence sliding window, trains a byte-level BPE (BBPE) tokenizer, and writes the LM fine-tuning config. Finally, it runs the training script, which is largely replicated from the training script provided by Hugging Face.
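
As a rough illustration of the 4-sentence sliding window step, here is a minimal Python sketch. It is not taken from the notebook: the function name sliding_windows, the stride of 1, and the handling of inputs shorter than the window are all assumptions made for illustration.

```python
from typing import List


def sliding_windows(sentences: List[str], size: int = 4, stride: int = 1) -> List[str]:
    """Join each run of `size` consecutive sentences into one training example.

    Hypothetical helper: the window size of 4 comes from the README, but the
    stride of 1 and the short-input behaviour are assumptions.
    """
    if size < 1 or stride < 1:
        raise ValueError("size and stride must be positive")
    if len(sentences) <= size:
        # Too few sentences for more than one window: emit a single example.
        return [" ".join(sentences)] if sentences else []
    return [
        " ".join(sentences[start:start + size])
        for start in range(0, len(sentences) - size + 1, stride)
    ]


if __name__ == "__main__":
    # Placeholder sentences standing in for cleaned corpus utterances.
    demo = ["a", "b", "c", "d", "e", "f"]
    for window in sliding_windows(demo):
        print(window)
```

With overlapping windows, each sentence appears in up to four training examples, so the language model sees it with varying amounts of preceding and following context. A larger stride would reduce that overlap at the cost of fewer examples.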