
Repo for the Interspeech 2020 ADReSS Challenge


Interspeech 2020 - Alzheimer's Dementia Recognition through Spontaneous Speech Challenge

Code and analysis for my submission to the Interspeech 2020 ADReSS Challenge.

To re-run the analysis presented:

Check Out & Install Required Deps

These steps assume Anaconda or Miniconda is already installed.

  1. Check out the repo:
$ git clone https://github.com/tomolopolis/ADReSS_Challenge.git
$ cd ADReSS_Challenge
  2. Create a new Python conda environment:
$ conda create -n adress_chlng python=3.7
  3. Activate the env:
$ conda activate adress_chlng
  4. Install deps:
$ pip install -r requirements
  5. Go back to the base env and install nb_conda_kernels to expose conda environments to Jupyter:
$ conda activate base
$ conda install nb_conda_kernels
  6. Start Jupyter:
$ jupyter lab
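Before running anything, it can help to confirm that a notebook is really using the adress_chlng kernel rather than base. The snippet below is a minimal sketch, assuming conda's usual layout where named environments live under `<conda root>/envs/<name>`. `conda_env_name` is a hypothetical helper, not part of the repo, and it reports `base` for any prefix that is not inside an `envs/` directory (including non-conda interpreters).

```python
import os
import sys


def conda_env_name(prefix: str = sys.prefix) -> str:
    """Return the conda env name implied by an interpreter prefix (hypothetical helper).

    Assumes conda's usual layout: named envs live at <conda_root>/envs/<name>.
    Anything not directly inside an "envs" directory is reported as "base".
    """
    cleaned = prefix.rstrip(os.sep)
    parent = os.path.basename(os.path.dirname(cleaned))
    if parent != "envs":
        return "base"
    return os.path.basename(cleaned)


if __name__ == "__main__":
    # Run in a notebook cell to see which env and Python version the kernel uses
    print("conda env:", conda_env_name())
    print("python:", ".".join(map(str, sys.version_info[:3])))
```

In a cell on the correct kernel this should print `adress_chlng` and a 3.7.x version.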

How to Re-run the Analysis

  1. To run the analysis as-is, place the unzipped training set data in ../data/train.
  2. Open the Analysis.ipynb notebook.
  3. In the top-right corner, select the adress_chlng kernel, then run all cells.
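If the notebook cannot find the data, a quick check of what sits under ../data/train can narrow the problem down. This is a sketch only: `count_transcripts` and `GROUPS` are hypothetical names, and the code assumes the ADReSS training transcripts are CHAT `.cha` files grouped into `cc` (control) and `cd` (dementia) folders somewhere under the train directory. Adjust `GROUPS` if the unzipped archive is organised differently.

```python
from collections import Counter
from pathlib import Path

# Location from the README, resolved against the notebook's working directory
TRAIN_DIR = Path("../data/train")
# Assumption: transcripts are .cha files inside "cc" (control) / "cd" (dementia) folders
GROUPS = ("cc", "cd")


def count_transcripts(train_dir: Path = TRAIN_DIR) -> Counter:
    """Count .cha transcript files per group folder anywhere under train_dir."""
    if not train_dir.is_dir():
        raise FileNotFoundError(f"expected unzipped train set at {train_dir.resolve()}")
    counts = Counter()
    for path in train_dir.rglob("*.cha"):
        # Only count files whose immediate parent folder is one of the groups
        if path.parent.name in GROUPS:
            counts[path.parent.name] += 1
    return counts
```

Calling `count_transcripts()` from a cell next to Analysis.ipynb should return non-zero counts for both groups; a `FileNotFoundError` means the data is not where the notebook expects it.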

RoBERTa / BERT Language Model Fine-Tuning

  1. First, obtain access to the British National Corpus (BNC).
  2. Place the unzipped corpus in ../data/bnc2014spoken-xml.
  3. Open and run the Fine-Tune-LanguageModel.ipynb notebook. It cleans the BNC corpus, applies a 4-sentence sliding window, trains a byte-level BPE (BBPE) tokenizer and writes out the LM fine-tuning config, then runs the training script, which is largely replicated from the one provided by Hugging Face.
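The cleaning and windowing steps can be illustrated with a short, standard-library-only sketch. It is not the notebook's code. It assumes each utterance in the corpus XML is a `<u>` element whose nested tags (pauses, vocalisations, etc.) can simply be dropped, treats each utterance as one "sentence", and assumes the 4-sentence window advances one sentence at a time (`STRIDE = 1`). `extract_utterances` and `sliding_windows` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List

WINDOW_SIZE = 4  # "4 sentence sliding window" from the README
STRIDE = 1       # assumption: the window advances one sentence at a time


def extract_utterances(xml_text: str) -> List[str]:
    """Return the plain text of each <u> element (assumed utterance tag)."""
    root = ET.fromstring(xml_text)
    utterances = []
    for u in root.iter("u"):
        # itertext() yields text and tails of nested tags; collapse whitespace
        text = " ".join("".join(u.itertext()).split())
        if text:
            utterances.append(text)
    return utterances


def sliding_windows(sentences: List[str], size: int = WINDOW_SIZE,
                    stride: int = STRIDE) -> Iterator[str]:
    """Yield each run of `size` consecutive sentences joined into one example."""
    if len(sentences) < size:
        # Design choice: a short document becomes one example instead of being dropped
        if sentences:
            yield " ".join(sentences)
        return
    for start in range(0, len(sentences) - size + 1, stride):
        yield " ".join(sentences[start:start + size])
```

For example, five utterances produce two overlapping windows: utterances one to four, then two to five. Overlapping windows give the language model multi-turn context while still yielding many examples from a modest corpus; a stride equal to the window size would instead give non-overlapping chunks.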