Repo for the code and analysis behind my submission to the Interspeech 2020 ADReSS Challenge.
To re-run the analysis presented, assuming you already have Anaconda / Miniconda installed:
- Checkout the repo
$ git clone https://github.com/tomolopolis/ADReSS_Challenge.git
$ cd ADReSS_Challenge
- Create a new Python conda environment:
$ conda create -n adress_chlng python=3.7
- Activate env:
$ conda activate adress_chlng
- Install deps:
$ pip install -r requirements
- Go back to the base env and install nb_conda_kernels, which exposes conda environments to Jupyter:
$ conda activate base
$ conda install nb_conda_kernels
- Start Jupyter:
$ jupyter lab
- To run the analysis as-is, place the unzipped train set data into ../data/train.
- Open the 'Analysis.ipynb' notebook.
- In the top-right corner, select the adress_chlng kernel, then run all cells.
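Before opening either notebook, it can help to confirm the data sits where the notebooks expect it. The helper below is a minimal, hypothetical sketch (`missing_data_dirs` and `EXPECTED_DIRS` are not part of the repo). It only checks that the two directories named in this README exist and are non-empty relative to the repo root. It knows nothing about which files should be inside them.

```python
from pathlib import Path
from typing import List, Sequence

# Data directories named in this README, relative to the repo root (ADReSS_Challenge).
EXPECTED_DIRS = ("../data/train", "../data/bnc2014spoken-xml")


def missing_data_dirs(repo_root: str = ".", expected: Sequence[str] = EXPECTED_DIRS) -> List[str]:
    """Return the expected data directories that are missing or empty."""
    root = Path(repo_root)
    missing = []
    for rel in expected:
        path = (root / rel).resolve()
        # Count a directory as present only if it exists and contains at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    gaps = missing_data_dirs(".")
    if gaps:
        print("Missing or empty:", ", ".join(gaps))
    else:
        print("All expected data directories found.")
```

Run it from the repo root. The ../data/bnc2014spoken-xml entry only matters if you also plan to fine-tune the language model.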
- To fine-tune the language model, first get access to the BNC corpus.
- Place the unzipped corpus into ../data/bnc2014spoken-xml.
- Open and run the Fine-Tune-LanguageModel.ipynb notebook. It cleans the BNC corpus, applies a 4-sentence sliding window, trains a byte-level BPE (BBPE) tokenizer and outputs the LM fine-tuning config. Finally, it runs the training script, which is largely replicated from the one provided by Hugging Face.
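The 4-sentence sliding window step can be pictured roughly as below. This is a sketch under assumptions, not the notebook's code. Three choices here are made purely for illustration: a stride of 1, joining sentences with a single space, and emitting one short window when a document has fewer than 4 sentences. `sliding_windows` is a hypothetical name.

```python
from typing import Iterator, Sequence


def sliding_windows(sentences: Sequence[str], size: int = 4, stride: int = 1) -> Iterator[str]:
    """Yield windows of `size` consecutive sentences, joined by single spaces.

    Assumption: a document shorter than `size` becomes a single, shorter window.
    """
    if size < 1 or stride < 1:
        raise ValueError("size and stride must be positive integers")
    if len(sentences) <= size:
        # Too few sentences for more than one window: keep what there is, if anything.
        if sentences:
            yield " ".join(sentences)
        return
    # Each start index opens a new window; with stride=1, neighbouring windows share size-1 sentences.
    for start in range(0, len(sentences) - size + 1, stride):
        yield " ".join(sentences[start:start + size])


if __name__ == "__main__":
    doc = ["s1.", "s2.", "s3.", "s4.", "s5.", "s6."]
    for window in sliding_windows(doc):
        print(window)
```

With six sentences and the defaults, this yields three overlapping windows (s1 to s4, s2 to s5, s3 to s6). A larger stride reduces overlap and training-set size, but can drop trailing sentences that do not fill a complete window.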