Source code for the project titled: Machine Learning Approaches to Predict Parkinson’s Disease using Speech Signals
==> MDVR-KCL Dataset
- The dataset is stored in the dataset folder.
- The dataset/ReadText was used in this project
- The dataset/ReadText/HC contains the healthy controls
- The dataset/ReadText/PD contains the PD patients
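The folder layout above can be walked with a few lines of Python. This is a minimal sketch, not code from the repository: the function name `list_recordings`, the `*.wav` file pattern, and the label encoding (0 for HC, 1 for PD) are assumptions made for illustration.

```python
from pathlib import Path

# Folder names come from this README; the numeric labels are an assumption.
LABELS = {"HC": 0, "PD": 1}


def list_recordings(root="dataset/ReadText", pattern="*.wav"):
    """Return sorted (path, label) pairs for the files under root/HC and root/PD."""
    root = Path(root)
    pairs = []
    for group, label in LABELS.items():
        # glob() yields nothing if the folder is missing, so the result is simply empty.
        for wav in sorted((root / group).glob(pattern)):
            pairs.append((wav, label))
    return pairs


if __name__ == "__main__":
    for path, label in list_recordings():
        print(label, path)
```

Running the file from the project root prints one line per recording, healthy controls first.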
==> Setup for the Python code files
- Create/Activate a virtual environment: python -m venv .venv
- Install the packages listed in the requirements.txt file: pip3 install -r requirements.txt
==> To extract the features (FeatureExtraction folder)
- The feature_extraction.py defines the helper functions used to extract features from a sound file
- The main.py extracts the features of the MDVR_KCL dataset using these functions
- The extract_italian_features.py shows the extraction of the features from the Italian dataset
- The alc_extraction.py shows the extraction of the features from the ALC dataset
- The features are saved in different csv files for each dataset
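The step of saving one CSV file per dataset could look roughly like the sketch below. It is not the repository's code: `write_feature_csv`, the `file` and `label` column names, and the `extract` callable (a stand-in for whatever helpers feature_extraction.py provides) are all hypothetical.

```python
import csv
from pathlib import Path


def write_feature_csv(recordings, extract, out_path):
    """Write one CSV row per recording and return the number of rows written.

    recordings: iterable of (path, label) pairs
    extract:    callable mapping a path to a dict of feature name -> value
                (hypothetical stand-in for the real extraction helpers)
    """
    rows = []
    for path, label in recordings:
        rows.append({"file": Path(path).name, **extract(path), "label": label})
    if not rows:
        return 0  # nothing to write, so no file is created
    with open(out_path, "w", newline="") as fh:
        # Column order follows the first row: file, features..., label.
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Passing the extractor in as a parameter keeps the sketch independent of the actual function names, so the same loop could be reused for each dataset with a different output path.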
==> SOM experiment: Repeatability of Clusters
- The som_script_4.r contains the implementation of the repeatability of clusters algorithm
- Ensure the readtext.csv file is available. This is the file that contains the extracted features.
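Before running the R script, the presence of the feature file can be checked quickly. The sketch below is only an illustration: the function name `check_feature_file` is hypothetical, the location of readtext.csv is passed in because this README does not state it, and the file is assumed to start with a header row.

```python
import csv
from pathlib import Path


def check_feature_file(path="readtext.csv"):
    """Return (n_rows, n_columns) for the CSV; raise if it is missing or empty."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run the feature extraction first")
    with path.open(newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)  # assumes the first line is a header row
        if header is None:
            raise ValueError(f"{path} is empty")
        n_rows = sum(1 for _ in reader)  # data rows, excluding the header
    return n_rows, len(header)
```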
==> ML Models (Modelling folder)
- alc.py includes the implementation of different ML models on the ALC dataset
- italian.py includes the implementation of different ML models on the Italian dataset
- mdvr_kcl.py includes the implementation of different ML models on the MDVR_KCL dataset
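This README does not say how the models are scored. As a hedged illustration only, predictions from any binary classifier could be summarised with the standard confusion-matrix metrics below; the function name `binary_metrics` and the choice of 1 as the positive class are assumptions, not taken from the scripts above.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    total = len(pairs)
    nan = float("nan")  # returned when a ratio has an empty denominator
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }
```

Sensitivity is the fraction of positive cases that are detected and specificity is the fraction of negative cases correctly left unflagged, which is why both are usually reported alongside accuracy when the classes are not balanced.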
==> Experiments
- Modelling/MDVR_KCL_experiments.ipynb : notebook for the experiments involving the MDVR_KCL dataset
- Modelling/Italian_experiments.ipynb : notebook for the experiments involving the Italian dataset