This repository corresponds to the code used to develop the study presented in the article entitled "Analysis of Factors that Influence the Performance of Biometric Systems Based on EEG Signals."
Authors:
- Dustin Carrión-Ojeda (dustin.carrion@gmail.com)
- Rigoberto Fonseca-Delgado (rfonseca@inaoep.mx)
- Israel Pineda (ipineda@yachaytech.edu.ec)
This repository is made up of eleven python scripts and two folders containing the data and results. The scripts can be divided into three categories:
- Data Preparation.
- Hyperparameter Optimization.
- Experiment.
On the other hand, the folders DEAP and BIOMEX-DB correspond to the datasets used in this study, and each one will have three subfolders:
data
Contains all the information necessary for the study.hyperparameter_optimization
Contains the specific data used for hyperparameter optimization and its results.experiment
Contains the specific data used for the experiment and its results.
Each of the scripts categories and how to use them are detailed below.
Note: The order of execution of the scripts corresponds to the order presented below.
This category is composed of the following scripts:
create_feature_matrices.py
split_feature_matrices.py
The first script is responsible for preprocessing the electroencephalograms (EEG) using the discrete wavelet transform (DWT), and also it generates the feature matrices. These matrices are composed of the relative wavelet energy (RWE) of all detail coefficients and the last approximation coefficient. To run this script, it is necessary to download the datasets used in this study.
The DEAP dataset download process is detailed on its page. This study used the preprocessed version of this dataset. In the case of BIOMEX-DB, as it is not publicly accessible, to be able to access it, it must be requested directly from its authors.
Once the data is downloaded, the following command can be executed:
python create_feature_matrices.py input_data_path dataset_name
input_data_path
corresponds to the relative path to the folder where the data downloaded from each of the databases is located, while dataset_name
can only take two values: deap
or biomex-db
.
After the feature matrices are generated, they must be divided into data for hyperparameter optimization and data for the experiment. The percentage of data used for optimization was 20%, while the remaining 80% was used in the experiment. To perform this division, the following command is used:
python split_feature_matrices.py dataset_name
Note: Due to possible complications getting the original datasets, the repository provides the feature matrices (result of create_feature_matrices.py
). For this reason, the script split_feature_matrices.py
can be run without any problem.
This study uses six classifiers:
- Support Vector Machine (SVM)
- K-nearest Neighbors (KNN)
- Random Forest (RF)
- AdaBoost (AB)
- Gaussian Naïve Bayes (GNB)
- Multilayer Perceptron (MLP)
To obtain the best results, greedy search optimization was applied based on a ten-fold-cross validation with overlapping between folds to find the best hyperparameters for each classifier. For running the scripts in this category, it is necessary to have run the Data Preparation scripts. All hyperparameter optimization scripts can be executed with the following command:
python run_optimization.py
This command is equivalent to running the following:
python create_optimization_fold_files.py deap
python create_optimization_fold_files.py biomex-db
python grid_search.py deap
python grid_search.py biomex-db
python grid_search_neural_network.py deap
python grid_search_neural_network.py biomex-db
python read_optimization_results.py deap
python read_optimization_results.py biomex-db
The functionality of each script is detailed below:
create_optimization_fold_files.py
Generates the training and testing data of each fold.grid_search.py
Executes hyperparameter optimization for SMV, KNN, RF, AB, and GNB. This script generates two.sav
files for each classifier. One file corresponds to the evaluated hyperparameters, and the other contains the average accuracy reached by the classifier using these hyperparameters.grid_search_neural_network.py
The functionality of this script is the same as the previous one, but in this case, the classifier is MLP.read_optimization_results.py
Reads the results generated withgrid_search.py
andgrid_search_neural_network.py
and selects the best set of hyperparameters for each classifier based on the accuracy achieved.
The objective of the experiment was to assess the impact of the duration of EEG recordings and the levels of decomposition of the DWT on the performance of the classifiers. In both datasets, each signal was segmented into the following times: 0.25, 0.5, 0.75, 1, 1.25,1.5, 1.75, 2, 2.25, and 2.5 seconds. To simulate the differences that may exist between the recordings in a real scenario, the start of the segmentation was randomly taken. This work uses ten-fold-cross validation to increase the reliability of the experimental results. The performance metrics used to evaluate the classifiers are the Average accuracy (Acc), Macro-averaging Sensitivity (Se), and Macro-averaging Specificity (Sp) :
To run the scripts in this category, it is necessary to have run the Data Preparation scripts. The execution of the Hyperparameter Optimization scripts is not required because it was already executed during the development of the study. Thus, the Experiment scripts were coded with the best set of hyperparameters for each classifier. All experiment scripts can be executed with the following command:
python run_experiment.py
This command is equivalent to running the following:
python create_experiment_fold_files.py deap
python create_experiment_fold_files.py biomex-db
python k_cross.py deap
python k_cross.py biomex-db
python k_cross_neural_network.py deap
python k_cross_neural_network.py biomex-db
The functionality of each script is detailed below:
create_experiment_fold_files.py
Generates the training and testing data for each fold.k_cross.py
Executes ten-fold-cross validation for SMV, KNN, RF, AB, and GNB classifiers with each time segment. This script generates three.xlsx
files for each time segment. Each file corresponds to a metric (accuracy, sensitivity, and specificity) and contains the results of all classifiers at each fold.k_cross_neural_network.py
The functionality of this script is the same as the previous one, but in this case, the classifier is MLP.