Using machine learning and deep learning to predict cognitive tasks from electroencephalography (EEG) signals is a fast-developing area in Brain-Computer Interfaces (BCI). However, the COVID-19 pandemic made data collection and analysis more challenging. Running experiments remotely during the pandemic raises several challenges, and we discuss possible solutions. This paper explores machine learning algorithms that can run efficiently on personal computers for BCI classification tasks. The results show that Random Forest and RBF SVM perform well for EEG classification tasks. Furthermore, we investigate how to conduct such BCI experiments using affordable consumer-grade devices to collect EEG-based BCI data. In addition, we have developed a data collection protocol, EEG4Students, that gives interested non-experts a guideline for such data collection.
In this paper, we experiment with the following machine learning algorithms:
Gradient Boosting
LDA
Nearest Neighbors
AdaBoost
Random forest
Linear SVM
RBF SVM
Decision Tree
Shrinkage LDA
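As a rough illustration, the nine classifiers listed above map onto scikit-learn estimators approximately as follows. This is a minimal sketch and not the configuration used in TCR.py: every hyperparameter is either a scikit-learn default or an assumption, the helper names `build_classifiers` and `compare` are hypothetical, and the data are synthetic stand-ins produced by `make_classification`.

```python
import time

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Return one estimator per algorithm; hyperparameters are assumptions."""
    return {
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        "LDA": LinearDiscriminantAnalysis(),
        "Nearest Neighbors": KNeighborsClassifier(),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Linear SVM": SVC(kernel="linear"),
        "RBF SVM": SVC(kernel="rbf"),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        # Shrinkage is only supported by the 'lsqr' and 'eigen' solvers.
        "Shrinkage LDA": LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
    }


def compare(X, y, folds=3):
    """Return {name: (mean cross-validated accuracy, elapsed seconds)}."""
    results = {}
    for name, clf in build_classifiers().items():
        start = time.perf_counter()
        scores = cross_val_score(clf, X, y, cv=folds)
        results[name] = (scores.mean(), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Synthetic data only; replace with features and labels from your recordings.
    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    for name, (acc, secs) in compare(X, y).items():
        print(f"{name:18s} accuracy={acc:.3f} time={secs:.2f}s")
```

Reporting runtime next to accuracy reflects the goal of finding algorithms that run efficiently on personal computers.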
You can see our hyperparameter choices in our code.
Our datasets can be found in this Google Drive: TCR
More specifically, you can find the TXT files (raw data) here.
You can access the CSV version of the datasets (preprocessed data) here.
EEG4Students requires the following packages to replicate our results:
Data processing and visualization:
pandas
matplotlib
numpy
Machine learning:
sklearn
We highly recommend using a conda environment for package management.
First, you should create and activate a conda environment:
conda create -n EEG4Students python=3.9
conda activate EEG4Students
Then install scikit-learn, pandas, and matplotlib:
conda install scikit-learn
conda install pandas
conda install matplotlib
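To confirm that the environment is ready, a short check like the one below can help. It is only a sketch: the helper name `installed_versions` is hypothetical, and the one fact it relies on is that scikit-learn is imported under the module name `sklearn`.

```python
import importlib


def installed_versions(modules=("sklearn", "pandas", "matplotlib", "numpy")):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated EEG4Students environment; any package reported as not installed can be added with conda install.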
Most EEG recording applications have a toolset to convert the recording files to TXT or CSV files. Afterward, we can pick the subset of data we plan to use for further analysis; we recommend starting with the absolute value of the EEG signals.
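Selecting a subset of columns and taking absolute values could look like the sketch below. The column names (`timestamp`, `ch1`, `ch2`) and the helper `absolute_signals` are hypothetical; the real column names depend on your recording application's export format.

```python
import io

import pandas as pd


def absolute_signals(csv_source, signal_columns):
    """Load a CSV and return only the chosen columns, as absolute values."""
    frame = pd.read_csv(csv_source)
    return frame[list(signal_columns)].abs()


if __name__ == "__main__":
    # Tiny in-memory stand-in for an exported recording (hypothetical columns).
    sample = io.StringIO("timestamp,ch1,ch2\n0.0,-1.5,0.25\n0.1,2.0,-0.75\n")
    print(absolute_signals(sample, ["ch1", "ch2"]))
```

Passing a file path instead of the `StringIO` object works the same way, since `pd.read_csv` accepts both.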
Here is an example of how we preprocess the EEG signals:
- Convert .muse files to .csv files using the default app. For example, on the command line:
muse-player -f 2_66_0530.muse -C 2_66_0530.csv
Side note: This is time-consuming. It takes about 10 minutes to convert 110M of data.
- Use the Java code (Process1.java) to convert .csv files to .txt files. For example, you can convert the .csv files to .txt files using the absolute values of the EEG signals:
java Process1.java < Sub1_Ses1_tcr_FFT.csv > Sub1_Ses1_tcr_FFT.txt;
- As mentioned in the paper, we remove anomalies that last 1.4 consecutive seconds from the datasets by executing preprocess.m (preprocess.m sets these anomalies to all 0s). You can adjust the variable "plateau_threshold" to tune how noise is detected.
Also, remember to change the subject IDs to match your experiments.
We provide our raw and preprocessed datasets in the Google Drive. You are welcome to use these datasets directly to replicate or reproduce our results.
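For readers without MATLAB, the idea behind the anomaly-removal step can be sketched in Python. This is not a port of preprocess.m: it assumes an anomaly is a "plateau", meaning a run of consecutive samples that barely change, and it zeroes every run that is at least `min_run` samples long. The names `zero_plateaus`, `min_run`, and `tolerance` are hypothetical, and how they relate to `plateau_threshold` in preprocess.m is not specified here.

```python
import numpy as np


def zero_plateaus(signal, min_run, tolerance=0.0):
    """Return a copy of `signal` with flat runs of at least `min_run` samples set to 0.

    A run is a stretch in which each sample differs from the previous one
    by at most `tolerance`.
    """
    signal = np.asarray(signal, dtype=float)
    cleaned = signal.copy()
    if signal.size == 0:
        return cleaned
    # True where a sample differs from its predecessor, i.e. a new run starts.
    changed = np.abs(np.diff(signal)) > tolerance
    starts = np.concatenate(([0], np.flatnonzero(changed) + 1))
    ends = np.concatenate((starts[1:], [signal.size]))
    for start, end in zip(starts, ends):
        if end - start >= min_run:
            cleaned[start:end] = 0.0
    return cleaned


if __name__ == "__main__":
    sampling_rate_hz = 10  # hypothetical; use your device's actual rate
    min_run = int(round(1.4 * sampling_rate_hz))  # samples in 1.4 seconds
    demo = np.concatenate((np.arange(5.0), np.full(20, 7.0), np.arange(5.0)))
    print(zero_plateaus(demo, min_run))
```

The 1.4-second window is converted to a sample count, so the sampling rate must be known before choosing `min_run`.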
The directory of the project should look like this:
├── Process1.java
├── TCR.py
├── Visualization
│   ├── data_raw
│   │   ├── 105_tcr_s1.txt
│   │   ├── 105_tcr_s2.txt
……
│   │   ├── 82_tcr_s5.txt
│   │   └── 82_tcr_s6.txt
│   ├── museClassify.m
│   ├── museClassifyAll.m
│   ├── musePlot.m
│   ├── noise_plot.m
│   └── overview.m
├── data_preprocess
│   └── tcr_plateau_removed_data
│       ├── tcr_subject_10_session_1.csv
│       ├── tcr_subject_10_session_2.csv
……
│       ├── tcr_subject_9_session_5.csv
│       └── tcr_subject_9_session_6.csv
├── output
│   └── tcr
│       ├── RandomForest_data_distribution.jpg
│       ├── accuracy_runtime_classifier.csv
│       ├── algorithm_comparison_each_subject.jpg
│       ├── subject_17_heatmap.jpg
│       └── subject_9_heatmap.jpg
└── preprocess.m
Remember to create a folder called "data_preprocess" at the same level as TCR.py. Inside "data_preprocess", create a folder called "tcr_plateau_removed_data" that contains the preprocessed CSV versions of your datasets. Lastly, create an "output" folder with a "tcr" folder inside it.
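The folders described above can also be created from Python, and the preprocessed files checked against the naming scheme shown in the directory tree (tcr_subject_<id>_session_<n>.csv). This is a minimal sketch; the helper names `ensure_layout` and `list_sessions` are hypothetical and are not part of TCR.py.

```python
import re
from pathlib import Path

REQUIRED_DIRS = (
    Path("data_preprocess") / "tcr_plateau_removed_data",
    Path("output") / "tcr",
)
NAME_PATTERN = re.compile(r"tcr_subject_(\d+)_session_(\d+)\.csv$")


def ensure_layout(project_root="."):
    """Create the required folders under `project_root` if they are missing."""
    root = Path(project_root)
    paths = []
    for relative in REQUIRED_DIRS:
        target = root / relative
        target.mkdir(parents=True, exist_ok=True)
        paths.append(target)
    return paths


def list_sessions(project_root="."):
    """Return sorted (subject, session) pairs parsed from the preprocessed CSV names."""
    data_dir = Path(project_root) / REQUIRED_DIRS[0]
    pairs = []
    for path in data_dir.glob("*.csv"):
        match = NAME_PATTERN.match(path.name)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return sorted(pairs)


if __name__ == "__main__":
    ensure_layout(".")  # run from the folder that contains TCR.py
    print(list_sessions("."))
```

Files whose names do not follow the pattern are skipped, which makes misnamed files easy to spot before running the analysis.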
To evaluate our results, simply run:
python TCR.py
Side note: the subject IDs here carry limited meaning. The "subject IDs" in the code refer to each participant's position index in this list: https://github.com/GuangyaoDou/EEG4Students/blob/d1cf8c07ea57d45ba22982c22d26b67068d93740/TCR.py#L19
You can adjust the remote participant IDs so that the output generates the heatmaps of the corresponding virtual participants: https://github.com/GuangyaoDou/EEG4Students/blob/d1cf8c07ea57d45ba22982c22d26b67068d93740/TCR.py#L22
You can visualize EEG signals from 4 receptive fields and five frequency bands (Gamma, Beta, Alpha, Theta, and Delta) by running noise_plot.m. You need to specify the session number and the ID number.
Also, by running overview.m you can get an overview of the main clustering data for the specified data file. It shows the clusters and their specificity for each of the activities, the 4 meta-clusters and their specificity, and the accuracy of prediction over all the data.
Here is an example of executing the overview.m:
overview('2_109_0417.txt', 12)
If you have any questions, please reach out to us: guangyaodou@brandeis.edu