EEG_SignalsClassification

Preprocessing, analysis and classification of EEG signals into 4 classes.

Primary language: Jupyter Notebook

EEG Signals Project

Introduction

The following experiment aims to analyze EEG signals and classify them into four classes using AI techniques. We used Matlab to design the stimuli, the Emotiv EPOC headset to record the electrodes' data, and Python for data processing. The experiment was initially designed with colored stimuli; then, for experimentation purposes, we also ran it with white stimuli only. For organization purposes, the experiment process was divided into four main stages, as shown below.

Experiment flow diagram

Types of experiment

  • Color experiment
    • Frequency based
    • Evoked potential P300
  • Experiment in white
    • Frequency based

Visual stimuli designing

Two experiments were designed: one frequency based and one based on the P300 evoked potential. The code was developed in Matlab 2020 using the open-source Psychtoolbox library.

Frequency based

The frequencies used were 7 Hz, 9 Hz, 11 Hz and 13 Hz. In the color experiment, these frequencies were associated with the colors red, blue, green and purple, respectively. In the second experiment, all the stimuli were shown in white. Each square flickered at its designated frequency for 3 s while the others remained opaque and still, as shown in the figure.

Frequency based experiment
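The sketch below illustrates the flicker timing. It is written in Python rather than the Matlab/Psychtoolbox code used in the project. For each display frame, it computes whether a square is on or off. The 60 Hz monitor refresh rate, the square-wave on/off rule and the helper name `flicker_schedule` are assumptions for illustration, not the project's code.

```python
import numpy as np


def flicker_schedule(freq_hz: int, refresh_hz: int = 60, duration_s: int = 3) -> np.ndarray:
    """On/off state of a square-wave flicker for every display frame.

    Frame k is shown at t = k / refresh_hz. The square is "on" during the first
    half of each cycle of freq_hz (assumed rule). Integer arithmetic is used so
    that no floating-point rounding affects which half-cycle a frame falls in.
    """
    frames = np.arange(refresh_hz * duration_s)            # 3 s at 60 Hz -> 180 frames
    half_cycle = (2 * freq_hz * frames) // refresh_hz       # index of the current half-period
    return half_cycle % 2 == 0                              # even half-periods are "on"


# One boolean per frame for each stimulus frequency used in the experiment.
schedules = {f: flicker_schedule(f) for f in (7, 9, 11, 13)}
```

At 60 Hz, a half-period of 7 Hz lasts about 4.3 frames. The on and off phases therefore cannot all span the same number of frames, which is worth keeping in mind when choosing flicker frequencies for a given monitor.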

Evoked potential P300

The stimuli flickered at a random frequency. Then three of the squares disappeared and only one remained on screen, which in theory elicits the P300 response.

P300 experiment

Data recording

Data was recorded with the Emotiv EPOC+ headset, a 14-channel wireless EEG headset with a sampling rate of 128 Hz. It records brain signals (in µV) through several software applications. In this project we used the EMOTIV-PRO app, and of its many features we only used recording, labeling and data export. In the app's settings, the keys associated with each label were configured as shown in the image below.

P300 experiment

Although data from all the electrodes was recorded, as a first stage we based our analysis only on the responses of the occipital area of the brain. Two electrodes are placed in that zone: Occipital 1 (EEG.O1) and Occipital 2 (EEG.O2).

Emotiv EPOC

Each experiment lasted approximately 11 minutes, and the recordings were exported as CSV files:

Data pre-processing

Data categorization

Once the data was exported, we processed the raw files to keep only the columns we were going to focus on: MarkerValueInt (label), EEG.O1 and EEG.O2. Using the pre-processing.ipynb notebook, the data was then split into files. Each file represents a time window, that is, the period during which the subject was exposed to a specific stimulus; here, each time window corresponds to a 3-second recording. The files were sorted into folders according to their labels, as shown below.
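A minimal sketch of the windowing step follows. It assumes that the exported CSV has already been loaded into a pandas DataFrame and that MarkerValueInt holds the label only on the row where each stimulus starts (NaN elsewhere). `extract_windows` is a hypothetical helper; the actual logic lives in pre-processing.ipynb and may differ.

```python
import numpy as np
import pandas as pd

FS = 128        # Emotiv EPOC+ sampling rate (Hz)
WINDOW_S = 3    # each stimulus window lasts 3 s
COLUMNS = ["MarkerValueInt", "EEG.O1", "EEG.O2"]


def extract_windows(df: pd.DataFrame, fs: int = FS, window_s: int = WINDOW_S) -> dict:
    """Cut one fixed-length window after every marker and group the windows by label.

    Assumption: MarkerValueInt is non-null only on the row where a stimulus starts.
    """
    df = df[COLUMNS]                 # keep only the label and the two occipital channels
    n = fs * window_s                # 3 s * 128 Hz = 384 samples per window
    windows = {}
    for start in np.flatnonzero(df["MarkerValueInt"].notna().to_numpy()):
        label = int(df["MarkerValueInt"].iloc[start])
        segment = df.iloc[start:start + n][["EEG.O1", "EEG.O2"]]
        if len(segment) == n:        # skip windows truncated by the end of the recording
            windows.setdefault(label, []).append(segment.reset_index(drop=True))
    return windows
```

Each window could then be written with `DataFrame.to_csv` into a folder named after its label, which mirrors the folder-per-label organization described above.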

P300 experiment

Signal processing

After categorization, several signal-processing steps were applied to improve the subsequent analysis. The output of each step was saved into its own sub-folder.

Folder organization (color experiment)

Raw Data

This folder contains the raw data as exported from the Emotiv PRO app; no processing or filtering was applied to it.

Raw Data Outliers

Outliers are aberrant values in the recorded data; they were replaced with the mean value of the respective channel (column).
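The README does not state how outliers were detected. The sketch below assumes a simple z-score rule: a value more than 3 standard deviations from the channel mean counts as an outlier and is replaced with that mean. The threshold, the use of the full-column mean and the helper name `replace_outliers` are all assumptions.

```python
import numpy as np
import pandas as pd


def replace_outliers(df: pd.DataFrame, columns=("EEG.O1", "EEG.O2"), z_thresh: float = 3.0) -> pd.DataFrame:
    """Replace samples farther than z_thresh standard deviations from the column mean with that mean."""
    out = df.copy()
    for col in columns:
        x = out[col].astype(float)
        mean, std = x.mean(), x.std()
        # Series.mask swaps in `mean` wherever the condition is True.
        out[col] = x.mask((x - mean).abs() > z_thresh * std, mean)
    return out
```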

Raw Data Filtered

A Butterworth band-pass filter with a pass band of 5 Hz to 30 Hz was applied.
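The sketch below implements a 5 to 30 Hz band-pass with SciPy. It assumes a 4th-order design and zero-phase (forward-backward) filtering with `sosfiltfilt`; the README does not state the filter order or whether zero-phase filtering was used. `bandpass` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 128  # Emotiv EPOC+ sampling rate (Hz)


def bandpass(x: np.ndarray, low: float = 5.0, high: float = 30.0, fs: int = FS, order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth band-pass along the time axis (axis 0).

    Works for a 1-D signal or a (n_samples, n_channels) array such as [O1, O2].
    Forward-backward filtering doubles the effective order and removes phase shift.
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=0)
```

Second-order sections (`output="sos"`) are used because they are numerically more robust than transfer-function coefficients for band-pass designs of this order.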

Raw Data Hilbert

The Hilbert transform was applied.
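The README does not say which quantity was kept after the Hilbert transform. Assuming it was the instantaneous amplitude (envelope), it can be computed from SciPy's analytic signal as shown below. `hilbert_envelope` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import hilbert


def hilbert_envelope(x: np.ndarray) -> np.ndarray:
    """Instantaneous amplitude |x + j*H{x}| computed along the time axis (axis 0)."""
    analytic = hilbert(x, axis=0)   # complex analytic signal
    return np.abs(analytic)
```

For the filtered variant, the same function would be applied to the band-pass-filtered signal instead of the raw one. If the instantaneous phase were needed instead, `np.angle` of the analytic signal gives it.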

Raw Data Filtered & Hilbert

The Hilbert transform was applied to the filtered data.

Data augmentation

Data augmentation was applied to increase the amount of data: white noise with different amplitudes (e.g., 0.5, 1, 5, etc.) was added to each file. So that augmented copies would not bias the evaluation of the classifiers, the following procedure was followed:

  1. The data was split into training (70%), validation (15%) and test (15%) sets.
  2. Data augmentation was applied only to the training data. Then, features were extracted from all the files.
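The split-then-augment procedure can be sketched with scikit-learn and NumPy as below. Several details are assumptions: "amplitude" is interpreted as the standard deviation of zero-mean Gaussian white noise, the split is stratified by label, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_70_15_15(windows, labels, seed: int = 0):
    """Stratified 70/15/15 split: hold out 30%, then halve it into validation and test."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        windows, labels, test_size=0.30, stratify=labels, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)


def augment_with_white_noise(windows, labels, amplitudes=(0.5, 1.0, 5.0), seed: int = 0):
    """Return the originals plus one noisy copy of every window per noise amplitude."""
    rng = np.random.default_rng(seed)
    aug_x, aug_y = list(windows), list(labels)
    for amp in amplitudes:
        for w, y in zip(windows, labels):
            # Assumed: amplitude = standard deviation of the added Gaussian noise.
            aug_x.append(w + amp * rng.standard_normal(np.shape(w)))
            aug_y.append(y)
    return aug_x, aug_y
```

Only the training split is passed to `augment_with_white_noise`. This keeps noisy copies of a recording out of the validation and test sets.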

Feature extraction

Once all the files were sorted into their respective folders, features were extracted from all the data. Two main approaches were proposed:

  1. tsfresh
    tsfresh is a Python package that provides many methods to extract and analyze features from time-series data.

  2. Our own feature extraction algorithm
    We developed an algorithm to extract the following features:

    • Mean absolute value
    • Mean absolute value - type I
    • Mean absolute value - type II
    • Log detection
    • Median absolute value
    • Variance
    • Mean absolute difference value
    • Mean frequency
    • Frequency at maximum power spectral density (PSD)
    • Variance of the central frequency
    • Maximum PSD
    • Amplitude histogram
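The original equations for these features were not recoverable here. The sketch below therefore uses commonly cited definitions from the EMG/EEG feature-extraction literature, which may differ from the project's own formulas. The following are assumptions:

- Welch's PSD estimate
- the 10-bin amplitude histogram
- the tapering weights for the type II mean absolute value
- the helper names

```python
import numpy as np
from scipy.signal import welch

FS = 128  # Emotiv EPOC+ sampling rate (Hz)


def time_features(x: np.ndarray) -> dict:
    """Time-domain features of one 1-D window (assumed textbook definitions)."""
    n = len(x)
    ax = np.abs(x)
    i = np.arange(1, n + 1)                          # 1-based sample index
    middle = (i >= 0.25 * n) & (i <= 0.75 * n)
    # Type I: weight 1 in the middle half of the window, 0.5 at the edges.
    w1 = np.where(middle, 1.0, 0.5)
    # Type II (assumed form): weights ramp from near 0 up to 1, then back down to 0.
    w2 = np.where(middle, 1.0, np.where(i < 0.25 * n, 4 * i / n, 4 * (n - i) / n))
    return {
        "mav": ax.mean(),
        "mav_type1": np.mean(w1 * ax),
        "mav_type2": np.mean(w2 * ax),
        "log_detection": np.exp(np.mean(np.log(ax + 1e-12))),  # epsilon guards against log(0)
        "median_abs": np.median(ax),
        "variance": np.var(x, ddof=1),
        "mean_abs_diff": np.mean(np.abs(np.diff(x))),
    }


def freq_features(x: np.ndarray, fs: int = FS) -> dict:
    """Spectral features from a Welch PSD estimate (Welch's method is an assumption)."""
    f, p = welch(x, fs=fs, nperseg=min(256, len(x)))
    mean_f = np.sum(f * p) / np.sum(p)               # power-weighted mean frequency
    return {
        "mean_frequency": mean_f,
        "freq_at_max_psd": f[np.argmax(p)],
        "max_psd": p.max(),
        # Spread of the spectrum around its mean frequency.
        "var_central_frequency": np.sum(p * (f - mean_f) ** 2) / np.sum(p),
    }


def amplitude_histogram(x: np.ndarray, bins: int = 10) -> np.ndarray:
    """Number of samples in each of `bins` equal-width amplitude bins."""
    counts, _ = np.histogram(x, bins=bins)
    return counts
```

For a 3 s window at 128 Hz (384 samples), `nperseg=256` gives a frequency resolution of 0.5 Hz. A different segment length trades resolution against variance of the PSD estimate.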

Data Normalization

To normalize the data, the scaler was first fitted to the training data. The same fitted scaler was then used to transform the validation and test data, so that all three sets are scaled with the training data's minimum and maximum values.

  • Training data: normalized with preprocessing.MinMaxScaler using the fit_transform method, which fits the scaler to the data (storing the minimum and maximum values) and then transforms the data.
  • Validation and test data: normalized with the minimum and maximum values learned from the training data, using the transform method, which only transforms the data.
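With scikit-learn's MinMaxScaler, the fit-on-train, transform-elsewhere pattern looks like the sketch below. `normalize` is a hypothetical wrapper, and the inputs are assumed to be feature matrices with one row per window.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def normalize(x_train: np.ndarray, x_val: np.ndarray, x_test: np.ndarray):
    """Scale all three sets with per-feature min/max learned from the training set only."""
    scaler = MinMaxScaler()
    x_train_n = scaler.fit_transform(x_train)   # learns min/max, then scales training data
    x_val_n = scaler.transform(x_val)           # reuses the training min/max
    x_test_n = scaler.transform(x_test)
    return x_train_n, x_val_n, x_test_n
```

Because validation and test data are scaled with the training range, their values can fall outside [0, 1]. For example, a validation value of 30 for a feature whose training range is 10 to 20 maps to 2.0.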

Classification

For experimentation purposes, we tried binary classification between all the classes {7 Hz, 9 Hz, 11 Hz, 13 Hz, baseline} with the following classifiers:

  • XGBoost
  • Random Forest
  • Support vector machine (SVM)
  • K-nearest neighbors (KNN)
  • Multilayer perceptron (MLP)
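The sketch below shows the pairwise binary experiments with scikit-learn. It assumes that "binary classification between all the classes" means one binary classifier per pair of classes. It also assumes a feature matrix with one row per window and one label per row.

The model hyperparameters and the `pairwise_accuracy` helper are assumptions. XGBoost is omitted to keep the example dependency-light. `xgboost.XGBClassifier` could be added to `MODELS` in the same way, though it may require each pair's labels to be re-encoded as 0/1.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Factories, so every class pair gets a freshly initialized model (hyperparameters assumed).
MODELS = {
    "Random Forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": lambda: SVC(),
    "KNN": lambda: KNeighborsClassifier(n_neighbors=5),
    "MLP": lambda: MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}


def pairwise_accuracy(x_train, y_train, x_test, y_test, models=MODELS) -> dict:
    """Train and score one binary classifier per (class_a, class_b, model) combination."""
    x_train, y_train = np.asarray(x_train), np.asarray(y_train)
    x_test, y_test = np.asarray(x_test), np.asarray(y_test)
    results = {}
    for a, b in combinations(np.unique(y_train), 2):
        tr = np.isin(y_train, [a, b])      # keep only samples of the two classes
        te = np.isin(y_test, [a, b])
        for name, make in models.items():
            clf = make().fit(x_train[tr], y_train[tr])
            results[(a, b, name)] = accuracy_score(y_test[te], clf.predict(x_test[te]))
    return results
```

With five classes this trains 10 pairs per model. The resulting dictionary can be turned into a table with one row per class pair and one column per classifier.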