Question about data preprocessing (standard scaling)
Closed this issue · 3 comments
Hi @martinwimpff, thank you for your work and code.
While studying your model and others, I noticed how the datasets are preprocessed (the StandardScaler method):
channel-attention/channel_attention/datamodules/base.py
Lines 39 to 44 in bcd92cc
You can also find this scaling method in other models:
From my understanding, this method (StandardScaler: Doc) normalizes the columns of a 2D matrix: centering and scaling happen independently on each feature, using statistics computed over the samples in the training set.
But for an EEG dataset (shape: [trials, times] for each channel), it calculates the mean and std over all trials at each time sample t and standardizes the signal with them. However, this changes the time sequence of each trial and may lose time-domain information, because the signal at t is changed only according to the other trials.
So I cannot understand this standardization method, because an EEG time sequence is not like a set of multi-channel features.
Thank you again.😊
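To make the behaviour described above concrete, here is a minimal NumPy sketch of what per-column standardization does to one channel of shape [trials, times]. It mimics StandardScaler's column-wise mean and population std rather than calling scikit-learn, and it is not the repository's code; the function name `scale_per_time_point`, the synthetic sine-plus-noise data, and the array sizes are all assumptions chosen for illustration.

```python
import numpy as np


def scale_per_time_point(train, test):
    """Standardize one channel of shape [trials, times] column by column.

    Statistics are computed over the trials axis of the training set only,
    so every time sample t gets its own mean and std.
    """
    mean = train.mean(axis=0, keepdims=True)  # shape [1, times]
    std = train.std(axis=0, keepdims=True)    # population std (ddof=0)
    std = np.where(std == 0.0, 1.0, std)      # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std


# Hypothetical data: a 10 Hz component shared by all trials, plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
shared = np.sin(2 * np.pi * 10 * t)
train = shared + rng.normal(scale=0.5, size=(50, 200))
test = shared + rng.normal(scale=0.5, size=(10, 200))

train_scaled, test_scaled = scale_per_time_point(train, test)

# After scaling, the across-trial average of the training set is zero at
# every time sample, so the waveform shared by all trials is removed.
residual = np.abs(train_scaled.mean(axis=0)).max()
```

In this sketch the component common to all training trials is subtracted out, which is one way to see how the time course of an individual trial is altered by statistics taken from the other trials.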
Hi @edw4rdyao,
thanks for your comment.
Yes, the StandardScaler calculates a mean and std per time point.
This will change the trials. However, the "relative" information stays the same and only some "absolute" information is lost. How serious this information loss is depends on the concrete dataset and task.
We chose to use this scaling because, as you mentioned, it is used in other publications.
In our latest experiments (https://github.com/martinwimpff/eeg-otta) we do not use any scaling as we found out that the effect is minor or even harmful in some cases.
Best,
Martin
Hi @martinwimpff,
Sorry to trouble you, and thank you for your clarification 😐. I will study your latest paper and repo!
In addition, I would like to ask whether you think there is any general standardization method for EEG data, such as standardizing each trial within each channel, or feeding the raw voltage values directly into the neural network? (Of course, I will also verify this experimentally, but I think this question may be worth discussing.)
Since I have no further questions about your repo, I will close this issue.
Best,
Edward
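For reference, the per-trial, per-channel alternative raised in the comment above could be sketched as follows. This is only an illustration under assumptions, not a recommended or established method: the function name `standardize_per_trial_channel`, the `eps` stabilizer, and the array shape [trials, channels, times] are hypothetical choices.

```python
import numpy as np


def standardize_per_trial_channel(x, eps=1e-8):
    """Z-score each (trial, channel) time series along the time axis.

    x has shape [trials, channels, times]. Only statistics from the trial
    itself are used, so nothing is shared between trials.
    """
    mean = x.mean(axis=-1, keepdims=True)  # shape [trials, channels, 1]
    std = x.std(axis=-1, keepdims=True)    # population std (ddof=0)
    return (x - mean) / (std + eps)        # eps guards against flat signals


# Hypothetical batch: 8 trials, 22 channels, 250 time samples.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=20.0, size=(8, 22, 250))
z = standardize_per_trial_channel(x)
```

Because each time series is only shifted and rescaled by its own constants, the shape of the waveform within a trial is preserved, while the offset and amplitude differences between trials and channels are discarded. Whether that trade-off helps would still need to be checked experimentally.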
Hi @edw4rdyao,
as far as I know, there is no general method. You can check this paper for a small investigation on that topic.
Best,
Martin