tomazas/itc2017

Mistakes when calculating CSP for train set and test set.

Closed this issue · 2 comments

Hello, I think there are some mistakes in the code.

  1. Line 43 in "run_experiment.m"
    Both the train set and the test set are filtered with the csp_mat computed from the train set, which causes problems in the ten-fold cross-validation.

  2. Line 42 in "eval_feats.m"
    10-fold cross-validation is run on the "train set" and on the "test set" separately, producing two different results. However, this validation does not match the processing described in point 1: when validating on the "train set", the label information has already been leaked, since all of the training labels were used to compute the CSP matrix.

Thank you for the remark.

  1. The statement that the training and testing sets are both filtered using the CSP matrix learned from the training set is correct. Whether that causes mistakes in 10-fold cross-validation depends on how you implement it.

For the filtering, I don't think any other way is possible. In a real-world BCI scenario you do not know the true classes of the input (testing) data you are classifying. The only option is to learn the CSP matrix from the training data and use it to filter the testing data. See Fig. 3 of https://doi.org/10.1186/s12984-017-0276-4
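To make that concrete, here is a minimal sketch of train-only CSP fitting, written in Python/NumPy for illustration (the repo itself is MATLAB). The function names and the two-class generalized-eigenvalue formulation are my own illustration, not the repo's code:

```python
import numpy as np
from scipy.linalg import eigh

def fit_csp(X, y, n_filters=4):
    # Learn CSP spatial filters from TRAINING trials only.
    # X: (n_trials, n_channels, n_samples); y: binary labels.
    covs = []
    for c in np.unique(y):
        trials = X[y == c]
        # average trace-normalized spatial covariance per class
        covs.append(np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0))
    # generalized eigendecomposition: filters that maximize the
    # variance ratio between the two classes
    evals, evecs = eigh(covs[0], covs[0] + covs[1])
    order = np.argsort(evals)
    # keep filters from both ends of the eigenvalue spectrum
    pick = np.concatenate([order[:n_filters // 2], order[-(n_filters // 2):]])
    return evecs[:, pick]                     # (n_channels, n_filters)

def csp_features(X, W):
    # Log-variance features; apply the SAME W to train and test trials.
    Z = np.einsum('ij,njk->nik', W.T, X)      # spatially filtered trials
    var = Z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))
```

At test time you call `csp_features(X_test, W)` with the `W` learned from the training data, exactly as the repo does.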

  2. To be absolutely correct, the CSP matrix should be rebuilt during 10-fold cross-validation on the training set, after holding out the fold's validation data/labels. Currently this was not taken into account. I don't expect the training cross-validation result to differ much, since the validation fold holds only 10% of the training data/labels; the covariance matrices would be built from the remaining 90%, which in most cases covers the input data well. However, this issue does not affect the testing cross-validation results. Feel free to implement it correctly.
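One way to see the size of this leakage effect, sketched in Python with scikit-learn rather than the repo's MATLAB: `SelectKBest` below is only a stand-in for any supervised transform (such as CSP) that uses the labels, and the data is pure noise, so any accuracy above chance comes from leakage alone:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))   # placeholder features; noise only
y = rng.integers(0, 2, 100)

cv = StratifiedKFold(10, shuffle=True, random_state=0)

# LEAKY: supervised transform fit on ALL labels before cross-validation,
# analogous to computing the CSP matrix from the full training set
X_leaky = SelectKBest(f_classif, k=5).fit_transform(X, y)
leaky = cross_val_score(LinearDiscriminantAnalysis(), X_leaky, y, cv=cv)

# CORRECT: the transform is refit inside each fold via a pipeline,
# analogous to rebuilding the CSP matrix per fold
pipe = make_pipeline(SelectKBest(f_classif, k=5), LinearDiscriminantAnalysis())
fair = cross_val_score(pipe, X, y, cv=cv)

# the leaky score is typically inflated well above chance on pure noise,
# while the per-fold refit stays near 0.5
print(leaky.mean(), fair.mean())
```

With CSP the gap should indeed be smaller than in this worst-case illustration, since the per-fold covariance estimates come from 90% of the training trials, but the per-fold refit is the only version that gives an unbiased training cross-validation score.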

Thanks for your reply!