/Learn_NeuralDecoding_for_EEG

Understand neural decoding methods down to their bare bones

Primary language: Python. License: MIT.

Mind flaying, a.k.a. MVPA (Multivariate Pattern Analysis)

Key notes

  • We are interested in decoding over time from EEG data of shape epochs × channels × time points
  • This strategy consists of fitting a multivariate predictive model at each time instant and evaluating its performance at the same instant on new epochs
  • X is the epochs data of shape n_epochs × n_channels × n_times. Since the last dimension of X is time, an estimator is fit at every time instant
  • We can retrieve the spatial filters and spatial patterns if we explicitly use a LinearModel
  • The get_coef function can fetch the patterns; make sure to pass inverse_transform=True (see the sketch below)
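
A minimal sketch of decoding over time with MNE's SlidingEstimator. X and y are placeholders for data assumed to exist already, and the solver choice is only illustrative:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from mne.decoding import SlidingEstimator, LinearModel, cross_val_multiscore, get_coef

# X: (n_epochs, n_channels, n_times), y: (n_epochs,) -- assumed to exist
clf = make_pipeline(StandardScaler(),
                    LinearModel(LogisticRegression(solver='liblinear')))
time_decod = SlidingEstimator(clf, scoring='roc_auc', n_jobs=1)

# one score per cross-validation fold and time point -> shape (n_folds, n_times)
scores = cross_val_multiscore(time_decod, X, y, cv=5)
mean_scores = np.mean(scores, axis=0)

# refit on all data, then pull out the (inverse-transformed) spatial patterns
time_decod.fit(X, y)
patterns = get_coef(time_decod, 'patterns_', inverse_transform=True)  # (n_channels, n_times)
```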

Extraction filters of backward models may exhibit large weights at channels not at all picking up the signals of interest, as well as small weights at channels containing the signal

Such “misleading” weights are by no means indications of suboptimal model estimation

Rather, they are needed to “filter away” noise and thereby to extract the signal with high SNR

We derived a transformation by which extraction filters of any linear backward model can be turned into activation patterns of a corresponding forward model. By this means, backward models can eventually be made interpretable
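
As a rough sketch of that transformation (A = Cov(X) · W · Cov(Ŝ)⁻¹ in the paper's notation; the function and variable names below are illustrative, not MNE API):

```python
import numpy as np

def filters_to_patterns(X, W):
    """Turn extraction filters of a linear backward model into activation
    patterns of the corresponding forward model (Haufe et al., 2014).

    X : array, shape (n_samples, n_channels)    -- the data
    W : array, shape (n_channels, n_components) -- extraction filters
    """
    S = X @ W                                        # estimated latent sources
    cov_X = np.cov(X, rowvar=False)                  # channel covariance
    cov_S = np.atleast_2d(np.cov(S, rowvar=False))   # source covariance
    return cov_X @ W @ np.linalg.pinv(cov_S)         # A = Cov(X) @ W @ inv(Cov(S))
```

MNE applies this transformation for you when the estimator is wrapped in LinearModel and the patterns are requested through get_coef.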


Temporal generalisation - an extension of the decoding over time approach

  • Evaluating whether the model estimated at a particular time instant accurately predicts any other time instant
  • It is analogous to transferring a trained model to a distinct learning problem, where the problems correspond to decoding the patterns of brain activity recorded at distinct time instants
  • The diagonal line in the plot below is exactly the same as the time-by-time decoding plot (a code sketch follows this list)
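
A minimal sketch using GeneralizingEstimator, with the same placeholder X and y as above:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from mne.decoding import GeneralizingEstimator, cross_val_multiscore

clf = make_pipeline(StandardScaler(), LogisticRegression(solver='liblinear'))
time_gen = GeneralizingEstimator(clf, scoring='roc_auc', n_jobs=1)

# train at each time point, test at every other one -> (n_folds, n_times, n_times)
scores = cross_val_multiscore(time_gen, X, y, cv=5).mean(axis=0)

# the diagonal of the generalization matrix is the decoding-over-time curve
diag_scores = np.diag(scores)
```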

Trivia

  • Difference between scikit-learn's StandardScaler and mne.decoding.Scaler

scikit-learn's StandardScaler scales each classification feature (each channel/time-point combination) using the mean and standard deviation computed across epochs

mne.decoding.Scaler scales each channel using mean and standard deviation computed across all of its time points and epochs

  • Vectorizer: scikit-learn transformers and estimators generally expect 2D data (n_samples × n_features), whereas MNE transformers typically output data with higher dimensionality (e.g. n_samples × n_channels × n_frequencies × n_times)

A Vectorizer therefore needs to be applied between the MNE and the scikit-learn steps of the pipeline (see the sketch below)
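
A minimal sketch of where Scaler and Vectorizer sit in a pipeline; epochs is assumed to be an existing mne.Epochs object:

```python
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from mne.decoding import Scaler, Vectorizer

X = epochs.get_data()          # (n_epochs, n_channels, n_times)
y = epochs.events[:, 2]        # one label per epoch

clf = make_pipeline(
    Scaler(epochs.info, scalings='mean'),  # per-channel standardization, keeps the 3D shape
    Vectorizer(),                          # flattens to (n_epochs, n_channels * n_times)
    LogisticRegression(solver='liblinear'),
)
clf.fit(X, y)
```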

  • Solvers: look up the LogisticRegression solvers documentation. liblinear is faster than lbfgs; sag and saga are faster on larger datasets. For multiclass problems, only newton-cg, sag, saga and lbfgs handle the multinomial loss; liblinear is limited to one-versus-rest schemes (a small sketch follows at the end of this list)

  • Logistic regression python solvers' definitions

  • Cross-validation

  • On the interpretation of weight vectors of linear models in multivariate neuroimaging is an awesome paper which explains why interpreting patterns instead of filters from the get_coef function makes more sense

  • When an optimization algorithm does not converge, it is usually because the training data is not normalised

  • Alternatively, set max_iter to a larger value (see the sketch below)
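
A small sketch of those solver and convergence knobs; the values are illustrative and dataset-dependent:

```python
from sklearn.linear_model import LogisticRegression

# small or binary problem: liblinear is a fast default (one-versus-rest only)
clf_small = LogisticRegression(solver='liblinear')

# larger dataset or multinomial loss: saga scales better;
# if it warns about convergence, normalise the data first and/or raise max_iter
clf_large = LogisticRegression(solver='saga', max_iter=5000)
```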


Points to ponder

  • Shuffling the dataset
  • StratifiedKFold
  • Using features instead of time series
  • I think the limitations are from the kind of features that can be computed on a single time point!
  • F1 scores vs roc_auc
  • About balancing the classes

Resources and Inspirations

  1. Decreasing alertness modulates perceptual decision-making
  2. MNE demo from Richard Höchenberger's workshop

Motivation for TFCE

  • A fixed cluster-forming threshold might miss broad but weak clusters and focus only on strong but peaky clusters
  • The intuition of TFCE is that we are going to try out all possible thresholds and see whether a given time-point belongs to a significant cluster under any of our set of cluster-thresholds
  • TFCE will be a weighted average between the cluster extent and the cluster height
  • i.e. over how many samples the cluster extends and how large the t value is (the evidence for an effect); see the formula after this list

[Figure: TFCE schematic, adapted from Benedikt V. Ehinger (used under CC-BY)]
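
In formula form, this is the standard TFCE score from Smith & Nichols (2009): e(h) is the extent of the cluster containing point p at threshold h, h_p is the statistic at p, and the exponents E and H (commonly 0.5 and 2) weight extent against height:

```latex
\mathrm{TFCE}(p) = \int_{0}^{h_p} e(h)^{E}\, h^{H}\, dh
```

In practice the integral is approximated by a sum over the discrete set of thresholds generated from start and step.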

Result images look like the ones below

  • The data X should be of the form observations × time × channels
  • All dimensions except the first (the number of observations/subjects) should match across all groups
  • The threshold has to be a dictionary for the TFCE method
  • tfce = dict(start=.4, step=.4). The ideal is start = 0 and step a minimal value, BUT this is going to take forever
  • At the other end is start=0 and step=np.inf
  • Read this discussion to understand the accuracy-speed tradeoffs
  • In short, the tfce parameters are data-dependent. The tricky thing is figuring out the speed/accuracy tradeoff and how to make it. See the quote below

Yes but this is not the only consideration. There is a speed/accuracy tradeoff, and it's not obvious how to make it. To make things completely reliable/accurate we could set the start to zero and the step to be some tiny number, but this operation will essentially run forever. The other extreme is essentially what we do now, which is probably equivalent (I'd need to think about it) to using TFCE with start=threshold, step=np.inf, which is guaranteed to complete in a small amount of time (one clustering iter) but not be so accurate. So in this sense, what we do now is well defined essentially as the "guaranteed fast but not so great approximation". In between these two there are approximations that probably make good tradeoffs between accuracy and speed. Robustly choosing a good one is not necessarily an easy problem, though, because it depends on the data. Maybe using something like starting at the 10th percentile and going in increments relative to the difference between the 90th and 10th percentiles would work in 90% of cases...? If you are motivated to extensively test these tradeoffs and can find one that works in such a high percentage of cases, then it could make sense to use as an automated threshold='tfce'. As for changing this to the default value for threshold, thinking about it more I'm actually not opposed to it. I imagine the vast majority of people override the threshold value already in their analyses. Changing it to tfce will make things take a lot longer but should be more accurate.

  • To understand what the heck TFCE is, read the awesome blog posts from Benedikt Ehinger's blog (check out the resources)
  • tail=0 is a two-tailed test
  • Provide adjacency info so that TFCE knows which channels are neighbours
  • t_power either just counts locations (t_power=0) or weights each location by its statistical score (t_power=1); a call sketch follows this list
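
Putting those pieces together, a minimal sketch of the call. X, epochs and the parameter values are placeholders/assumptions, not prescriptions:

```python
import numpy as np
from mne.channels import find_ch_adjacency
from mne.stats import spatio_temporal_cluster_test

# X: list of two arrays (one per group/condition),
# each of shape (n_observations, n_times, n_channels) -- assumed to exist
adjacency, ch_names = find_ch_adjacency(epochs.info, ch_type='eeg')

tfce = dict(start=0, step=0.2)   # threshold must be a dict to trigger TFCE
T_obs, clusters, cluster_p_values, H0 = spatio_temporal_cluster_test(
    X,
    threshold=tfce,
    tail=0,                      # two-tailed test
    adjacency=adjacency,         # which channels are spatial neighbours
    t_power=1,                   # weight each location by its statistic
    n_permutations=1000,
    n_jobs=1,
    seed=0,
)
# T_obs holds the observed test statistic per (time, channel);
# cluster_p_values holds one corrected p-value per cluster
```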

Detailed drill down

  • The cluster_level script has all the relevant functions
  • The goal is to track the data X, which is a list of length two (groups/conditions), the parameters (for TFCE: start and step) and the output (T_obs, clusters, cluster_p_values, H0), and to understand how these are computed and build intuition

Tumbling down the rabbit hole

  • We start by defining a dictionary called tfce with start and step values, say 0 and 0.2
  • A range of thresholds is generated using the formula thresholds = np.arange(tfce['start'], stop, tfce['step'], float)
  • stop is computed by stop = max(np.max(use_x), -np.min(use_x))
  • The function prints stat_fun(H1) = min= .... max=.....
  • These min and max come from np.min(t_obs) and np.max(t_obs); a sketch of this logic follows
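
A small sketch of that logic. stat_fun, X and use_x stand in for the internal variables of the MNE code, and this only reproduces the threshold generation, not the full clustering:

```python
import numpy as np

tfce = dict(start=0, step=0.2)

# observed statistic per (time, channel), as returned by stat_fun on the data
t_obs = stat_fun(*X)

# upper bound for the threshold sweep: the largest absolute statistic observed
stop = max(np.max(t_obs), -np.min(t_obs))

# the set of cluster-forming thresholds that TFCE will loop over
thresholds = np.arange(tfce['start'], stop, tfce['step'], float)

print(f"stat_fun(H1): min={t_obs.min():.2f} max={t_obs.max():.2f}")
```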

Resources

  1. Threshold Free Cluster Enhancement explained
  2. Statistics: Cluster Permutation Test
  3. Know your neighbours when calculating how channels are connected spatially
  4. Cluster-based multiple comparisons correction | Mike X Cohen
  5. Extreme pixel-based multiple comparisons correction | Mike X Cohen
  6. Correcting for multiple comparisons with cluster-based permutation
  7. Brief paper overview: Threshold free cluster enhancement

Read this post before writing TFCE results

All-resolutions Inference for M/EEG in Python
