nbara/python-meegkit

TRCA has Low Accuracy for Identifying 8 Targets in 16-23 Frequency Range?

nel-hidalgo opened this issue · 8 comments

I attempted to use the TRCA algorithm with SSVEP data corresponding to 8 targets, flickering at frequencies between 16 to 23 Hz with 1 Hz difference between each stimulus and without any phase offset. Is it appropriate to use meegkit's TRCA implementation in this situation where there is less targets and no phase offset? I used the same filter bank presented in https://nbara.github.io/python-meegkit/auto_examples/example_trca.html, but the accuracy on my data (which I have ensured is in the correct format and filtered between 15 to 50 Hz before passing it through the filter banks) stays below 25% when using TRCA. I have in total 128 labels for a total of 16 experiment blocks, where each time the TRCA training gets 15 blocks (120 labels) and needs to predict 8 labels.
When I use CCA (from sci-kit learn implementation) on the same data, I get 87 % accuracy on average per experiment block. The fact that CCA works well with data that is formatted exactly as the data I am passing into TRCA tells me that the problem is not in the data formatting. I would appreciate any help figuring out how to get TRCA working, if that is possible, in the given scenario.

nbara commented

Hi @nel-hidalgo , if you are 100% sure the data is formatted correctly, it will be very difficult for me to help you debug this without having some data to play with...

Here is a drive link to the data. It is stored in a .mat file as "eeg" and it is formatted as [samples,channels,trials] and the labels are under "labels" and the labels are the indices that correspond to the array of frequencies [16,17,18,19,20,21,22,23], so 0 is 16 Hz and so on.
https://drive.google.com/file/d/1q0ki2jBSpryccjxATdKBiV1euXn1sLQW/view?usp=sharing

nbara commented

Thanks, downloaded. I will try to give it a shot this week.

Can you also give me the sampling frequency?

nbara commented

I've had a look and I can reproduce the issue. Can you confirm the sampling frequency please?

The sampling frequency is 250 Hz.

nbara commented

Thanks. I had a bit of time to investigate, and I am now reasonably certain that the issue is not in the TRCA code, but in your data (or rather, that TRCA may not be suited to your data).

Let me illustrate that with a graph :

image

import matplotlib.pyplot as plt
import numpy as np
import scipy.io
from meegkit.utils import unfold
from scipy.signal import welch

###############################################################################
# Parameters
# -----------------------------------------------------------------------------
sfreq = 250  # sampling rate [Hz]
list_freqs = np.array([16, 17, 18, 19, 20, 21, 22, 23])  # stimulus freqs
n_targets = list_freqs.size  # number of conditions

###############################################################################
# Load data
# -----------------------------------------------------------------------------
path = os.path.join('.', 'tests', 'data', 'eeg_for_trca.mat')
eeg = scipy.io.loadmat(path)["eeg"]
labels = scipy.io.loadmat(path)["labels"].flatten()
n_samples, n_chans, n_trials = eeg.shape

f, ax = plt.subplots(2, 8, figsize=(15, 6), sharex=True, sharey=True)
for i in range(8):
    freqs, psd = welch(unfold(eeg[..., labels == i]), fs=sfreq,
                       nfft=500, nperseg=500, noverlap=0, axis=0)
    ax[0, i].axvline(list_freqs[i], ls=':', color='k')
    ax[0, i].plot(freqs, psd)
    ax[0, i].set_xlim([10, 35])
    ax[0, i].set_title(f'{list_freqs[i]} Hz')

    freqs, psd = welch(eeg[..., labels == i].mean(-1), fs=sfreq,
                       nfft=500, nperseg=500, noverlap=0, axis=0)
    ax[1, i].axvline(list_freqs[i], ls=':', color='k')
    ax[1, i].plot(freqs, psd)

    if i > 0:
        ax[0, i].set_yticklabels([])
        ax[1, i].set_yticklabels([])

ax[0, 0].set_ylabel('PSD of concatenated data')
ax[1, 0].set_ylabel('PSD averaged over trials')
ax[1, 0].set_xlabel('Frequencies')
plt.tight_layout()
plt.show()

Here I'm showing the PSD for each condition (=target frequency). The top row is the PSD computed on the concatenated trials for each condition (average PSD), and the bottom row is the PSD of the trial-average.

The first problem that I can think of is that the individual peak frequencies are much weaker in the bottom row, which seems to indicate that there is a phase mismatch between trials. This is problematic because the TRCA method relies on computing an "ERP template" over trials, and here you are shooting yourself in the foot. This might also explain why a basic CCA worked better for you.

A second problem is that while you claim to have filtered your data between 15 and 50hz, the actual passband seems to be closer to [15-30hz] on the graphs above. TRCA relies heavily on the filterbank decomposition, which allows it to leverage higherr frequency information in the harmonics. Here the upper bound of your passband is so low that you are effectively removing even the first harmonic from your data..

Hope that helps.

That's very helpful. Thank you! To confirm whether this is a data filtering issue or not, I was wondering if you could run through your pipeline the raw data file, which I am attaching, and you may apply the filtering that works best with your TRCA implementation. Also, what exactly do you mean by phase mismatch? Thanks, again. https://drive.google.com/file/d/1qUs307-aaqbGNSoaHnWvQHc0nh9Ej1FY/view?usp=sharing

nbara commented

Also, what exactly do you mean by phase mismatch?

I meant that the stimulus phase seems to be a bit jittered between trials, which would result in a somewhat weaker ERP response when averaging over trials