mne-tools/mne-icalabel

[DOC] Clarify the types of data to be used

adswa opened this issue · 2 comments

adswa commented

I am unfamiliar with the original ICLabel tool. While reading the docs and then applying mne-icalabel to my own MEG data, I got confused about which types of data it is intended for. The paper and the introduction in the documentation mention "Scalp electroencephalography (EEG) and magnetoencephalography (MEG)" explicitly, and summarize generally: "mne-icalabel is a Python package for labeling independent components that stem from an Independent Component Analysis (ICA)." This made me think that I could also use the tool on ICAs fitted on non-EEG data such as MEG, iEEG, or ECoG. The warning emitted by label_components() (also mentioned in #86), however, somewhat implies that I need EEG data:

<ipython-input-7-1dec4ffcd5a6>:1: RuntimeWarning: The provided Raw instance does not seem to be referenced to a common average reference (CAR). ICLabel was designed to classify features extracted from an EEG dataset referenced to a CAR (see the 'set_eeg_reference()' method for Raw and Epochs instances).

and indeed I cannot get the examples to run on my own data without EEG channels.
The original ICLabel paper's abstract mentions EEG data only. A comment in the tutorial seems to suggest it is even more restricted:

# Note: for this example, we are using ICLabel which has only
# been validated and works for EEG systems with less than 32 electrodes.

In response to #86, PR #87 also introduced the following changes in the tutorial to get rid of a few warnings:

ica = ICA(n_components=15, max_iter="auto", random_state=97)
# Before fitting ICA, we will apply a common average referencing.
filt_raw = filt_raw.set_eeg_reference("average")

# Note: we will use the 'infomax' method for fitting the ICA because
# that is what is supported in the ICLabel model. In practice, one can
# use any method they choose.
ica = ICA(
    n_components=15,
    max_iter="auto",
    method="infomax",
    random_state=97,
    fit_params=dict(extended=True),
)
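For readers unfamiliar with common average referencing, here is a minimal sketch of what the operation does to the data, written in plain Python rather than with MNE. The function names are hypothetical, and this is not how mne-icalabel detects the reference; it only illustrates the property that, after a CAR, the mean across channels is zero at every sample.

```python
from statistics import fmean


def common_average_reference(data):
    """Subtract the across-channel mean from every sample.

    `data` is a list of channels, each a list of samples of equal length.
    (Hypothetical helper for illustration, not part of MNE or mne-icalabel.)
    """
    n_samples = len(data[0])
    # Mean over channels, computed separately for each time point.
    means = [fmean(ch[t] for ch in data) for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in data]


def looks_average_referenced(data, tol=1e-9):
    """Return True if the across-channel mean is ~0 at every sample."""
    n_samples = len(data[0])
    return all(
        abs(fmean(ch[t] for ch in data)) <= tol for t in range(n_samples)
    )


# Three toy "channels" with three samples each.
toy = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [5.0, 5.0, 5.0]]
referenced = common_average_reference(toy)

print(looks_average_referenced(toy))         # raw toy data is not referenced
print(looks_average_referenced(referenced))  # re-referenced data is
```

In the tutorial, `set_eeg_reference("average")` performs the re-referencing step on the EEG channels of the Raw instance.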

I think it would be useful to clarify the requirements mne-icalabel has for ICA data, and to state more prominently which data is suitable and which is not, so that people unfamiliar with the original EEGLAB ICLabel tool can quickly gauge whether it is useful for them. Likewise, a clarification about common average referencing and non-fastica ICA, probably even in the docstring of the function, would make sense if these are strong requirements or recommendations; that isn't immediately obvious to me from the code and documentation. The code emitting the warning reads as if bad things happen if a non-infomax method is used:

# confirm that the ICA uses an infomax extended
method_ = ica.method not in ("infomax", "picard")
extended_ = "extended" not in ica.fit_params or ica.fit_params["extended"] is False
ortho_ = "ortho" not in ica.fit_params or ica.fit_params["ortho"] is True
ortho_ = ortho_ if ica.method == "picard" else False
if any((method_, extended_, ortho_)):
    warn(
        f"The provided ICA instance was fitted with a '{ica.method}' algorithm. "
        "ICLabel was designed with extended infomax ICA decompositions. To use the "
        "extended infomax algorithm, use the 'mne.preprocessing.ICA' instance with the "
        "arguments 'ICA(method='infomax', fit_params=dict(extended=True))' (scikit-learn) or "
        "'ICA(method='picard', fit_params=dict(ortho=False, extended=True))' (python-picard)."
    )

But the comment in the tutorial says "In practice, one can use any method they choose." If non-infomax methods are not a problem, why the warning? If non-infomax methods are not benchmarked, tested, or validated, I think it's worth adding an easy-to-find note about that in the docstring, so that anyone can learn about this without reading the walk-throughs in detail.
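To make explicit which configurations trigger the warning, the check quoted above can be restated as a small standalone function. This is only a sketch: the function name is hypothetical, and the `SimpleNamespace` objects are stand-ins that expose just the `method` and `fit_params` attributes the check reads. A real `mne.preprocessing.ICA` instance may fill `fit_params` with defaults, so results on real objects could differ.

```python
from types import SimpleNamespace


def deviates_from_extended_infomax(ica):
    """Mirror the three flags of the quoted check; True means 'would warn'.

    Hypothetical helper for illustration, not part of mne-icalabel.
    """
    fit_params = ica.fit_params
    # Any method other than infomax or picard is flagged.
    method_ = ica.method not in ("infomax", "picard")
    # The 'extended' option must be present and not False.
    extended_ = "extended" not in fit_params or fit_params["extended"] is False
    # For picard only: 'ortho' must be present and not True.
    ortho_ = "ortho" not in fit_params or fit_params["ortho"] is True
    ortho_ = ortho_ if ica.method == "picard" else False
    return any((method_, extended_, ortho_))


# Stand-in objects, NOT real ICA instances.
candidates = {
    "fastica": SimpleNamespace(method="fastica", fit_params={}),
    "infomax, not extended": SimpleNamespace(method="infomax", fit_params={}),
    "infomax, extended": SimpleNamespace(
        method="infomax", fit_params={"extended": True}
    ),
    "picard, extended, ortho unset": SimpleNamespace(
        method="picard", fit_params={"extended": True}
    ),
    "picard, extended, ortho=False": SimpleNamespace(
        method="picard", fit_params={"ortho": False, "extended": True}
    ),
}

results = {
    name: deviates_from_extended_infomax(ica)
    for name, ica in candidates.items()
}
for name, flagged in results.items():
    print(f"{name}: {'warns' if flagged else 'no warning'}")
```

Under these assumptions, only the two configurations named in the warning text pass silently: extended infomax, and picard with `ortho=False` and `extended=True`.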

Thank you for the review. PR #108 addresses this. Let us know if you think further changes are needed.

adswa commented

Looks good to me, thx!