[BUG] Overlapping 'bad' annotations cause wrong BaseCluster._segment_raw() behaviour

The reject_by_annotations argument does not work properly in BaseCluster._segment_raw() when 'bad' annotations overlap.

import os
import numpy as np
import mne

sample_data_folder = mne.datasets.sample.data_path()
sample_data_raw_file = os.path.join(sample_data_folder, 'MEG', 'sample',
                                    'sample_audvis_raw.fif')
raw = mne.io.read_raw_fif(sample_data_raw_file)
raw.set_annotations(None)

annotation_0 = mne.Annotations(onset=100, duration=100, description='bad')
annotation_1 = mne.Annotations(onset=130, duration=30, description='bad')

raw_01 = raw.copy()
raw_01.set_annotations(annotation_0 + annotation_1)  # overlapping 'bad' annotations

We then apply the logic from _segment_raw():

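from mne.annotations import _annotations_starts_stops

# Sample indices of every annotation whose description starts with "bad"
onsets, ends = _annotations_starts_stops(raw_01, ["BAD"])
# A good segment is assumed to run from the end of one 'bad' span to the onset of the next
onsets = onsets.tolist() + [raw_01.get_data().shape[-1] - 1]
ends = [0] + ends.tolist()

for onset, end in zip(onsets, ends):
    print(end, onset)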

output:

0 60061
120123 78080
96098 166799

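The segments are wrong because of the overlapping 'bad' annotations: the second one ends before it starts (120123 to 78080), and the third one starts inside the first 'bad' annotation.
The expected behaviour would be: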

0 60061 # start of recording - start of first 'bad' annotation
120123 166799 # end of first 'bad' annotation - end of recording

Not sure if we should fix this in Pycrostates or if it should be fixed in _annotations_starts_stops in MNE.

mne.__version__
1.1.0

Ok, yes I can see it here... we should retrieve the onsets and ends per annotation instead of once for all annotations.

I'll check _annotations_starts_stops to figure out if it should be fixed in MNE.

Looks like we can simply use invert=True

onsets, ends = _annotations_starts_stops(raw_01, ["BAD"], invert=True)

for onset, end in zip(onsets, ends):
    print(onset, end)

output:

0 60061
120123 166800
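
For reference, here is a minimal pure-Python sketch of what the inversion has to do: merge the overlapping 'bad' spans first, then keep the gaps between them. This is not MNE's implementation; `good_segments` is a hypothetical helper, intervals are assumed half-open, and the numbers are the ones printed above (with 166800 samples assumed as the recording length).

```python
def good_segments(bad_onsets, bad_ends, n_samples):
    """Return (start, stop) sample pairs not covered by any bad interval.

    Hypothetical helper for illustration; intervals are half-open [onset, end).
    """
    # Sort the bad intervals by onset so that overlapping ones become adjacent.
    intervals = sorted(zip(bad_onsets, bad_ends))
    merged = []
    for onset, end in intervals:
        if merged and onset <= merged[-1][1]:
            # Overlaps (or touches) the previous bad interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([onset, end])

    # The good segments are the gaps between the merged bad intervals.
    segments = []
    cursor = 0
    for onset, end in merged:
        if onset > cursor:
            segments.append((cursor, onset))
        cursor = max(cursor, end)
    if cursor < n_samples:
        segments.append((cursor, n_samples))
    return segments


segments = good_segments([60061, 78080], [120123, 96098], 166800)
print(segments)  # [(0, 60061), (120123, 166800)]
```

Merging before inverting is what makes the result independent of how many 'bad' annotations overlap, which is the property the shifted-list approach lacks.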

Haven't looked at your example in detail yet, but are you sure?

import numpy as np

from mne import Annotations, create_info
from mne.annotations import _annotations_starts_stops
from mne.io import RawArray

data = np.random.randn(1, 10)
info = create_info(["EEG 001"], 1, "eeg")
raw = RawArray(data, info)

onset = [1, 2]
durations = [7, 2]
annotations = Annotations(onset, durations, "bads")
raw.set_annotations(annotations)

onsets, ends = _annotations_starts_stops(raw, "bads")

I'm getting:

onsets
Out[22]: array([1, 2])

ends
Out[23]: array([8, 4])

which looks correct?

It looks to me like

pycrostates/pycrostates/cluster/_base.py

Lines 814 to 815 in 1cb8d0c

    onsets = onsets.tolist() + [data.shape[-1] - 1]
    ends = [0] + ends.tolist()

is the issue. But I'll have to dig in more to remember what it does and to figure out what is happening here.
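
A quick way to check this, as a sketch only: apply the same shift to the values reported above for the toy recording (onsets `[1, 2]`, ends `[8, 4]`, 10 samples), using plain lists so no MNE call is involved. The variable names `shifted_onsets`, `shifted_ends` and `inverted` are just for illustration.

```python
# Values reported above for the toy recording with two overlapping annotations.
onsets = [1, 2]
ends = [8, 4]
n_samples = 10

# Same shifting as in the two quoted lines: each end is paired with the next onset.
shifted_onsets = onsets + [n_samples - 1]
shifted_ends = [0] + ends

segments = list(zip(shifted_ends, shifted_onsets))
print(segments)  # [(0, 1), (8, 2), (4, 9)]

# A segment whose start lies after its stop cannot be a valid good segment.
inverted = [(start, stop) for start, stop in segments if start > stop]
print(inverted)  # [(8, 2)]
```

So the per-annotation onsets and ends are fine on their own; the pairing only holds when the 'bad' spans do not overlap, since it assumes each span ends before the next one begins.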

Also look at this method in raw, it may be useful: https://github.com/mne-tools/mne-python/blob/880e883c06184160c30d50da06803e67977ac366/mne/io/base.py#L431-L474

It's what is used by reject_by_annotations to create Epochs.
