bids-standard/pybids

Indexer - pattern matching is very brittle and does not allow sophisticated configs


MRIQC takes an enormous amount of time to index a dataset that contains a large hidden folder or, e.g., a rawdata/ folder full of DICOMs.

I'm trying to set up very restrictive ignore patterns using negative lookahead and lookbehind when configuring the indexer:

            ignore_paths = [
                re.compile(r"^(?!/sub-[a-zA-Z0-9]+)"),
                # Exclude modalities and contrasts ignored by MRIQC (doesn't know how to QC)
                re.compile(
                    r"sub-[a-zA-Z0-9]+(/ses-[a-zA-Z0-9]+)?/(dwi|fmap|perf)/"
                ),
                # Negative lookbehind so that only T1w, T2w and bold files are indexed
                # (note: Python's re requires every lookbehind alternative to have the same length)
                re.compile(r"^.+(?<!(_T1w|_T2w|bold))\.(json|nii|nii\.gz)$"),
            ]

            # If participant label(s) were provided, only index those subjects (negative lookahead)
            if participant_label:
                ignore_paths[0] = re.compile(
                    r"^(?!/sub-("
                    + "|".join(participant_label)
                    + r"))"
                )
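
To make the intended behaviour concrete, here is a minimal sketch that checks the patterns above against a few made-up paths. It assumes each pattern is applied with `search` to the dataset-relative path written with a leading slash; that is an assumption for illustration, not necessarily how the indexer matches paths. The helpers `build_ignore_patterns` and `is_ignored` and all sample paths are hypothetical.

```python
import re


def build_ignore_patterns(participant_label=None):
    """Rebuild the ignore patterns from the snippet above (hypothetical helper)."""
    ignore_paths = [
        # Ignore anything that does not start with /sub-<label>
        re.compile(r"^(?!/sub-[a-zA-Z0-9]+)"),
        # Ignore dwi, fmap and perf folders
        re.compile(r"sub-[a-zA-Z0-9]+(/ses-[a-zA-Z0-9]+)?/(dwi|fmap|perf)/"),
        # Ignore image/sidecar files whose suffix is not T1w, T2w or bold
        re.compile(r"^.+(?<!(_T1w|_T2w|bold))\.(json|nii|nii\.gz)$"),
    ]
    if participant_label:
        # Restrict indexing to the requested subjects (negative lookahead)
        ignore_paths[0] = re.compile(
            r"^(?!/sub-(" + "|".join(participant_label) + r"))"
        )
    return ignore_paths


def is_ignored(path, patterns):
    """Assumption: a path is ignored if any pattern is found in it via search()."""
    return any(p.search(path) for p in patterns)


patterns = build_ignore_patterns()
for path in (
    "/sub-01/anat/sub-01_T1w.nii.gz",
    "/sub-01/func/sub-01_task-rest_bold.nii.gz",
    "/sub-01/dwi/sub-01_dwi.nii.gz",
    "/sub-01/anat/sub-01_FLAIR.nii.gz",
    "/rawdata/scan001.dcm",
    "/.git/config",
):
    print(path, "ignored" if is_ignored(path, patterns) else "indexed")
```

Under that matching assumption, the T1w and bold files are kept and everything else in the list is ignored. Because the participant filter is a prefix lookahead, requesting `01` would also let `/sub-011/...` through.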

I have a patch that addresses this issue. Will send a PR shortly.