Indexer - pattern matching is very brittle and does not allow sophisticated configs
oesteban commented
MRIQC takes an enormous amount of time to index a dataset that contains a large hidden folder
or, e.g., a rawdata/ folder with DICOMs.
I'm trying to set up very restrictive ignore
patterns using negative lookahead and lookbehind when configuring the indexer:
ignore_paths = [
    re.compile(r"^(?!/sub-[a-zA-Z0-9]+)"),
    # Exclude modalities and contrasts ignored by MRIQC (doesn't know how to QC)
    re.compile(
        r"sub-[a-zA-Z0-9]+(/ses-[a-zA-Z0-9]+)?/(dwi|fmap|perf)/"
    ),
    # negative lookbehind to only index T1w, T2w and bold (please note length must be constant)
    re.compile(r"^.+(?<!(_T1w|_T2w|bold))\.(json|nii|nii\.gz)$"),
]
# If participant label(s) were provided, only index those subjects (negative lookahead)
if participant_label:
    ignore_paths[0] = re.compile(
        r"^(?!/sub-("
        + "|".join(participant_label)
        + r"))"
    )
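To illustrate what these patterns are meant to do, here is a minimal, self-contained sketch (not the patch itself). It assumes that the indexer tests each pattern with `search()` against a dataset-relative path that starts with `/`; that assumption, the helper names `build_ignore_patterns` and `is_ignored`, the `re.escape` call, and the example paths are all hypothetical and only serve the illustration.

```python
import re


def build_ignore_patterns(participant_label=None):
    """Build the ignore patterns from the snippet above (illustrative helper)."""
    # Negative lookahead: ignore everything that is not under a /sub-<label> folder.
    first = re.compile(r"^(?!/sub-[a-zA-Z0-9]+)")
    if participant_label:
        # Restrict indexing to the requested subjects (labels escaped as a precaution).
        labels = "|".join(re.escape(label) for label in participant_label)
        first = re.compile(r"^(?!/sub-(" + labels + r"))")
    return [
        first,
        # Modalities that are not QC'ed.
        re.compile(r"sub-[a-zA-Z0-9]+(/ses-[a-zA-Z0-9]+)?/(dwi|fmap|perf)/"),
        # Negative lookbehind: all alternatives are 4 characters long, as `re` requires.
        re.compile(r"^.+(?<!(_T1w|_T2w|bold))\.(json|nii|nii\.gz)$"),
    ]


def is_ignored(path, patterns):
    """Assumed matching rule: a path is skipped if any pattern is found in it."""
    return any(pattern.search(path) for pattern in patterns)


patterns = build_ignore_patterns()
for path in (
    "/.git/objects/ab/cdef",
    "/rawdata/dicoms/0001.dcm",
    "/sub-01/dwi/sub-01_dwi.nii.gz",
    "/sub-01/anat/sub-01_FLAIR.nii.gz",
    "/sub-01/anat/sub-01_T1w.nii.gz",
    "/sub-01/func/sub-01_task-rest_bold.nii.gz",
):
    print(path, "-> ignored" if is_ignored(path, patterns) else "-> indexed")
```

Under that assumed matching rule, only the T1w and bold files are indexed; the hidden folder, the rawdata/ folder, the dwi file and the FLAIR file are all skipped.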
I have a patch that addresses this issue. Will send a PR shortly.