aramis-lab/clinica

Use regex in `_convert_subject_to_rid()`


The small function `_convert_subject_to_rid()` is less robust than it should be: it implicitly assumes the subject identifier has the format `XXX_S_XXXX` and simply converts its last four characters to an integer:

```python
def _convert_subject_to_rid(subject: str) -> int:
    """Get the QC RID from the subject string identifier.

    TODO: Use a regex to match pattern XXX_S_XXXX ????

    Examples
    --------
    >>> _convert_subject_to_rid("123_S_4567")
    4567
    """
    try:
        return int(subject[-4:])
    except Exception:
        raise ValueError(
            f"Cannot convert the subject '{subject}' identifier into a RID "
            "for PET QC filtering. The expected format for the subject is XXX_S_XXXX."
        )
```

Slicing with `[-4:]` works as long as the input string is correctly formatted, but for a malformed identifier it can silently return a wrong RID instead of raising an error. Nothing currently guarantees the format, because the `subject` string comes from the `subjects` list, which is extracted directly from a CSV file:

```python
if subjects is None:
    adni_merge = load_clinical_csv(csv_dir, "ADNIMERGE")
    subjects = list(adni_merge.PTID.unique())
```

Using a regex, as suggested in the TODO, would make the function more robust by rejecting any identifier that does not match the expected pattern.