aramis-lab/clinica

Ignore hidden CSV files in `load_clinical_csv`


The function `load_clinical_csv` from the ADNI-to-BIDS converter uses a regular expression to find and open a clinical CSV file, regardless of the date that ADNI appends to the end of the file name.

def load_clinical_csv(clinical_dir: str, filename: str) -> pd.DataFrame:
    """Load the clinical csv from ADNI. This function is able to find the csv in the
    different known format available, the old format with just the name, and the new
    format with the name and the date of download.

    Parameters
    ----------
    clinical_dir: str
        Directory containing the csv.

    filename: str
        name of the file without the suffix.

    Returns
    -------
    pd.DataFrame:
        Dataframe corresponding to the filename.
    """
    import re
    from pathlib import Path

    pattern = filename + r"(_\d{1,2}[A-Za-z]{3}\d{4})?.csv"
    files_matching_pattern = [
        f for f in Path(clinical_dir).rglob("*.csv") if re.search(pattern, (f.name))
    ]
    if len(files_matching_pattern) != 1:
        raise IOError(
            f"Expecting to find exactly one file in folder {clinical_dir} "
            f"matching pattern {pattern}. {len(files_matching_pattern)} "
            f"files were found instead : \n{'- '.join(str(files_matching_pattern))}"
        )
    try:
        return pd.read_csv(files_matching_pattern[0], sep=",", low_memory=False)
    except Exception:
        raise ValueError(
            f"File {str(files_matching_pattern[0])} was found but could not "
            "be loaded as a DataFrame. Please check your data."
        )

When several files match, clinica raises an error. This is the desired behavior, since we shouldn't make a choice for the user.

However, @HuguesRoy recently reported that on macOS a hidden file can be created when the CSV file is opened in a GUI application like Excel. In this situation, `load_clinical_csv` fails because it finds two files: the correct one and the temporary hidden file.

Even though this should rarely happen in practice, making this function more robust by ignoring hidden files should be pretty straightforward.
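A minimal sketch of what the filtering could look like, not a definitive fix. It assumes that "hidden" means the file name starts with a dot (for example `._ADNIMERGE_12Jan2024.csv`), which may not cover every temporary file a GUI application can create. The helper name `find_visible_csv_files` is hypothetical and does not exist in clinica; the pattern is the one already used in `load_clinical_csv`.

```python
import re
from pathlib import Path
from typing import List


def find_visible_csv_files(clinical_dir: str, filename: str) -> List[Path]:
    """Hypothetical helper: list the CSV files matching the ADNI naming
    pattern, skipping hidden files (assumed here to be dot-prefixed)."""
    # Same pattern as in load_clinical_csv.
    pattern = filename + r"(_\d{1,2}[A-Za-z]{3}\d{4})?.csv"
    return [
        f
        for f in Path(clinical_dir).rglob("*.csv")
        # Extra condition: ignore files whose name starts with a dot.
        if re.search(pattern, f.name) and not f.name.startswith(".")
    ]
```

With this filter, the existing check that exactly one file was found keeps raising an error for genuine duplicates, while a dot-prefixed temporary copy no longer counts as a second match.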
