nipreps/niworkflows

Pre-run freesurfer folder needs to be writable?


We are trying to re-run a dataset previously processed with fmriprep 20.2.0, now with 20.2.7, reusing the freesurfer runs from 20.2.0. The 20.2.0 runs live in a read-only folder. fmriprep does recognize the pre-existing freesurfer run, but it still throws errors such as the one below, and produces neither the template-space outputs nor any anatomical reports. When I copy the freesurfer folder to a writable location, everything works as expected. So fmriprep doesn't actually modify or create any files in the pre-existing folder, yet it doesn't tolerate that folder being read-only.

Traceback (most recent call last):
File "/usr/local/miniconda/lib/python3.7/pathlib.py", line 1241, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/DATA/foo/foobar'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 398, in run
runtime = self._run_interface(runtime)
File "/usr/local/miniconda/lib/python3.7/site-packages/niworkflows/interfaces/bids.py", line 837, in _run_interface
subjects_dir.mkdir(parents=True, exist_ok=True)
File "/usr/local/miniconda/lib/python3.7/pathlib.py", line 1245, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/usr/local/miniconda/lib/python3.7/pathlib.py", line 1245, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/usr/local/miniconda/lib/python3.7/pathlib.py", line 1245, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
[Previous line repeated 2 more times]
File "/usr/local/miniconda/lib/python3.7/pathlib.py", line 1241, in mkdir
self._accessor.mkdir(self, mode)
OSError: [Errno 30] Read-only file system: '/DATA/foo'
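For concreteness, a hypothetical sketch of the invocation (our actual paths differ; --fs-subjects-dir is the fmriprep flag for pointing a run at existing FreeSurfer outputs):

import subprocess

# Hypothetical paths: point the 20.2.7 run at the read-only freesurfer
# outputs produced by the 20.2.0 run via --fs-subjects-dir.
subprocess.run([
    "fmriprep", "/DATA/bids", "/DATA/derivatives", "participant",
    "--fs-subjects-dir", "/DATA/freesurfer_20.2.0",
], check=True)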

Hm. I'm surprised that exist_ok doesn't cover this case. The failure is clearly in niworkflows, so I'll transfer over there...
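For reference, exist_ok=True only suppresses FileExistsError; when a path component is missing, pathlib still has to call mkdir(), and on a read-only filesystem that raises the Errno 30 seen above. A minimal sketch of a guard (hypothetical, around the failing subjects_dir.mkdir call from niworkflows/interfaces/bids.py in the traceback):

from pathlib import Path

subjects_dir = Path("/DATA/foo/foobar")  # anonymized path from the traceback

# exist_ok=True only swallows FileExistsError; creating a missing
# directory on a read-only mount still raises OSError (Errno 30).
# A defensive variant could skip mkdir when the directory already
# exists, so a complete read-only $SUBJECTS_DIR passes through:
if not subjects_dir.is_dir():
    subjects_dir.mkdir(parents=True, exist_ok=True)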

I have since done a couple more experiments.

  1. When the top-level freesurfer folder is writable but the sub-XXXX, fsaverage, and fsaverage5 subfolders (we want results in those spaces) are read-only local copies (not symlinks), it works fine.
  2. However, when the top-level freesurfer folder is writable but those same subfolders are symlinks to read-only datalad repository locations, it fails with a different error, shown below, where it seems to be trying to re-run autorecon:

=============
Node: fmriprep_wf.single_subject_XXXX_wf.anat_preproc_wf.surface_recon_wf.autorecon1

Traceback (most recent call last):
File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
result["result"] = node.run(updatehash=updatehash)
File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 521, in run
result = self._run_interface(execute=True)
File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 639, in _run_interface
return self._run_command(execute)
File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 751, in _run_command
f"Exception raised while executing Node {self.name}.\n\n{result.runtime.traceback}"
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node autorecon1.

RuntimeError: subprocess exited with code 1.
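For what it's worth, the difference between the two layouts is easy to see programmatically (hypothetical stand-in paths):

import os
from pathlib import Path

subj = Path("/DATA/freesurfer/sub-XXXX")   # hypothetical top-level location

print(subj.is_symlink())                   # False in experiment 1, True in 2
print(os.access(subj.resolve(), os.W_OK))  # False in both: the target is read-only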

Ah. I have had issues with freesurfer and datalad. I've found datalad unlock generally makes everything better.

We could clone the existing run, but the whole point of using links into the (read-only) datalad repository is to avoid duplicating all that disk space while we process this large dataset. If we run datalad unlock on a clone, that means yet another copy :-(.

Datalad unlock should not make copies. It performs git-link trickery. Have you found that doing so increases your disk usage?

I have been told that it doesn't just unlink the files; the copies in the git annex remain as well, doubling disk usage. I just tested, and indeed unlock takes the disk usage for my test freesurfer folder from 452M to 927M.

Interesting. When I unlock, the files become git links. I haven't tested whether that makes it appear doubled to du. What about df? Do you see a 450MB difference there?

Yes, it does seem like df sees the difference as well. Below, there is a ~485000K difference.

$ df .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdd 38910016440 23133702732 13823016508 63% /DATA
$ datalad unlock .
unlock(ok): label/BA_exvivo.ctab (file)
unlock(ok): label/BA_exvivo.thresh.ctab (file)
unlock(ok): label/aparc.annot.DKTatlas.ctab (file)
unlock(ok): label/aparc.annot.a2009s.ctab (file)
unlock(ok): label/aparc.annot.ctab (file)
unlock(ok): label/lh.BA1_exvivo.label (file)
unlock(ok): label/lh.BA1_exvivo.thresh.label (file)
unlock(ok): label/lh.BA2_exvivo.label (file)
unlock(ok): label/lh.BA2_exvivo.thresh.label (file)
unlock(ok): label/lh.BA3a_exvivo.label (file)
[316 similar messages have been suppressed; disable with datalad.ui.suppress-similar-results=off]
action summary:
unlock (ok: 326)
$ df .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdd 38910016440 23134187776 13822531464 63% /DATA

Hmm. Okay. It looks like you can have it use hard links instead of copies by setting the config option annex.thin=true. That's only advisable if you're comfortable treating the FS directory as expendable (because you have another copy somewhere else), since hard links open up the possibility of corrupting the annexed files.
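An untested sketch of what that would look like (the dataset path is a stand-in; annex.thin and datalad unlock are stock git-annex/datalad commands):

import subprocess

dataset = "/DATA/freesurfer/sub-XXXX"  # stand-in for the dataset root

# With annex.thin=true, unlocked files become hard links to the annexed
# objects rather than full copies, so free space should stay put; the
# trade-off is that modifying a file in place can corrupt the annex.
subprocess.run(["git", "config", "annex.thin", "true"], cwd=dataset, check=True)
subprocess.run(["datalad", "unlock", "."], cwd=dataset, check=True)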