Docker untar permissions
jordandekraker opened this issue · 2 comments
I'm reposting this from an email thread I had with Nicole Eichert and Fidel Alfaro-Almagro. @akhanf I was hoping you could take a look too?
They are running with the command
docker run -it --rm -v /home:/home \
khanlab/hippunfold:latest \
/home/dnanexus/data /home/dnanexus/ot \
participant --modality T1w --participant_label 01 --cores all
I also had them add (to the docker run command) --env HIPPUNFOLD_CACHE_DIR=/home/dnanexus/hippunfold_cache
with no meaningful change.
The output is:
54 of 251 steps (22%) done
[Mon Jun 3 07:15:43 2024]
rule run_inference:
input: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-preproc_T1w.nii.gz, /home/dnanexus/hippunfold_cache/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar
output: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz
log: logs/sub-01/sub-01_hemi-R_space-corobl_nnunet.txt
jobid: 91
reason: Missing output files: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz; Input files updated by another job: /home/dnanexus/hippunfold_cache/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar, work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-preproc_T1w.nii.gz
wildcards: subject=01, hemi=R
threads: 8
resources: tmpdir=/tmp, gpus=0, mem_mb=16000, mem_mib=15259, time=60
Config file config/snakebids.yml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Provided resources: gpus=0, mem_mb=16000, mem_mib=15259, time=60
Singularity containers: ignored
Select jobs to execute...
Changing to shadow directory: /home/dnanexus/ot/.snakemake/shadow/tmp7qzv8xg3
tar: nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/debug.json: Cannot change ownership to uid 3050834, gid 6007967: Invalid argument
...
same error for each filename inside the .tar
...
tar: Exiting with failure status due to previous errors
[Mon Jun 3 07:15:46 2024]
Error in rule run_inference:
jobid: 0
input: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-preproc_T1w.nii.gz, /home/dnanexus/hippunfold_cache/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar
output: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz
log: logs/sub-01/sub-01_hemi-R_space-corobl_nnunet.txt (check log file(s) for error details)
shell:
mkdir -p tempmodel tempimg templbl && cp work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-preproc_T1w.nii.gz tempimg/temp_0000.nii.gz && tar -xf /home/dnanexus/hippunfold_cache/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar -C tempmodel && export RESULTS_FOLDER=tempmodel && export nnUNet_n_proc_DA=8 && nnUNet_predict -i tempimg -o templbl -t Task101_hcp1200_T1w -chk model_best -tr nnUNetTrainerV2 --disable_tta &> logs/sub-01/sub-01_hemi-R_space-corobl_nnunet.txt && cp templbl/temp.nii.gz work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-06-03T071054.287943.snakemake.log
Fidel suggested that adding TAR_OPTIONS: --no-same-owner
to the Docker build might help, based on another Stack Overflow post, but I thought I'd see if you have any other ideas before rebuilding.
I posted a fix in #291 that might apply here too. The error message seems different, but it applies the fix Fidel describes without having to rebuild.
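For reference, here is a minimal sketch of how the TAR_OPTIONS mechanism behaves, assuming the tar inside the container is GNU tar (which prepends the contents of TAR_OPTIONS to its command-line options). The temporary directory and file names below are hypothetical stand-ins for the model archive, and this may not match exactly what #291 does:

```shell
#!/bin/sh
# Sketch only: builds a tiny archive and extracts it with TAR_OPTIONS set.
# All paths are hypothetical stand-ins for the cached model .tar.
set -eu

workdir=$(mktemp -d)
mkdir -p "$workdir/src/nnUNet" "$workdir/tempmodel"
echo '{}' > "$workdir/src/nnUNet/debug.json"

# Pack the stand-in model directory.
tar -cf "$workdir/model.tar" -C "$workdir/src" nnUNet

# GNU tar reads TAR_OPTIONS from the environment, so --no-same-owner takes
# effect without editing the tar command itself. When tar runs as root it
# otherwise tries to restore the archived uid/gid, which is the step that
# failed in the log above.
TAR_OPTIONS=--no-same-owner tar -xf "$workdir/model.tar" -C "$workdir/tempmodel"

ls "$workdir/tempmodel/nnUNet"
```

Under the same assumption, adding -e TAR_OPTIONS=--no-same-owner to the docker run command should pass the option through to the tar -xf call in run_inference without rebuilding the image.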
Aha, I even glanced over previous issues to find a solution and missed this! Thanks