khanlab/hippunfold

Docker untar permissions

jordandekraker opened this issue · 2 comments

I'm reposting this from an email thread I had with Nicole Eichert and Fidel Alfaro-Almagro. @akhanf I was hoping you could take a look too?

They are running hippunfold with the following command:

docker run -it --rm -v /home:/home \
 khanlab/hippunfold:latest \
 /home/dnanexus/data /home/dnanexus/ot \
 participant --modality T1w --participant_label 01 --cores all

I also had them add --env HIPPUNFOLD_CACHE_DIR=/home/dnanexus/hippunfold_cache to the docker run command, with no meaningful change.

The output is:

54 of 251 steps (22%) done

[Mon Jun  3 07:15:43 2024]
rule run_inference:
    input: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-preproc_T1w.nii.gz, /home/dnanexus/hippunfold_cache/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar
    output: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz
    log: logs/sub-01/sub-01_hemi-R_space-corobl_nnunet.txt
    jobid: 91
    reason: Missing output files: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz; Input files updated by another job: /home/dnanexus/hippunfold_cache/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar, work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-preproc_T1w.nii.gz
    wildcards: subject=01, hemi=R
    threads: 8
    resources: tmpdir=/tmp, gpus=0, mem_mb=16000, mem_mib=15259, time=60

Config file config/snakebids.yml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Provided resources: gpus=0, mem_mb=16000, mem_mib=15259, time=60
Singularity containers: ignored
Select jobs to execute...
Changing to shadow directory: /home/dnanexus/ot/.snakemake/shadow/tmp7qzv8xg3
tar: nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/debug.json: Cannot change ownership to uid 3050834, gid 6007967: Invalid argument
...
same error for each filename inside the .tar
...
tar: Exiting with failure status due to previous errors
[Mon Jun  3 07:15:46 2024]
Error in rule run_inference:
    jobid: 0
    input: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-preproc_T1w.nii.gz, /home/dnanexus/hippunfold_cache/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar
    output: work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz
    log: logs/sub-01/sub-01_hemi-R_space-corobl_nnunet.txt (check log file(s) for error details)
    shell:
        mkdir -p tempmodel tempimg templbl && cp work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-preproc_T1w.nii.gz tempimg/temp_0000.nii.gz && tar -xf /home/dnanexus/hippunfold_cache/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar -C tempmodel && export RESULTS_FOLDER=tempmodel && export nnUNet_n_proc_DA=8 && nnUNet_predict -i tempimg -o templbl -t Task101_hcp1200_T1w -chk model_best -tr nnUNetTrainerV2 --disable_tta &> logs/sub-01/sub-01_hemi-R_space-corobl_nnunet.txt && cp templbl/temp.nii.gz work/sub-01/anat/sub-01_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-06-03T071054.287943.snakemake.log

Fidel suggested that adding TAR_OPTIONS=--no-same-owner to the Docker build might help, based on another Stack Overflow post, but I thought I'd see if you have any other ideas before rebuilding.
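For reference, a rough sketch of the two ways that workaround could be applied (untested on my end; the run-time variant assumes the tar inside the container is GNU tar, which reads default options from the TAR_OPTIONS environment variable):

# (a) build-time: add this line to the hippunfold Dockerfile and rebuild
ENV TAR_OPTIONS="--no-same-owner"

# (b) run-time: pass the variable into the existing image, no rebuild needed
docker run -it --rm -v /home:/home \
 --env TAR_OPTIONS=--no-same-owner \
 khanlab/hippunfold:latest \
 /home/dnanexus/data /home/dnanexus/ot \
 participant --modality T1w --participant_label 01 --cores all

Either way, tar should then skip the chown calls that are failing on the mounted filesystem.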

I posted a fix in #291 that might apply here too - the error message seems different, but it applies the fix Fidel describes without having to rebuild.

Aha, I even glanced over previous issues to find a solution and missed this! Thanks