ReproNim/reproin

Document/establish "ultimate" YODA setup

yarikoptic opened this issue · 0 comments

Similar to https://github.com/ReproNim/containers/#a-typical-workflow, but for reproin. An example dataset that uses such a setup is http://datasets.datalad.org/?dir=/dbic/QA (AKA ///dbic/QA), where https://datasets.datalad.org/dbic/QA/.datalad/config contains

[datalad "containers.repronim-reproin"]
	image = code/containers/images/repronim/repronim-reproin--0.13.1.sing
	cmdexec = {img_dspath}/code/containers/scripts/singularity_cmd run -B /inbox/DICOM/ -B /inbox/BIDS {img} {cmd}
[datalad "containers.repronim-reproin-dev"]
	image = code/containers/images/repronim/repronim-reproin--0.13.1.sing
	cmdexec = {img_dspath}/code/containers/scripts/singularity_cmd run -B {img_dspath}/code/reproin/bin/reproin:/usr/local/bin/reproin {img} {cmd}

Both point to the subdataset code/containers, which is listed in https://datasets.datalad.org/dbic/QA/.gitmodules as

[submodule "code/containers"]
	path = code/containers
	url = https://datasets.datalad.org/repronim/containers/.git
	datalad-id = b02e63c2-62c1-11e9-82b0-52540040489c
	datalad-url = ///repronim/containers

The second one (repronim-reproin-dev) demonstrates how to reuse the same container with an overloaded reproin command, e.g. during active development of a new feature, so there is no need to create a new container just to try a new version of the reproin script. code/reproin is also just a git submodule pointing to this repo.
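To make the placeholders concrete, here is a minimal sketch of how the cmdexec template of repronim-reproin-dev turns into an actual command line. It only mimics the substitution with plain bash string replacement; it is not the datalad-container implementation. The function name expand_cmdexec and the value "." for {img_dspath} (i.e. running from the top of the dataset) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Illustration only: mimic how {img_dspath}, {img} and {cmd} in cmdexec are
# filled in. expand_cmdexec is a hypothetical helper, not part of datalad.
expand_cmdexec() {
    local out=$1 img_dspath=$2 img=$3 cmd=$4
    local p_dspath='{img_dspath}' p_img='{img}' p_cmd='{cmd}'
    out=${out//"$p_dspath"/"$img_dspath"}
    out=${out//"$p_img"/"$img"}
    out=${out//"$p_cmd"/"$cmd"}
    printf '%s\n' "$out"
}

# Template copied from the repronim-reproin-dev entry above.
template='{img_dspath}/code/containers/scripts/singularity_cmd run -B {img_dspath}/code/reproin/bin/reproin:/usr/local/bin/reproin {img} {cmd}'

# Assumed values: dataset root is the current directory.
expand_cmdexec "$template" \
    . \
    code/containers/images/repronim/repronim-reproin--0.13.1.sing \
    'study-convert dbic/QA'
```

The printed line shows the bind mount resolving to ./code/reproin/bin/reproin, which is how the script from the submodule shadows /usr/local/bin/reproin inside the unchanged 0.13.1 image.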

Then in such a dataset, upon getting some data to convert I just need to run

datalad containers-run -n repronim-reproin study-convert dbic/QA

(Note the downside of needing to specify the study even though it should be known, since we are already in that location. Refactoring of the interface should take that into account.)
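One possible way to avoid repeating the study name is to derive it from the dataset location. The sketch below is hypothetical: study_from_path is not an existing reproin function, and it assumes that all studies live under a common root such as /inbox/BIDS and that the study name is simply the path below that root. It prints the resulting command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the study name (e.g. "dbic/QA") from the
# dataset path, assuming all studies live under one common BIDS root.
study_from_path() {
    local bids_root=${1%/} dataset_path=${2%/}
    case $dataset_path in
        "$bids_root"/?*)
            printf '%s\n' "${dataset_path#"$bids_root"/}"
            ;;
        *)
            echo "error: $dataset_path is not under $bids_root" >&2
            return 1
            ;;
    esac
}

# Assumed layout: the dbic/QA dataset sits at /inbox/BIDS/dbic/QA.
study=$(study_from_path /inbox/BIDS /inbox/BIDS/dbic/QA)

# Print rather than execute, so the sketch has no side effects.
echo datalad containers-run -n repronim-reproin study-convert "$study"
```

In real use the second argument would be "$PWD" or the dataset root, so the call above would not need the study spelled out by hand.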

Here is a bash dump of commands for a recent similar setup, with an "outdated" reproin since that is what the study already used. It also includes TODO comments. The relevant issue on the datalad side is datalad/datalad#5950, on how to configure an ephemeral clone (I should have made an ephemeral clone locally and then adjusted the URL in .gitmodules):

cd /inbox/BIDS/Wager/Wager/1090_tdcs/
# TODO: should have instead done ephemeral clone to not later carry a copy of images
datalad install -d . --source ///repronim/containers code/containers
cd code/containers
git remote add --fetch local ~/repronim-containers
# check which dcm2niix was used, we only recently started to add heudiconv version in
git grep Software
grep v1.0.20211006 ~/reproin-dcm2niix-versions.tsv 
# unfortunately freeze_versions does not work when we are not under that dataset. TODO
cd code/containers/
scripts/freeze_versions --save-dataset=$PWD/../.. repronim-reproin=0.11.3
datalad get images/repronim/*0.11.3*

cd -
# freeze_versions did not actually save!
datalad save -d^ -m "Added definition of containers.repronim-reproin frozen to 0.11.3. updateurl is wrong" .datalad/config

The problem is that we only recently started to copy the reproin script into the container:

❯ git describe --contains db936b725239ecfec2e45127b8fd0d20a413eb97
0.13.1~3

so 0.11.3 does not have it inside (we have yet to decide whether that was a good decision anyway). So we follow the setup above: include another submodule, bind mount reproin into the container, and use it. We are also adding the bind mounts -B /inbox/DICOM/ -B /inbox/BIDS (which is a little non-YODA):

datalad run -m "Make reproin container use bound mounted reproin script" git config -f .datalad/config datalad.containers.repronim-reproin.cmdexec '{{img_dspath}}/code/containers/scripts/singularity_cmd exec -B /inbox/DICOM/ -B /inbox/BIDS  -B {{img_dspath}}/code/reproin/bin/reproin:/usr/local/bin/reproin {{img}} /usr/local/bin/reproin {{cmd}}'
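A note on the doubled braces: datalad run expands its own {...} placeholders in the command, so literal braces have to be written as {{ and }} to end up as single braces in .datalad/config. The sketch below only illustrates that unescaping with bash string replacement; unescape_braces is a hypothetical helper, and the template is a shortened version of the one in the command above.

```shell
#!/usr/bin/env bash
# Illustration only: show what a doubled-brace string looks like after the
# braces are unescaped. unescape_braces is a hypothetical helper.
unescape_braces() {
    local s=$1
    local open2='{{' close2='}}' open1='{' close1='}'
    s=${s//"$open2"/"$open1"}
    s=${s//"$close2"/"$close1"}
    printf '%s\n' "$s"
}

# Shortened version of the cmdexec value passed through datalad run.
escaped='{{img_dspath}}/code/containers/scripts/singularity_cmd exec {{img}} /usr/local/bin/reproin {{cmd}}'

# Prints the single-brace form expected in .datalad/config.
unescape_braces "$escaped"
```

Running git config -f .datalad/config directly, without datalad run, would need the single-brace form instead.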

This setup is also suboptimal since we are sandwiching datalad (and git-annex) outside and inside the container. It has proven fragile a number of times! So most likely the reproin script should become the "driver" outside the container, installed "environment wide", and it should then use containers-run, thus getting a per-conversion "datalad run" provenance record.