[REQUEST]: allow non-conda installation of dependencies
Closed this issue · 10 comments
Description
I have been trying to get phg to work on my system as a singularity image, but I have run into an issue: the code assumes that the user always has a conda environment containing the dependencies. Since a singularity image (or a docker container, for that matter) can hold all dependencies itself without needing a conda environment, would it be possible to add a --no-conda option, or something similar, that assumes all dependencies are available on the ${PATH}?
For testing, I have written this singularity .def file (for version 2.3):
Bootstrap: docker
From: mambaorg/micromamba:1.5.8
%post
apt-get update && apt-get install -y wget
%post
micromamba install -y -n base -c conda-forge -c bioconda -c tiledb python=3.8.15 tiledb-py=0.22.3 tiledbvcf-py=0.25.3 anchorwave=1.2.2 bcftools=1.16 samtools=1.16.1 agc=3.0 openjdk=17.0.10
micromamba clean --all --yes
%post
mkdir -p /opt
cd /opt
wget https://github.com/maize-genetics/phg_v2/releases/download/2.3.16.153/PHGv2-v2.3.tar
tar xvf PHGv2-v2.3.tar
rm PHGv2-v2.3.tar
%environment
export PATH=/opt/phg/bin:$PATH
export JAVA_OPTS="-Xmx50g"
Alternatives
No response
Additional Context
No response
This is something we can consider. Is there a reason you prefer docker vs conda?
I prefer having all in one single (singularity) container so it's easy to incorporate in my pipelines. I prefer to run most tools in a Snakemake pipeline myself so it's reproducible later on. I think removing the need for having a specific conda environment name could solve this :)
I tried to implement it myself in a PR, but I cannot get the tests to run successfully, even without any changes: they seem to assume that I have a full TileDB available at $HOME/temp/phgv2Tests/tempDir/testTileDBURI/, which I don't.
If you prefer to implement it yourself, no worries! Just thought I could give it a go!
We have created a card to consider this request. If implemented, it may not be via a parameter, but based on other internal changes to the code. One of our goals is to keep parameters to a minimum. We find an abundance of parameters results in an interface that is confusing to users. At the moment we have higher priorities so I cannot predict when we will address it.
In the meantime, one option for you is to take our phg_environment.yml file and create a "phgv2-conda" conda environment inside your docker. You would not need to run the environment, just create it. If you decide to try this, please let us know how it works. We appreciate your feedback!
The command to run inside your docker would be:
conda env create --solver=libmamba --file src/main/resources/phg_environment.yml
(replace "src/main/resources/phg_environment.yml" with the path to your copy of the phg_environment.yml file)
After playing around with your suggestion and some other ideas, I have created a working version. It basically creates a script called conda which checks whether phg is trying to run something in your default environment (phgv2-conda) and, if so, strips the conda run -n phgv2-conda prefix from the command and runs the rest through micromamba in the base environment.
The singularity .def file (works with singularity v3.9; haven't tested other versions):
Bootstrap: docker
From: mambaorg/micromamba:1.5.8
%post
apt-get update && apt-get install -y wget
%post
mkdir -p /opt
cd /opt
wget https://github.com/maize-genetics/phg_v2/releases/download/2.3.16.153/PHGv2-v2.3.tar
tar xvf PHGv2-v2.3.tar
rm PHGv2-v2.3.tar
%post
micromamba install -y -n base -c conda-forge -c bioconda -c tiledb python=3.8.15 tiledb-py=0.22.3 tiledbvcf-py=0.25.3 anchorwave=1.2.2 bcftools=1.16 samtools=1.16.1 agc=3.0 openjdk=17.0.10
micromamba clean --all --yes
%post
cat << 'EOF' > /usr/local/bin/conda
#!/bin/bash
if [[ "$1" == "run" && ("$2" == "-n" || "$2" == "--name") && "$3" == "phgv2-conda" ]]; then
shift 3
exec micromamba run --name base "$@"
else
echo "conda is not installed; use micromamba instead" >&2
exit 1
fi
EOF
chmod +x /usr/local/bin/conda
%environment
export PATH=/usr/local/bin:/opt/phg/bin:/opt/conda/bin:$PATH
export JAVA_OPTS="-Xmx50g"
%runscript
echo "Running: $*"
exec "$@"
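The rewrite rule in the conda shim above can be checked outside the container with a small dry-run sketch. The function name rewrite_conda_call is hypothetical and not part of phg or conda; it uses POSIX tests instead of [[ ]] so it also runs under plain sh, and it prints the command the shim would exec instead of running it. Note that joining the arguments with $* is for display only and does not preserve quoting the way exec "$@" does.

```shell
#!/bin/sh
# Dry-run sketch of the conda shim (hypothetical helper, for testing only).
# It prints the micromamba command the shim would exec, or fails for any
# call that is not "conda run -n|--name phgv2-conda ...".
rewrite_conda_call() {
    if [ "$1" = "run" ] && { [ "$2" = "-n" ] || [ "$2" = "--name" ]; } && [ "$3" = "phgv2-conda" ]; then
        shift 3   # drop "run", the name flag, and the environment name
        echo "micromamba run --name base $*"
    else
        echo "conda is not installed; use micromamba instead" >&2
        return 1
    fi
}
```

For example, rewrite_conda_call run -n phgv2-conda samtools --version prints the micromamba invocation, while rewrite_conda_call env list returns a non-zero status.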
You may close this issue as this solves it for me and you indicated such a workaround is preferred for now.
I'm glad you found a solution that works for you. Keep in mind that the programs phgv2 requires, which you load from conda, must have tags that match the release of phgv2 you are pulling. Otherwise there will be errors in execution.
Yes I will! That's also why I have the version of phgv2 hardcoded. But as far as I'm aware the YAML file is not part of the github release files? If it is, that would make it easier to write it for another version but for now I'll check the dependencies per version :)
Correct, the yml file is not part of the release as an individual file. To access the contents of it you would need to do this programmatically with a getResource("phg_environment.yml") command against the java class. If you think this would be useful, we could consider putting phg_environment.yml in the phg/resources/main folder with the application.conf file.
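Since a jar is a zip archive, a resource reachable through getResource can usually also be read from the shell. The sketch below rests on several assumptions: that phg_environment.yml sits at the root of one of the jars shipped in the release (suggested by its location under src/main/resources, but not confirmed here), that the jars live in a directory such as /opt/phg/lib (a guess based on the .def file above), and that unzip is installed. The function name find_resource_in_jars is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a resource from the first jar that contains it.
# Usage: find_resource_in_jars <jar-directory> <resource-name>
find_resource_in_jars() {
    jar_dir="$1"
    resource="$2"
    for jar in "$jar_dir"/*.jar; do
        [ -f "$jar" ] || continue   # skip when the glob matched nothing
        # "unzip -l" exits non-zero when the archive has no matching entry
        if unzip -l "$jar" "$resource" >/dev/null 2>&1; then
            unzip -p "$jar" "$resource"   # write the entry to stdout
            return 0
        fi
    done
    echo "resource '$resource' not found in any jar under $jar_dir" >&2
    return 1
}
```

Under those assumptions, find_resource_in_jars /opt/phg/lib phg_environment.yml > phg_environment.yml would write out the dependency list for the installed release, which could then be compared against the versions pinned in the container.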
For me it is not needed since I know where to look, but should you decide to add a docker and/or singularity definition file to your repo it would definitely make it more future- and fool-proof I think.
Closing this issue as user has workaround in place.