Versions specified in conda env files and snakemake wrappers lead to conflicts/are not available
akriese opened this issue · 11 comments
Hi, yesterday I wanted to try out the workflow but struggled to get everything installed. Before posting the issue I wanted to make sure that I hadn't overlooked anything obvious, but that doesn't seem to be the case.
My grenepipe env has the following version numbers:
conda 4.14.0
python 3.7.10
snakemake 6.0.5
Grenepipe 0.10.0-18ff70c
I set it up with the grenepipe.yaml file. I am running snakemake with the following command:
snakemake --cores all --directory example/ --use-conda --conda-frontend mamba
There seems to be a problem across the local environment configurations, which specify fixed version numbers of conda packages, some of which have unavailable dependencies. Specifically, these are:
- bcftools.yaml: removing the bcftools version pin fixed this (as did updating it to the current 1.15.1)
- bwa.yaml: samtools ==1.12 requires an unavailable htslib version; fixed by updating to samtools ==1.15.1
- gatk.yaml: fixed by moving conda-forge to be the first channel
- multiqc.yaml: multiqc ==1.10.1 is missing a dependency (requests); fixed by adding anaconda as the first channel
- picard.yaml: fixed by putting conda-forge as the first channel
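For anyone who wants to try the channel-order fixes on their own copy of the env files, here is a minimal Python sketch. It is not part of grenepipe: the function name `prioritize_channel` and the example env text are hypothetical, and it assumes the env file uses a simple block-style `channels:` list.

```python
def prioritize_channel(env_text: str, channel: str = "conda-forge") -> str:
    """Move (or add) `channel` to the top of the `channels:` list.

    Assumption: the env file has a block-style list directly under `channels:`.
    """
    lines = env_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        if line.strip() == "channels:":
            # Collect the list items that follow the `channels:` key.
            items = []
            while i < len(lines) and lines[i].lstrip().startswith("- "):
                items.append(lines[i])
                i += 1
            # Reuse the indentation of the existing items, if there are any.
            indent = items[0][: len(items[0]) - len(items[0].lstrip())] if items else "  "
            # Drop the channel from its old position, then put it first.
            rest = [it for it in items if it.strip()[2:].strip() != channel]
            out.append(f"{indent}- {channel}")
            out.extend(rest)
    return "\n".join(out) + "\n"


# Hypothetical env file content, for illustration only.
example_env = (
    "channels:\n"
    "  - bioconda\n"
    "  - conda-forge\n"
    "dependencies:\n"
    "  - samtools ==1.12\n"
)
print(prioritize_channel(example_env))
```

Calling `prioritize_channel(text, "anaconda")` would cover the multiqc.yaml case in the same way.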
Furthermore, the used snakemake wrappers seem to be using outdated versions:
- samtools/stat: could be updated from 0.64.0 (samtools 1.10, missing htslib dependency) --> v1.12.0 (samtools 1.14)
- samtools/flagstat: could be updated from 0.64.0 (samtools 1.10, missing htslib dependency) to v1.12.2 (samtools 1.14)
- tabix (0.55.1) uses old htslib --> update to v1.12.2/bio/tabix/index (in 3 rule files)
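Since the tabix wrapper appears in three rule files, a scripted replacement may be less error-prone than editing by hand. The following is a rough sketch, not grenepipe code: `update_wrappers` is a hypothetical helper, and the old wrapper string `0.55.1/bio/tabix` is an assumption about how the rule files reference it (only the version 0.55.1 and the new path are given above).

```python
import re

# Old wrapper string -> new wrapper string.
# The old string is an assumption; adjust it to what the rule files contain.
WRAPPER_UPDATES = {
    "0.55.1/bio/tabix": "v1.12.2/bio/tabix/index",
}


def update_wrappers(rule_text: str, updates: dict = WRAPPER_UPDATES) -> str:
    """Replace known wrapper strings inside `wrapper: "..."` directives."""

    def repl(match: re.Match) -> str:
        old = match.group(2)
        # Leave wrapper strings that are not in the mapping untouched.
        return match.group(1) + updates.get(old, old) + match.group(3)

    # `\s*` also matches a line break between `wrapper:` and the quoted string.
    return re.sub(r'(wrapper:\s*")([^"]+)(")', repl, rule_text)


# Hypothetical rule snippet, for illustration only.
example_rule = 'rule vcf_index:\n    wrapper:\n        "0.55.1/bio/tabix"\n'
print(update_wrappers(example_rule))
```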
With the above fixes, the envs can be installed successfully. But the pipeline then breaks with errors that I still have to investigate. These are probably caused by breaking changes between dependency versions.
Is grenepipe supposed to run out of the box as of today? If yes, could someone try out a fresh setup of grenepipe and see if it works on a different machine?
Looking forward to hearing from someone :)
Hi @akriese,
thanks for reporting the issue, and please excuse my late reply.
I have just run the grenepipe tests in a completely clean Ubuntu 22.04.1 LTS virtual box, using the exact same versions that you specified, and it worked perfectly fine. I've further tested on CentOS (slightly older conda, otherwise same). Which operating system are you on? That seems to be the most likely difference here.
The changes to the channels you suggest sound generally reasonable (in the sense that I think there is a chance that they don't break anything else). I would be interested in reproducing your errors before applying them though.
As for outdated versions, and the subsequent breaking of the pipeline: This is exactly the reason why the versions are specified so meticulously... otherwise, everything breaks due to weird incompatibilities between different tools. Took me ages to figure out a combination of versions that works - bioinformatics tools are a mess. If you really need to update them, I'm afraid you will have to experiment quite a bit.
Is grenepipe supposed to run out of the box as of today?
Of course it is :-) And I've 40ish tests in place that so far have caught issues - but apparently not the ones you are running into. Hence my suspicion that this is due to a different operating system. Let me know which you are using, and we can see from there.
Cheers and so long
Lucas
Hi Lucas, thanks for the reply! I guess the problem on my system might be that I've used conda for other projects on that machine before. So that might be the source of the version conflicts. What kind of information would help you reproduce this?
EDIT: The operating system in my institute is some custom linux distro called mariux64
Hm, under these circumstances, I'm not sure that I can reproduce this at all. I'm not sure what's going on there with conda, but I think it can indeed get cluttered, so maybe you can do a conda clean, and see if that helps?
Let's keep this issue open for now. I'll try your suggestions regarding package/channel order above, and see if that works, and might just go with them then. I want to do that after the next release though, so that there is a stable fallback point for other users - not sure when that will be, but I'll keep you posted here.
In the meantime, if you have access to some other systems, you could try them as well and see if you run into similar issues, and report back here? My goal is to have grenepipe run as smoothly as possible - happy to take all feedback!
Hi Lucas,
Thanks for developing this package; it will be very useful for our lab in the future.
I have been experiencing similar problems to @akriese. I am, however, attempting to install grenepipe locally on a macOS Darwin-22.1.0-x86_64 platform. As I experienced 'almost' identical errors to those above, I wanted to confirm with you: is grenepipe intended to be run only on Linux distros, or have you used the pipeline locally on a macOS system as well?
Thanks!
Hi @roosheelpatel,
thanks for the report, and sorry that this is still happening. This whole setup with dozens of tools is a nightmare to maintain...
Generally, grenepipe is meant to work from small to large datasets, and is written with both Linux and MacOS in mind. That being said, it's most exhaustively tested on Linux. I do have some MacOS tests in place, and just the other day did a large scale test on MacOS as well, and it generally works. Admittedly, there are failed tests in there - many of them due to things like Conda having a hiccup, or the weather being too cold, or whatever mysterious other forces cause it to (non-repeatably) fail... most of them seem to be solved by just starting the test again. I have not figured out a good way to debug those issues, as they are non-deterministic. So, if anything fails, first test should be to just run it again...
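The "just run it again" advice for the non-deterministic failures can be automated with a small wrapper. A minimal sketch, assuming the failures show up as a non-zero exit code; `run_with_retries` is a hypothetical helper, not something grenepipe provides.

```python
import subprocess


def run_with_retries(cmd, attempts=3, runner=subprocess.run):
    """Run `cmd` until it exits with code 0, at most `attempts` times.

    Returns the number of the attempt that succeeded.
    `runner` is injectable so the retry logic can be tried without conda.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit code {result.returncode}")
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")


# Example call, using the command from the original report (not run here):
# run_with_retries([
#     "snakemake", "--cores", "all", "--directory", "example/",
#     "--use-conda", "--conda-frontend", "mamba",
# ])
```

If the same step fails on every attempt, that points to a reproducible problem, such as the environment issues discussed here, rather than a transient one.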
However, as you are experiencing conda environment issues that seem more reproducible, we might have a chance of fixing them. Are you using conda, or mamba? Could you maybe post the versions of these, or even better a full grenepipe log file here, and expand a bit on what exactly is going wrong in your particular case? That would greatly help me get to the bottom of this!
So long, and thanks for your patience!
Lucas
By the way, @akriese, did you try again on your system, maybe with a cleaned conda setup? I still did not get to test your suggestions, but hopefully will get to do that in the next couple of weeks.
did you try again on your system, maybe with a cleaned conda setup?
Unfortunately, I do not work for my previous employer anymore, where I was trying out grenepipe. Maybe I'll get time to test it again at some point, but currently there is not a big intrinsic motivation to do so :(
EDIT: Also, I don't have access to that specific system anymore. So I can't really try it out in the same environment.
Hi @akriese, thanks for the update, and all the best!
Also, thank you again for your suggestions! I've just implemented (almost) all of them, in the hope that this fixes some long standing issues with conda, and also helps with #11. I'm currently running large scale tests on Ubuntu, CentOS, and MacOS, with conda, and with mamba (not all permutations, that would be too much, but the more important ones). Seems to be working well enough (with conda still being super slow... but at least it does not completely hang any more now)!
And @roosheelpatel, could you maybe try again using grenepipe at this commit? That should hopefully fix your issues. Please let me know if that works for you :-)
Okay, all environments (with the changes suggested by @akriese, thank you very much again!) now install fine on Ubuntu and on MacOS (with the exception of seqprep, which is not available for MacOS), with both conda and mamba. Phew! Still, mamba is way faster (15min instead of 4h)...
I've just published grenepipe v0.12.0 that includes these changes. @roosheelpatel, that should hopefully solve your issues - please try with that version. I'm hence going to close this issue now, but feel free to re-open or open another one should the same problems still pop up!
Cheers and thanks for your patience!
Lucas
That sounds great! Out of curiosity: When I set up the env back then and changed the version numbers, the env could be set up, but I remember getting some errors during runtime (can't remember them tho). Does it work with this setup now (not just the env setup, but also the running)?