Conda solver slowdown FAQ and recommendations
bgruening opened this issue · 19 comments
Hi all,
this issue is intended to keep the community up to date on the current state of the conda solver, how you can improve things, and what we are working on to make it better.
What is the problem?
Conda currently uses an SAT (boolean satisfiability) solver to figure out the correct, and hopefully working, set of packages required to construct a functional environment. This means downloading the package index, cutting down the search-space, iterating the graph, inspecting the pinnings and so on.
Conda/Bioconda is special in that we have 1000s of Python and R packages. Recently, we’ve begun adding entire Bioconductor releases, with thousands of packages. Conda supports mixed environments, like Python+R+Perl, and does not remove old packages from the index. On the one hand, this enables reproducibility in the future (Need an old version of an R package or deepTools? No problem.), on the other hand it results in an incredibly large search space for the dependency solver to traverse. So in contrast to other package managers, Conda is constantly growing and we are currently not cutting out dead wood.
So we do face a special situation in Conda. Please take this into account when considering Conda’s performance. Yes, Conda is slow and will probably never be as fast as other package managers because Conda is vastly larger and supports scientific use-cases that others do not support.
However, we are aware of this and multiple people are working on it. See our tips below.
How to improve solver performance
Conda is especially slow if R is involved. This has historical reasons, as most of the packages are in all 3 supported channels (anaconda, conda-forge, bioconda). This was our fault. However, things should improve dramatically if you install the latest version available, e.g. bioconductor-deseq2=1.22.1. We’ve learned from past issues and now pin to one particular R version. However, old packages are still around for the sake of reproducibility.
Use pins, install packages with versions. Even conda create -n foo python=3 deeptools will help. You will magically solve all your R envs by simply adding r-base=3.5.1 to your package install list.
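As a rough illustration of the pinning advice, here is a minimal sketch in POSIX shell. The helper name pinned_create and the RUN switch are made up for this example and are not part of conda; the environment name and pins are the ones mentioned above. By default it only prints the command, which is also handy when you want to paste the exact command into a report.

```shell
# Sketch only: `pinned_create` is a hypothetical helper, not a conda command.
# It prints the pinned `conda create` call; set RUN=1 to actually execute it.
pinned_create() {
  env_name="$1"
  shift
  if [ "${RUN:-0}" = "1" ]; then
    conda create -n "$env_name" "$@"
  else
    echo "conda create -n $env_name $*"
  fi
}

# Pins from the post: a Python version plus an explicit R version.
pinned_create foo python=3 deeptools r-base=3.5.1
```

Running it with RUN=1 in front hands the same arguments to conda unchanged.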
Recommendations
A few recommendations, especially for environments with R inside:
- Use conda >=4.6.x.
- For bioconda packages, use the recommended channel order.
- Try the new experimental pycryptosat solver (https://www.anaconda.com/conda-4-6-release …):
  conda install pycryptosat
  conda config --set sat_solver pycryptosat
- Use --strict-channel-priority:
  conda config --set channel_priority strict
- Do not use conda install, use conda create.
- Use environment.yaml files wherever you can. These include exact package versions, removing much of the solver’s workload and drastically speeding things up.
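To make the environment.yaml recommendation concrete, here is a minimal sketch. The environment name cnvkit-r is only an example, and the pins (python 3.6, r-base 3.5.1, cnvkit 0.9.6a0) are the versions reported further down in this thread; adjust them to your own case. The channel order is the one recommended later in the discussion.

```shell
# Sketch: write a small, fully pinned environment file into the current directory.
# The name and the version pins are illustrative, not an official recipe.
cat > environment.yaml <<'EOF'
name: cnvkit-r
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3.6
  - r-base=3.5.1
  - cnvkit=0.9.6a0
EOF

# With conda installed, the environment is then created with:
#   conda env create -f environment.yaml
```

Because every top-level package carries a version, the solver has far fewer candidates to consider than with a bare package list.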
Different people from the community are trying to improve the solver or using different strategies to improve the situation. This is, and probably always will be, a work in progress. Conda will grow and Anaconda and the community will improve things as we go.
Cutting down the search space
Please have a look at https://github.com/regro/conda-metachannel. Conda Metachannels are a work in progress but will allow users to specify the portion of the graph they care about upfront. It is very rare that users will actually need ALL of the packages in bioconda/conda-forge. Think about it like a constrained channel: only a specific set of your packages appears in this special channel. All others are not available, so you cannot recreate a 3-year-old environment with this channel. However, if you have this use case you can just switch back to the normal channels.
Maybe we should have this at some point for our community. The idea could be, having all recent (~2 years) packages in this space but all others still available to reproduce old envs. Start a discussion!
Bioconda is prepared
Very early on we recognised the special challenges that Conda is trying to face and we are prepared for the special use-case of long-term reproducibility - BioContainers. The containers are frozen sets of conda environments. A BioContainer is created for every Bioconda package, but you can also create your own. https://usegalaxy.eu is maintaining 1034 environments currently using BioContainers and it works well in that demanding environment.
Read more about this in our manuscript.
I recommend BioContainers for static/reproducible environments. For flexible environments we could use a metachannel in the future if we want to maintain this.
That said, I use conda on a daily basis and with the above recommendation I do not need a metachannel, as the normal conda solver is fast enough for me. However, I believe the conda community is prepared for the future.
Feedback
We would like to get feedback, benchmarks and examples to help us. What does slow mean? Considering what Conda is doing for you behind the scenes, is 30s or a minute really slow? Please provide numbers and the exact installation command.
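One possible way to collect such numbers is a small wrapper that records the exact command together with its wall-clock time. This is only a sketch: bench is a made-up helper name, and date +%s gives one-second resolution, which is enough for solver runs that take minutes.

```shell
# Sketch: `bench` is a hypothetical helper. It runs any command, then prints
# the exact command line and the elapsed wall-clock seconds.
bench() {
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "command: $*"
  echo "elapsed_seconds: $((end - start))"
  return "$status"
}

# Example (needs conda; --dry-run solves without installing anything):
#   bench conda create -n cnvkit-r --dry-run cnvkit r-base=3.5.1
# The output of `conda config --show` is worth attaching as well.
```

The two printed lines are exactly what is asked for above: the command and a number.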
Last but not least I would like to thank the conda-forge team, Anaconda and the @bioconda/core team that are constantly working on all the packages and trying to keep things fast and reliable even with 100k packages.
Thanks for this. I'm working on a similar post for Anaconda. I'll link to this and also copy some of the suggestions here.
Björn and all;
This is incredible, thanks for the summary of the issues and current solutions. I'd like to explore the metachannel approach for bcbio. We're rearranging the current install and preparing to move to py3k as the default so this makes some of the updates especially slow. conda 4.6 and pycryptosat didn't make a big difference in these cases (see discussion here: bcbio/bcbio-nextgen#2676).
Is this something we want to host/do in a standard way? It might be useful for other projects as well and also would be nice to have a standard set of URLs for these channels. I'd be interested to hear if anyone else has explored this yet.
We were discussing setting up a meta channel where only the last 1-2 years of packages would be included. In theory that could be set up for things like bcbio as well, though I expect that'd be done on the bcbio side (presumably after a convenient "step by step guide" was put together).
Are you using environment yaml files in bcbio to install things? I've been using them in snakePipes and it's generally worked pretty well for our rather complicated environments.
@chapmanb please let us know which exact command is slow and what slow means :)
I would like to try this, especially since 4.6.x was faster in all my tests.
Btw. are you not using environment.yaml files in bcbio? This should be super fast as the solver will not be stressed much.
Hello @bioconda/core,
Are there any statistics about the popularity and usage level of the R packages, e.g. the ratio of bioconda recipes that depend on the R packages or the number of downloads?
Usually simple solutions win, like Conda itself! Wouldn't it work to simply(?) split and clone the latest version of the recipes into 2-3 new spawned channels like bioconda-R, bioconda-core, ... and halt/leave the bioconda channel for reproducibility?
From my solo experience, bioconda has been much more about the python scripts&libs, pre-compiled C++ tools and also some perl dependencies. Also I guess hardcore R devs typically prefer sticking to R's specific packaging solutions.
After no resolution in an overnight conda solving fiasco, I simply removed conda-forge channel from my conda config and solved in under three minutes, creating a new environment with a package from bioconda.
conda create -n cnvkit -c bioconda cnvkit
is what I was executing. As it turns out, it doesn't install the most recent version this way, so conda-forge is probably necessary to solve for the dependencies of the most recent version.
cnvkit 0.9.6a0 py36_2
---------------------
file name : cnvkit-0.9.6a0-py36_2.tar.bz2
name : cnvkit
version : 0.9.6a0
build : py36_2
build number: 2
size : 232 KB
license : Apache License 2.0
subdir : linux-64
url : https://conda.anaconda.org/bioconda/linux-64/cnvkit-0.9.6a0-py36_2.tar.bz2
md5 : 5b1130b128e862a68c16b5996999a830
timestamp : 2019-01-30 01:34:08 UTC
dependencies:
- bioconductor-dnacopy
- biopython >=1.62
- future >=0.15.2
- matplotlib >=1.3.1
- numpy >=1.9
- pandas >=0.18.1
- pyfaidx >=0.4.7
- pysam >=0.10.0
- python >=3.6,<3.7.0a0
- python-dateutil >=2.5.0
- r-base >=3.4.1
- r-cghflasso
- reportlab >=3.0
- scipy >=0.15.0
With conda-forge in my channels and the following config, the solver was still running overnight:
$ conda config --show
add_anaconda_token: True
add_pip_as_python_dependency: True
aggressive_update_packages:
- ca-certificates
- certifi
- openssl
allow_conda_downgrades: False
allow_cycles: True
allow_non_channel_urls: False
allow_softlinks: False
always_copy: False
always_softlink: False
always_yes: None
anaconda_upload: None
auto_activate_base: True
auto_update_conda: True
bld_path:
changeps1: True
channel_alias: https://conda.anaconda.org
channel_priority: strict
channels:
- bioconda
- conda-forge
- defaults
client_ssl_cert: None
client_ssl_cert_key: None
clobber: False
conda_build: {}
create_default_packages: []
croot: /home/groups/hoolock/u1/jvc/miniconda3/conda-bld
custom_channels:
pkgs/main: https://repo.anaconda.com
pkgs/free: https://repo.anaconda.com
pkgs/r: https://repo.anaconda.com
pkgs/pro: https://repo.anaconda.com
custom_multichannels:
defaults:
- https://repo.anaconda.com/pkgs/main
- https://repo.anaconda.com/pkgs/free
- https://repo.anaconda.com/pkgs/r
local:
debug: False
default_channels:
- https://repo.anaconda.com/pkgs/main
- https://repo.anaconda.com/pkgs/free
- https://repo.anaconda.com/pkgs/r
default_python: 3.6
deps_modifier: not_set
disallowed_packages: []
download_only: False
dry_run: False
enable_private_envs: False
env_prompt: ({default_env})
envs_dirs:
- /home/groups/hoolock/u1/jvc/miniconda3/envs
- /home/users/vancampe/.conda/envs
error_upload_url: https://conda.io/conda-post/unexpected-error
extra_safety_checks: False
force: False
force_32bit: False
force_reinstall: False
force_remove: False
ignore_pinned: False
json: False
local_repodata_ttl: 1
migrated_channel_aliases: []
migrated_custom_channels: {}
non_admin_enabled: True
notify_outdated_conda: True
offline: False
override_channels_enabled: True
path_conflict: clobber
pinned_packages: []
pip_interop_enabled: False
pkgs_dirs:
- /home/groups/hoolock/u1/jvc/miniconda3/pkgs
- /home/users/vancampe/.conda/pkgs
proxy_servers: {}
prune: False
quiet: False
remote_connect_timeout_secs: 9.15
remote_max_retries: 3
remote_read_timeout_secs: 60.0
report_errors: None
rollback_enabled: True
root_prefix: /home/groups/hoolock/u1/jvc/miniconda3
safety_checks: warn
sat_solver: pycryptosat
shortcuts: True
show_channel_urls: None
solver_ignore_timestamps: False
ssl_verify: True
subdir: linux-64
subdirs:
- linux-64
- noarch
target_prefix_override:
track_features: []
update_modifier: update_specs
use_index_cache: False
use_local: False
verbosity: 0
whitelist_channels: []
@jakevc following the suggestions from above, the following works in seconds for me: conda create -n cnvkit-r cnvkit r-base=3.5.1
@jakevc the recommended channels order changed a few months back when using bioconda and conda-forge, could you reorder your channels as below?
channels:
- conda-forge
- bioconda
- defaults
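For anyone who wants to apply this order, here is a minimal sketch. It writes the settings to a scratch file named condarc.example (a made-up name) so they can be reviewed first; conda only picks them up once they are in your real .condarc. The strict priority setting is the one from the recommendations at the top.

```shell
# Sketch: write the recommended channel order to a scratch file for review.
cat > condarc.example <<'EOF'
channels:
  - conda-forge
  - bioconda
  - defaults
channel_priority: strict
EOF
cat condarc.example

# With conda available, the same settings can be applied in place. Each --add
# puts the channel at the top of the list, so the last one added ends up with
# the highest priority:
#   conda config --add channels defaults
#   conda config --add channels bioconda
#   conda config --add channels conda-forge
#   conda config --set channel_priority strict
```

Afterwards, conda config --show should list conda-forge first under channels.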
The recommended channels order did make everything more responsive:
$ time conda create -n cnvkit cnvkit
Collecting package metadata: done
Solving environment: done
## Package Plan ##
environment location: /home/groups/hoolock/u1/jvc/miniconda3/envs/cnvkit
added / updated specs:
- cnvkit
The following packages will be downloaded:
package | build
---------------------------|-----------------
atk-2.25.90 | hf2eb9ee_1001 430 KB conda-forge
bioconductor-dnacopy-1.56.0| r351h9ac9557_0 435 KB bioconda
biopython-1.73 | py36h14c3975_0 2.5 MB conda-forge
bwidget-1.9.11 | 1 113 KB
cairo-1.14.12 | h80bd089_1005 1.4 MB conda-forge
cnvkit-0.9.6a0 | py36_2 232 KB bioconda
curl-7.64.0 | h646f8bb_0 143 KB conda-forge
gdk-pixbuf-2.36.12 | h4f1c04b_1001 598 KB conda-forge
glib-2.56.2 | had28632_1001 4.7 MB conda-forge
gobject-introspection-1.56.1|py36h9e29830_1001 1.3 MB conda-forge
gsl-2.2.1 |blas_openblashddceaf2_6 2.1 MB conda-forge
gstreamer-1.12.5 | h0cc0488_1000 3.7 MB conda-forge
gtk2-2.24.31 | h5baeb44_1000 7.3 MB conda-forge
harfbuzz-1.9.0 | he243708_1001 957 KB conda-forge
krb5-1.16.3 | hc83ff2d_1000 1.4 MB conda-forge
libcurl-7.64.0 | h01ee5af_0 586 KB conda-forge
libedit-3.1.20170329 | hf8c457e_1001 172 KB conda-forge
libgfortran-3.0.0 | 1 281 KB conda-forge
libssh2-1.8.0 | h1ad7b7a_1003 246 KB conda-forge
libtiff-4.0.10 | h9022e91_1002 555 KB conda-forge
make-4.2.1 | h14c3975_2004 458 KB conda-forge
matplotlib-3.0.3 | py36_0 6 KB conda-forge
matplotlib-base-3.0.3 | py36h167e16e_0 6.7 MB conda-forge
numpy-1.16.2 |py36_blas_openblash1522bff_0 4.3 MB conda-forge
olefile-0.46 | py_0 31 KB conda-forge
openblas-0.3.3 | h9ac9557_1001 15.8 MB conda-forge
pandas-0.24.1 | py36hf484d3e_0 11.1 MB conda-forge
pango-1.40.14 | hf0c64fd_1003 532 KB conda-forge
pillow-5.4.1 |py36h00a061d_1000 614 KB conda-forge
pyfaidx-0.5.5.2 | py_0 25 KB bioconda
pysam-0.15.2 | py36h1671916_1 2.2 MB bioconda
qt-5.6.2 | hbe13537_1012 44.5 MB conda-forge
r-base-3.5.1 | h391c2eb_5 37.6 MB conda-forge
r-cghflasso-0.2_1 |r351h9ac9557_1001 199 KB conda-forge
reportlab-3.5.13 |py36hbd3ef63_1000 2.4 MB conda-forge
samtools-1.9 | h57cc563_7 636 KB bioconda
scipy-1.2.1 |py36_blas_openblash1522bff_0 18.1 MB conda-forge
tktable-2.10 | h14c3975_0 88 KB
tornado-6.0.1 | py36h14c3975_0 635 KB conda-forge
zstd-1.3.3 | 1 1023 KB conda-forge
------------------------------------------------------------
Total: 175.8 MB
The following NEW packages will be INSTALLED:
_r-mutex pkgs/r/linux-64::_r-mutex-1.0.0-anacondar_1
atk conda-forge/linux-64::atk-2.25.90-hf2eb9ee_1001
bcftools bioconda/linux-64::bcftools-1.9-h47928c2_2
bioconductor-dnac~ bioconda/linux-64::bioconductor-dnacopy-1.56.0-r351h9ac9557_0
biopython conda-forge/linux-64::biopython-1.73-py36h14c3975_0
blas conda-forge/linux-64::blas-1.1-openblas
bwidget pkgs/main/linux-64::bwidget-1.9.11-1
bzip2 conda-forge/linux-64::bzip2-1.0.6-h14c3975_1002
ca-certificates conda-forge/linux-64::ca-certificates-2018.11.29-ha4d7672_0
cairo conda-forge/linux-64::cairo-1.14.12-h80bd089_1005
certifi conda-forge/linux-64::certifi-2018.11.29-py36_1000
cnvkit bioconda/linux-64::cnvkit-0.9.6a0-py36_2
curl conda-forge/linux-64::curl-7.64.0-h646f8bb_0
cycler conda-forge/noarch::cycler-0.10.0-py_1
dbus conda-forge/linux-64::dbus-1.13.0-h4e0c4b3_1000
expat conda-forge/linux-64::expat-2.2.5-hf484d3e_1002
fontconfig conda-forge/linux-64::fontconfig-2.13.1-h2176d3f_1000
freetype conda-forge/linux-64::freetype-2.9.1-h94bbf69_1005
future conda-forge/linux-64::future-0.17.1-py36_1000
gdk-pixbuf conda-forge/linux-64::gdk-pixbuf-2.36.12-h4f1c04b_1001
gettext conda-forge/linux-64::gettext-0.19.8.1-h9745a5d_1001
glib conda-forge/linux-64::glib-2.56.2-had28632_1001
gobject-introspec~ conda-forge/linux-64::gobject-introspection-1.56.1-py36h9e29830_1001
graphite2 conda-forge/linux-64::graphite2-1.3.13-hf484d3e_1000
gsl conda-forge/linux-64::gsl-2.2.1-blas_openblashddceaf2_6
gstreamer conda-forge/linux-64::gstreamer-1.12.5-h0cc0488_1000
gtk2 conda-forge/linux-64::gtk2-2.24.31-h5baeb44_1000
harfbuzz conda-forge/linux-64::harfbuzz-1.9.0-he243708_1001
htslib bioconda/linux-64::htslib-1.9-h47928c2_5
icu conda-forge/linux-64::icu-58.2-hf484d3e_1000
jpeg conda-forge/linux-64::jpeg-9c-h14c3975_1001
kiwisolver conda-forge/linux-64::kiwisolver-1.0.1-py36h6bb024c_1002
krb5 conda-forge/linux-64::krb5-1.16.3-hc83ff2d_1000
libcurl conda-forge/linux-64::libcurl-7.64.0-h01ee5af_0
libdeflate bioconda/linux-64::libdeflate-1.0-h14c3975_1
libedit conda-forge/linux-64::libedit-3.1.20170329-hf8c457e_1001
libffi conda-forge/linux-64::libffi-3.2.1-hf484d3e_1005
libgcc-ng conda-forge/linux-64::libgcc-ng-7.3.0-hdf63c60_0
libgfortran conda-forge/linux-64::libgfortran-3.0.0-1
libgfortran-ng conda-forge/linux-64::libgfortran-ng-7.2.0-hdf63c60_3
libiconv conda-forge/linux-64::libiconv-1.15-h14c3975_1004
libpng conda-forge/linux-64::libpng-1.6.36-h84994c4_1000
libssh2 conda-forge/linux-64::libssh2-1.8.0-h1ad7b7a_1003
libstdcxx-ng conda-forge/linux-64::libstdcxx-ng-7.3.0-hdf63c60_0
libtiff conda-forge/linux-64::libtiff-4.0.10-h9022e91_1002
libuuid conda-forge/linux-64::libuuid-2.32.1-h14c3975_1000
libxcb conda-forge/linux-64::libxcb-1.13-h14c3975_1002
libxml2 conda-forge/linux-64::libxml2-2.9.8-h143f9aa_1005
make conda-forge/linux-64::make-4.2.1-h14c3975_2004
matplotlib conda-forge/linux-64::matplotlib-3.0.3-py36_0
matplotlib-base conda-forge/linux-64::matplotlib-base-3.0.3-py36h167e16e_0
ncurses conda-forge/linux-64::ncurses-6.1-hf484d3e_1002
numpy conda-forge/linux-64::numpy-1.16.2-py36_blas_openblash1522bff_0
olefile conda-forge/noarch::olefile-0.46-py_0
openblas conda-forge/linux-64::openblas-0.3.3-h9ac9557_1001
openssl conda-forge/linux-64::openssl-1.0.2r-h14c3975_0
pandas conda-forge/linux-64::pandas-0.24.1-py36hf484d3e_0
pango conda-forge/linux-64::pango-1.40.14-hf0c64fd_1003
pcre conda-forge/linux-64::pcre-8.41-hf484d3e_1003
pillow conda-forge/linux-64::pillow-5.4.1-py36h00a061d_1000
pip conda-forge/linux-64::pip-19.0.3-py36_0
pixman conda-forge/linux-64::pixman-0.34.0-h14c3975_1003
pthread-stubs conda-forge/linux-64::pthread-stubs-0.4-h14c3975_1001
pyfaidx bioconda/noarch::pyfaidx-0.5.5.2-py_0
pyparsing conda-forge/noarch::pyparsing-2.3.1-py_0
pyqt conda-forge/linux-64::pyqt-5.6.0-py36h13b7fb3_1008
pysam bioconda/linux-64::pysam-0.15.2-py36h1671916_1
python conda-forge/linux-64::python-3.6.7-hd21baee_1002
python-dateutil conda-forge/noarch::python-dateutil-2.8.0-py_0
pytz conda-forge/noarch::pytz-2018.9-py_0
qt conda-forge/linux-64::qt-5.6.2-hbe13537_1012
r-base conda-forge/linux-64::r-base-3.5.1-h391c2eb_5
r-cghflasso conda-forge/linux-64::r-cghflasso-0.2_1-r351h9ac9557_1001
readline conda-forge/linux-64::readline-7.0-hf8c457e_1001
reportlab conda-forge/linux-64::reportlab-3.5.13-py36hbd3ef63_1000
samtools bioconda/linux-64::samtools-1.9-h57cc563_7
scipy conda-forge/linux-64::scipy-1.2.1-py36_blas_openblash1522bff_0
setuptools conda-forge/linux-64::setuptools-40.8.0-py36_0
sip conda-forge/linux-64::sip-4.18.1-py36hf484d3e_1000
six conda-forge/linux-64::six-1.12.0-py36_1000
sqlite conda-forge/linux-64::sqlite-3.26.0-h67949de_1000
tk conda-forge/linux-64::tk-8.6.9-h84994c4_1000
tktable pkgs/main/linux-64::tktable-2.10-h14c3975_0
tornado conda-forge/linux-64::tornado-6.0.1-py36h14c3975_0
wheel conda-forge/linux-64::wheel-0.33.1-py36_0
xorg-kbproto conda-forge/linux-64::xorg-kbproto-1.0.7-h14c3975_1002
xorg-libice conda-forge/linux-64::xorg-libice-1.0.9-h14c3975_1004
xorg-libsm conda-forge/linux-64::xorg-libsm-1.2.3-h4937e3b_1000
xorg-libx11 conda-forge/linux-64::xorg-libx11-1.6.7-h14c3975_1000
xorg-libxau conda-forge/linux-64::xorg-libxau-1.0.9-h14c3975_0
xorg-libxdmcp conda-forge/linux-64::xorg-libxdmcp-1.1.2-h14c3975_1007
xorg-libxext conda-forge/linux-64::xorg-libxext-1.3.3-h14c3975_1004
xorg-libxrender conda-forge/linux-64::xorg-libxrender-0.9.10-h14c3975_1002
xorg-libxt conda-forge/linux-64::xorg-libxt-1.1.5-h14c3975_1002
xorg-renderproto conda-forge/linux-64::xorg-renderproto-0.11.1-h14c3975_1002
xorg-xextproto conda-forge/linux-64::xorg-xextproto-7.3.0-h14c3975_1002
xorg-xproto conda-forge/linux-64::xorg-xproto-7.0.31-h14c3975_1007
xz conda-forge/linux-64::xz-5.2.4-h14c3975_1001
zlib conda-forge/linux-64::zlib-1.2.11-h14c3975_1004
zstd conda-forge/linux-64::zstd-1.3.3-1
Proceed ([y]/n)? y
.
.
.
done
#
# To activate this environment, use:
# > source activate cnvkit
#
# To deactivate an active environment, use:
# > source deactivate
#
real 3m4.988s
user 1m35.123s
sys 0m15.064s
Relevant: conda/conda#7239 and conda/conda#7700.
Is there some documentation other than the "Medium" post about how to use metachannel? Looking at the command lines in the post, it seems hard to use.
I'm suffering not only from lengthy install times, but the startup time on my WSL box is extremely long, like 30-50s.
@abalter Hello, I often do this.
function condain
conda install --override-channels -c https://metachannel.conda-forge.org/defaults,alienzj,pytorch,bioconda,conda-forge/$argv,--max-build-no "$argv"
end
define a condain function in your shell (like fish shell)
then do:
condain bwa
you will install bwa soon.
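For bash or other POSIX shells, a rough equivalent of the fish function above might look like this. It is a sketch that simply copies the URL pattern and channel list from the fish version; whether the metachannel service accepts that URL today is an assumption, and metachannel_url is a helper name made up here.

```shell
# Sketch: POSIX sh port of the fish function above, for one package at a time.
# The URL pattern is copied from that function and has not been verified.
metachannel_url() {
  echo "https://metachannel.conda-forge.org/defaults,alienzj,pytorch,bioconda,conda-forge/$1,--max-build-no"
}

# Install a single package through the constrained channel (needs conda).
condain() {
  conda install --override-channels -c "$(metachannel_url "$1")" "$1"
}

# Print the URL that would be used for bwa, without installing anything.
metachannel_url bwa
```

Usage is the same as with the fish version: condain bwa.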
mamba also does a great job of speeding up the solving step.
For conda create -y --quiet --override-channels --channel iuc --channel conda-forge --channel bioconda --channel defaults --name __rpy2@2.9.4 rpy2=2.9.4
I just lost patience after some minutes and tried mamba create -y --quiet --override-channels --channel iuc --channel conda-forge --channel bioconda --channel defaults --name __rpy2@2.9.4 rpy2=2.9.4
and the solving step finished immediately.
@mvdbeek good to know, maybe a good lead. But maybe worth mentioning the Beta status.
Give conda 4.7 a try. https://www.anaconda.com/how-we-made-conda-faster-4-7/
Are there any plans to move the bioconda build system to mamba, like conda-forge did?