merenlab/anvio

[FEATURE REQUEST] Simplify the anvi'o conda installation command

The need

Installing anvi'o's dependencies through a mix of package managers (conda and pip) increases the risk of package conflicts in the future.

The solution

I would like to start a discussion on whether we can simplify the anvi'o installation command by removing the pip install step completely. I don't know whether there was a reason to split the installation of the packages (e.g. limited availability on conda), but with today's conda-forge library, many packages are available through conda install -c conda-forge. Below I attach a one-line conda installation command containing all the packages in requirements.txt, followed by the related terminal output.

Happy to hear some feedback from the community :)

conda create -n anvio-dev -y -c conda-forge -c bioconda python=3.10 \
        sqlite prodigal idba mcl muscle=3.8.1551 famsa hmmer diamond \
        blast megahit spades bowtie2 bwa graphviz "samtools>=1.9" \
        trimal iqtree trnascan-se fasttree vmatch r-base r-tidyverse \
        r-optparse r-stringi r-magrittr bioconductor-qvalue meme ghostscript \
        nodejs fastani "numpy<=1.24" scipy bottle pysam ete3 scikit-learn==1.2.2 \
        django requests mistune six matplotlib==3.5.1 statsmodels colored illumina-utils \
        tabulate rich-argparse numba paste pyani psutil pandas==1.4.4 snakemake \
        multiprocess plotext networkx==3.1 pulp==2.7.0 biopython reportlab pymupdf \
Channels:
 - conda-forge
 - bioconda
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/ahenoch/miniconda3/envs/anvio-dev

  added / updated specs:
    - bioconductor-qvalue
    - biopython
    - blast
    - bottle
    - bowtie2
    - bwa
    - colored
    - diamond
    - django
    - ete3
    - famsa
    - fastani
    - fasttree
    - ghostscript
    - graphviz
    - hmmer
    - idba
    - illumina-utils
    - ipympl
    - iqtree
    - jupyter-resource-usage
    - jupyterlab
    - jupyterlab-git
    - jupyterlab-lsp
    - matplotlib==3.5.1
    - mcl
    - megahit
    - meme
    - mistune
    - multiprocess
    - muscle=3.8.1551
    - networkx==3.1
    - nodejs
    - numba
    - numpy[version='<=1.24']
    - pandas==1.4.4
    - paste
    - plotext
    - prodigal
    - psutil
    - pulp==2.7.0
    - pyani
    - pymupdf
    - pysam
    - python=3.10
    - r-base
    - r-magrittr
    - r-optparse
    - r-stringi
    - r-tidyverse
    - reportlab
    - requests
    - rich-argparse
    - samtools[version='>=1.9']
    - scikit-learn==1.2.2
    - scipy
    - six
    - snakemake
    - spades
    - sqlite
    - statsmodels
    - tabulate
    - trimal
    - trnascan-se
    - vmatch


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    bleach-6.2.0               |     pyhd8ed1ab_0         130 KB  conda-forge
    boto3-1.35.54              |     pyhd8ed1ab_0          81 KB  conda-forge
    botocore-1.35.54           |pyge310_1234567_0         6.9 MB  conda-forge
    fontconfig-2.15.0          |       h7e30c49_1         259 KB  conda-forge
    google-api-core-2.22.0     |     pyhd8ed1ab_0          87 KB  conda-forge
    google-api-python-client-2.151.0|     pyhff2d567_0         5.8 MB  conda-forge
    grpcio-1.67.1              |  py310h1a6248f_0         831 KB  conda-forge
    jupyterlab-4.3.0           |     pyhd8ed1ab_0         7.0 MB  conda-forge
    jupyterlab-git-0.50.2      |     pyhd8ed1ab_0         881 KB  conda-forge
    levenshtein-0.26.1         |  py310hf71b8c6_0         131 KB  conda-forge
    libgrpc-1.67.1             |       hc2c308b_0         7.0 MB  conda-forge
    lxml-5.3.0                 |  py310h6ee67d5_2         1.3 MB  conda-forge
    mpg123-1.32.9              |       hc50e24c_0         480 KB  conda-forge
    oauth2client-4.1.3         |     pyhd8ed1ab_1          71 KB  conda-forge
    perl-app-cpanminus-1.7048  | pl5321hd8ed1ab_0         225 KB  conda-forge
    perl-compress-raw-bzip2-2.201| pl5321hbf60520_0          54 KB  conda-forge
    perl-compress-raw-zlib-2.202| pl5321hadc24fc_0          78 KB  conda-forge
    perl-list-moreutils-xs-0.430| pl5321h031d066_3          52 KB  bioconda
    perl-scalar-list-utils-1.63| pl5321hb9d3cd8_1          49 KB  conda-forge
    prettytable-3.12.0         |     pyhd8ed1ab_0          32 KB  conda-forge
    python-levenshtein-0.26.1  |     pyhff2d567_0          15 KB  conda-forge
    r-fs-1.6.5                 |    r44h93ab643_0         498 KB  conda-forge
    r-ps-1.8.1                 |    r44h2b5f3a1_0         386 KB  conda-forge
    r-tinytex-0.54             |    r44hc72bb7e_0         149 KB  conda-forge
    r-withr-3.0.2              |    r44hc72bb7e_0         229 KB  conda-forge
    rich-13.9.4                |     pyhd8ed1ab_0         181 KB  conda-forge
    rich-argparse-1.6.0        |     pyhd8ed1ab_0          22 KB  conda-forge
    rpds-py-0.20.1             |  py310h505e2c1_0         326 KB  conda-forge
    sqlite-3.47.0              |       h9eae976_1         863 KB  conda-forge
    tqdm-4.66.6                |     pyhd8ed1ab_0          87 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        34.1 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge 
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu 
  _r-mutex           conda-forge/noarch::_r-mutex-1.0.1-anacondar_1 
  aioeasywebdav      conda-forge/noarch::aioeasywebdav-2.4.0-pyha770c72_0 
  ...
  zstd               conda-forge/linux-64::zstd-1.5.6-ha6fb4c9_0 

Downloading and Extracting Packages:
                                                                                                                      
Preparing transaction: done                                                                                           
Verifying transaction: done                                                                                           
Executing transaction: \ vmatch selection functions are installed in /home/ahenoch/miniconda3/envs/anvio-dev/lib.     
You can use them without specifying their full path.                                                                  
Symbol map files are installed in /home/ahenoch/miniconda3/envs/anvio-dev/share/vmatch-2.3.0-5/TRANS/.                
Activation and deactivation scripts will set MKVTREESMAPDIR accordingly                                               
so you can use symbol map files without specifying their full path.                                                   
Those scripts are in /home/ahenoch/miniconda3/envs/anvio-dev/etc/conda/activate.d/vmatch-2.3.0-5.sh and /home/ahenoch/miniconda3/envs/anvio-dev/etc/conda/deactivate.d/vmatch-2.3.0-5.sh respectively.                                      
                                                                                                         
done                                                                                                                  
#                                                                                                                     
# To activate this environment, use                                                                                   
#                                                                                                                     
#     $ conda activate anvio-dev                                                                                      
#                                                                                                                     
# To deactivate an active environment, use                                                                            
#                                                                                                                     
#     $ conda deactivate 
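
A quick way to check that an environment built this way really contains nothing installed through pip is to filter the output of conda list. This is only a sketch: pip_installed is a hypothetical helper name, and it assumes that conda list reports pip-installed packages with pypi in the last (channel) column.

```shell
# pip_installed: hypothetical helper that reads `conda list` output on stdin
# and prints the names of packages whose channel column is "pypi".
pip_installed() {
    # Skip the "#" header lines; the channel is assumed to be the last field.
    awk '!/^#/ && $NF == "pypi" { print $1 }'
}

# Usage (requires conda and the anvio-dev environment created above):
#   conda list -n anvio-dev | pip_installed
```

If the environment was built purely from conda channels, the usage line should print nothing.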

Beneficiaries

With anvi'o growing every day, thinking about ways to prevent future package conflicts might be useful to everyone :)

Hey @ahenoch,

I don't see any reason not to do it!
But I have another suggestion about the installation :)

If we create a file called environment.yml under the anvi'o directory with the following content:

name: anvio-dev
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bioconductor-qvalue
  - biopython
  - blast
  - bottle
  - bowtie2
  - bwa
  - colored
  - diamond
  - django
  - ete3
  - famsa
  - fastani
  - fasttree
  - ghostscript
  - graphviz
  - hmmer
  - idba
  - illumina-utils
  - iqtree
  - matplotlib==3.5.1
  - mcl
  - megahit
  - meme
  - mistune
  - multiprocess
  - muscle=3.8.1551
  - networkx==3.1
  - nodejs
  - numba
  - numpy[version='<=1.24']
  - pandas==1.4.4
  - paste
  - plotext
  - prodigal
  - psutil
  - pulp==2.7.0
  - pyani
  - pymupdf
  - pysam
  - python=3.10
  - r-base
  - r-magrittr
  - r-optparse
  - r-stringi
  - r-tidyverse
  - reportlab
  - requests
  - rich-argparse
  - samtools[version='>=1.9']
  - scikit-learn==1.2.2
  - scipy
  - six
  - snakemake
  - spades
  - sqlite
  - statsmodels
  - tabulate
  - trimal
  - trnascan-se
  - vmatch

a single command would then be enough to install all the packages:

conda env create -f environment.yml

We could even create separate environment files for a versioned anvi'o release and for anvio-dev, and manage them separately, which would be very nice :)

PS: If you are going to run that, don't forget to change name: anvio-dev to something else.

Is it possible to leave the name of the env out of the yaml file, so that the user can choose the name?
Also, sometimes we want to install anvi'o using a conda path (prefix) instead of a name, like conda create -p /path/to/my/env/, and we need to keep this possible (installation on a server, etc.).

Yes, we can do that @FlorianTrigodet. We can remove name: anvio-dev and just run:

conda env create -n anvio-whatever -f environment.yml
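
To keep both the name-based and the path-based installation possible with a single nameless environment.yml, a small wrapper could pick the right flag. The sketch below is only an illustration: env_target_flag and create_anvio_env are hypothetical names, and treating any target that contains a slash as a path is an assumption made for this example.

```shell
# env_target_flag: hypothetical helper that prints "-p" when the target looks
# like a path (contains a slash) and "-n" when it looks like a plain name.
env_target_flag() {
    case "$1" in
        */*) printf '%s\n' "-p" ;;
        *)   printf '%s\n' "-n" ;;
    esac
}

# create_anvio_env: hypothetical wrapper around `conda env create`.
# $1 is the environment name or path, $2 the YAML file (default: environment.yml).
create_anvio_env() {
    target="$1"
    file="${2:-environment.yml}"
    conda env create "$(env_target_flag "$target")" "$target" -f "$file"
}

# Usage (requires conda):
#   create_anvio_env anvio-whatever
#   create_anvio_env /path/to/my/env
```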

By the way, we already have an environment.yml file under the anvio/.conda directory. We use it to create the test env on GitHub.

@metehaansever the environment.yml we have is incomplete; it only works in combination with the pip command. I would suggest we create a second, full yml file containing all the packages the two of us listed above, and test with that.

channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bioconductor-qvalue
  - biopython
  - blast
  - bottle
  - bowtie2
  - bwa
  - colored
  - diamond
  - django
  - ete3
  - famsa
  - fastani
  - fasttree
  - ghostscript
  - graphviz
  - hmmer
  - idba
  - illumina-utils
  - iqtree
  - matplotlib==3.5.1
  - mcl
  - megahit
  - meme
  - mistune
  - multiprocess
  - muscle=3.8.1551
  - networkx==3.1
  - nodejs
  - numba
  - numpy[version='<=1.24']
  - pandas==1.4.4
  - paste
  - plotext
  - prodigal
  - psutil
  - pulp==2.7.0
  - pyani
  - pymupdf
  - pysam
  - python=3.10
  - r-base
  - r-magrittr
  - r-optparse
  - r-stringi
  - r-tidyverse
  - reportlab
  - requests
  - rich-argparse
  - samtools[version='>=1.9']
  - scikit-learn==1.2.2
  - scipy
  - six
  - snakemake
  - spades
  - sqlite
  - statsmodels
  - tabulate
  - trimal
  - trnascan-se
  - vmatch
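
Then either of these should work (conda does not accept -n and -p in the same command):

conda env create -n anvio-dev -f environment.yml -y
conda env create -p /path/to/environment -f environment.yml -y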
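
Before relying on a full yml file like the one above, it may help to check that it covers everything in requirements.txt. The following is a rough sketch, not a tested part of anvi'o: missing_in_yaml is a hypothetical helper, it assumes one requirement per line, and it compares names literally, so packages whose pip and conda names differ will be reported as missing.

```shell
# missing_in_yaml: hypothetical helper that prints every package named in a
# pip-style requirements file that is not listed in a conda environment file.
# Assumptions: one requirement per line, and YAML list items written as
# "  - name" (channel entries are picked up too, which is harmless here).
missing_in_yaml() {
    reqs_file="$1"
    yaml_file="$2"
    tmp_dir=$(mktemp -d) || return 1

    # Requirement names: strip comments, cut at the first version operator,
    # bracket, space or ";", drop empty lines, lowercase, sort.
    sed -e 's/#.*//' -e 's/[][<>=!~ ;].*//' -e '/^$/d' "$reqs_file" |
        tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u > "$tmp_dir/reqs"

    # YAML list items: keep only "- item" lines, then cut the version spec.
    sed -n -e 's/^ *- *//p' "$yaml_file" | sed -e 's/[][<>=!~ ].*//' |
        tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u > "$tmp_dir/yaml"

    # Print the names that appear only in the requirements list.
    LC_ALL=C comm -23 "$tmp_dir/reqs" "$tmp_dir/yaml"
    rm -rf "$tmp_dir"
}

# Usage:
#   missing_in_yaml requirements.txt environment.yml
```

An empty result only means that every name in requirements.txt also appears in the yml file; it says nothing about whether the pinned versions match.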