SNAPPY Nucleic Acid Processing in Python

Primary language: Python. License: MIT.

SNAPPY - SNAPPY Nucleic Acid Processing Pipeline

Installation

Installation should be complete in 10 to 15 minutes.

In a nutshell:

# Download & preparation
git clone git@github.com:bihealth/snappy-pipeline.git
cd snappy-pipeline

# If you want to select a given branch, uncomment the following:
# git checkout <branch_name>

# WARNING: make sure that you are in your conda base environment

# Create conda environment "snappy_env" with the minimal requirements:
mamba env create --file environment.yml
conda activate snappy_env

# Add testing & development requirements:
pip install -r requirements/test.txt
pip install -r requirements/dev.txt

# Optionally add "pytest-pdb", which is not available from Anaconda
pip install pytest-pdb

# Install snappy in snappy_env environment
pip install -e .

Note: To create the environment under a different name, replace the environment creation and activation commands with:

mamba env create --file environment.yml --name <other_environment_name>
conda activate <other_environment_name>
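The warning about starting from the conda base environment can be turned into an explicit check. The following is a minimal sketch, assuming the `CONDA_DEFAULT_ENV` variable that `conda activate` exports with the name of the active environment; `require_conda_env` is a hypothetical helper name and is not part of snappy-pipeline.

```shell
# Hypothetical helper (not part of snappy-pipeline): succeed only if the
# currently active conda environment has the expected name.
require_conda_env() {
    expected="$1"
    # conda exports CONDA_DEFAULT_ENV with the name of the active environment;
    # it is unset when no environment is active.
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected conda environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}

# Possible usage (commented out so that sourcing this snippet has no side effects):
# require_conda_env base && mamba env create --file environment.yml
# conda activate snappy_env
# require_conda_env snappy_env && pip install -e .
```

If the environment was created under another name, pass that name instead of `snappy_env`.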

See user installation if you just want to use the pipeline.

See [developer installation](docs/installation.rst) for getting started with working on the pipeline code and building the documentation.

Using GATK3

Some wrappers rely on GATK 3. GATK v3 is not free software and cannot be redistributed. Earlier, we had an internal CUBI conda server, but this limited use of the wrappers by the general public. Now, the pipeline steps that use GATK 3 must be activated as follows.

If you are a member of CUBI, you can use the central GATK download. Alternatively, you can download the tarball from the Broad archive.

$ ls -lh /fast/groups/cubi/work/projects/biotools/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2
-rw-rw---- 1 holtgrem_c hpc-ag-cubi 14M Dec 19  2019 /fast/groups/cubi/work/projects/biotools/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2

First, go to the pipeline directory where you want to run:

$ cd variant_calling

Explicitly create any missing conda environments:

$ snappy-snake --conda-create-envs-only
[...]
12-27 17:18 snakemake.logging WARNING  Downloading and installing remote packages.
[...]

Find out which conda environments use GATK v3:

$ grep 'gatk.*3' .snakemake/conda/*.yaml
.snakemake/conda/d76b719b718c942f8e49e55059e956a6.yaml:  - gatk =3

Activate each conda environment and register GATK v3 in it:

$ for yaml in $(grep -l 'gatk.*3' .snakemake/conda/*.yaml); do
        environ=${yaml%.yaml};
        conda activate $environ
        gatk3-register /fast/groups/cubi/work/projects/biotools/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2
        conda deactivate
    done
Moving GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2 to /home/holtgrem_c/miniconda3/envs/gatk3/opt/gatk-3.8

You are now ready to run GATK v3 from this environment.
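The registration loop above derives each environment path by stripping the `.yaml` suffix from the matching file name. Before activating anything, it can help to do a dry run that only prints those paths. The following is a minimal sketch under that assumption; `list_gatk3_envs` is a hypothetical helper name and is not part of snappy-pipeline.

```shell
# Hypothetical helper (not part of snappy-pipeline): print the conda
# environment paths whose Snakemake-generated YAML file mentions GATK 3.
list_gatk3_envs() {
    # $1: directory with Snakemake's conda YAML files (default: .snakemake/conda)
    conda_dir="${1:-.snakemake/conda}"
    for yaml in "$conda_dir"/*.yaml; do
        # Skip the unexpanded glob pattern when no YAML file exists.
        [ -e "$yaml" ] || continue
        # Same pattern as in the README's grep call.
        if grep -q 'gatk.*3' "$yaml"; then
            # Strip the ".yaml" suffix to obtain the environment path.
            printf '%s\n' "${yaml%.yaml}"
        fi
    done
}

# Dry run: list the environments that the registration loop would visit.
list_gatk3_envs .snakemake/conda
```

Note that the pattern `gatk.*3` matches any line containing `gatk` followed later by a `3`, so it is worth checking the listed environments against the YAML files before registering.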

Development Notes

Here, you can find the required layout for post-PR commit messages: