This repo contains various analysis pipelines for the lab. Here are the basic rules:
- each folder contains the pipelines for one combination of analysis and data type
- pipelines are nextflow workflows
- each pipeline comes with a list of conda environment files that manage the required software
Pipelines will usually operate from a top level project directory structured in the following way:
project/
> [WORKFLOW].nf
> data
    > raw
    > step1
    > step2
        > output1.csv
        > ...
> figures
    > figure1.png
    > ...
The initial raw data lives in data/raw, and all analysis artifacts should be written into data/ as well. Figures go into figures/.
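As a rough illustration, the layout above and the copy-or-symlink step can be sketched in a few shell lines. The `PROJECT` and `PIPELINE_DIR` values are hypothetical placeholders, not paths defined by this repo, and the symlink loop is skipped when the pipeline folder does not exist.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical locations; adjust both to your own setup.
PROJECT="${PROJECT:-project}"
PIPELINE_DIR="${PIPELINE_DIR:-pipelines/metagenomics}"

# Create the directories the pipelines expect to find.
mkdir -p "$PROJECT/data/raw" "$PROJECT/figures"

# Symlink every pipeline file into the project root (only if the folder exists).
if [ -d "$PIPELINE_DIR" ]; then
    for f in "$PIPELINE_DIR"/*; do
        [ -e "$f" ] || continue          # skip the literal glob if the folder is empty
        ln -sf "$(realpath "$f")" "$PROJECT/"
    done
fi
```

Symlinks keep the project in sync with later changes to the repo, whereas copies pin the pipeline to the version used for that project.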
The first step is to copy or symlink the pipeline files into the top project directory. After that you can set up a conda environment that includes all software for the pipeline.
conda env create -f conda.yml
Activate the environment (usually named after the pipeline):
conda activate metagenomics
If a setup.R file is present, also install the R dependencies:
Rscript setup.R
You may also create a Nextflow config, either in the project directory as nextflow.config or in your user HOME as ~/.nextflow/config. Here is an example for moneta:
executor {
    name = "local"
    cpus = 32
    memory = "350GB"
}
tower {
    accessToken = "<TOKEN>"
    enabled = false
}
The tower part is only necessary if you want to use Nextflow Tower to track your pipeline.
After setup you can run the pipeline with
nextflow run [WORKFLOW].nf -resume
By default this will use all available CPUs and RAM unless you specify otherwise in a personal Nextflow config.
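If you would rather not claim the whole machine, one possible approach is to generate a small config that caps the CPU count and pass it on the command line. This is only a sketch: the file name `limits.config` and the "half of the cores" rule are assumptions made for illustration, while `-c` is the standard Nextflow option for adding a config file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: use half of the detected cores, but never fewer than one.
total_cpus="$(getconf _NPROCESSORS_ONLN)"
cpus=$(( total_cpus / 2 ))
if [ "$cpus" -lt 1 ]; then
    cpus=1
fi

# Hypothetical file name; the executor block mirrors the example config above.
cat > limits.config <<EOF
executor {
    name = "local"
    cpus = ${cpus}
}
EOF

echo "Run with: nextflow run [WORKFLOW].nf -resume -c limits.config"
```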