Some simple Nextflow workflows that illustrate common patterns. These are intended to be used as templates for more useful workflows.
Nextflow provides a language and tools to construct pipelines, composed of arbitrary data and commands, and run them in various environments. See https://nextflow.io for more information and installation instructions.
One of the major advantages of Nextflow (and other, similar systems) is that a workflow can generally run in a variety of environments, such as AWS or a cluster, with very little modification.
If you run into a bug or missing feature, you might try the "edge" release, which is updated more frequently than the default "stable" release. To update, run the following:
export NXF_EDGE=1
nextflow self-update
Each workflow consists of several common files. These are described below.
workflow.nf - this is where the workflow is defined. It is written in a domain specific language (DSL) used by Nextflow to define how data flow from operation to operation.
nextflow.config - this file defines configuration options for different
environments (such as local or AWS). Each set of options is called a profile,
and this can be set from the command line using the -profile
option when
running the workflow. The name of this file causes it to be picked up by
Nextflow automatically, to use a different name, specify the file with the
-config
option on the command line.
params.yaml - contains workflow parameters, such as where to find the
necessary input files. Parameters can also be set on the command line by using a
double dash along with the parameter name (--foo
, for example). Specify which
params file to use for a given run with the -params-file
option.
Run a workflow using the nextflow run
command:
nextflow run workflow.nf -params-file params.yaml -profile local
It is possible to run a workflow in AWS using the Batch service. Creating the necessary resources within your AWS account to allow this requires a few steps that are described elsewhere. However, once this is done, there are just two steps required to run any of the workflows in this repo in AWS:
- Configure AWS credentials using environment variables
- Pass the
-profile aws_batch
flag tonextflow run
The following environment variables are required:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
The last can usually be set to us-east-1
. The others can be found in your AWS
user settings.
char_count/ - count characters in files and sum the counts. This is a good place to start since the basic patterns on display here are quite common.