Viash project template

This repository is a template for setting up a new Viash project. It accompanies the Quickstart tutorial, which explains how to get started with it.

What is Viash?

Viash is your go-to script wrapper for building data pipelines from modular software components. All you need is your trusty script and a metadata file to embark on this journey.

Check out some of Viash’s key features:

  • Code in your favorite scripting language. Mix and match scripting between multiple components to suit your needs. Viash supports a wide range of languages, including Bash, Python, R, Scala, JS, and C#.

  • A custom Docker container is auto-generated based on the dependencies you’ve outlined in your metadata, meaning you don’t need to be a Docker expert.

  • Viash also generates a Nextflow module from your script, so no need to be a Nextflow guru either.

  • Effortlessly combine Nextflow modules to design and run scalable, reproducible data pipelines.

  • Test every component on your local workstation using the convenient built-in development kit.

graph LR
  subgraph component [Viash component]
    subgraph script [Script]
      rlang[R script]
      python[Python script]
      bash[Bash script]
      scriptetc[...]
    end

    config[Viash config]
  end

  viash_build[Viash build]

  docker_image[Docker image]
  executable[Executable]
  nextflow[Nextflow workflow]

  
  component --- viash_build --> executable & docker_image & nextflow
  docker_image -.-> executable & nextflow


  nextflow --dependency--> nextflow

  subgraph compute [Compute environment]
    direction LR
    local[Local execution]
    awsbatch[AWS Batch]
    googlebatch[Google Cloud Batch]
    hpc[HPC]
    infraetc[...]
  end

  nextflow --> compute

Requirements

This guide assumes you’ve already installed Viash, Docker, and Nextflow.

Structure of this template project

To get you up and running fast, we provide a template project. It contains four components in the same package (remove_comments, take_column, combine_columns, and the workflow itself), which are combined into a Nextflow pipeline as follows:

graph TD
   input1(file1.tsv) --> B1[/remove_comments/] --> C1[/take_column/] --> Y
   input2(file2.tsv)--> B2[/remove_comments/] --> C2[/take_column/] --> Y
   Y[combine] --> D[/combine_columns/]
   D --> output(output.tsv)

This pipeline takes one or more TSV files as input and stores its output in an output folder.
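To make the diagram concrete, here is a rough plain-Python sketch of the same data flow, using the rows from the example files in the "Example usage" section. The function bodies are assumptions based on the component names and the example output at the end of this README; in particular, selecting the second column is inferred from that output, not taken from the component configs. The real components are the scripts in src/template/, and the real combine_columns step also writes row names, which this sketch omits.

```python
# Illustrative emulation of the pipeline's data flow in plain Python.
# The real components are the Bash, Python, and R scripts in src/template/.

def remove_comments(lines):
    """Drop comment lines, i.e. lines starting with '#'."""
    return [line for line in lines if not line.startswith("#")]

def take_column(lines, column=2):
    """Keep a single tab-separated column (1-based index).

    column=2 is an assumption inferred from the example output.
    """
    return [line.split("\t")[column - 1] for line in lines]

def combine_columns(columns):
    """Join the extracted columns side by side, one output row per input row."""
    return ["\t".join(row) for row in zip(*columns)]

file1 = [
    "# this is a header",
    "# this is also a header",
    "one\t0.11\t123",
    "two\t0.23\t456",
    "three\t0.35\t789",
    "four\t0.47\t123",
]
file2 = [
    "# this is not a header",
    "# just kidding yes it is",
    "eins\t0.111\t234",
    "zwei\t0.222\t234",
    "drei\t0.333\t123",
    "vier\t0.444\t123",
]

# Each input goes through remove_comments and take_column independently,
# then the per-file columns are combined into one table.
columns = [take_column(remove_comments(f)) for f in (file1, file2)]
combined = combine_columns(columns)
print("\n".join(combined))
```

In the actual pipeline each of these steps runs as a separate Nextflow process in its own container, so the two inputs can be processed in parallel before they are combined.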

Example usage

To run the pipeline, first create example input files.

Contents of resources_test/file1.tsv:

# this is a header      
# this is also a header     
one 0.11    123
two 0.23    456
three   0.35    789
four    0.47    123

Contents of resources_test/file2.tsv:

# this is not a header
# just kidding yes it is
eins    0.111   234
zwei    0.222   234
drei    0.333   123
vier    0.444   123
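If you would rather not create these files by hand, a short script along the following lines could write them. This is only a sketch: `write_example_inputs` is a hypothetical helper, the columns are assumed to be tab-separated (the files are TSV), and trailing whitespace on the comment lines is not reproduced.

```python
from pathlib import Path

# Rows copied from the two listings above; columns are joined with tabs.
EXAMPLE_FILES = {
    "file1.tsv": [
        "# this is a header",
        "# this is also a header",
        "one\t0.11\t123",
        "two\t0.23\t456",
        "three\t0.35\t789",
        "four\t0.47\t123",
    ],
    "file2.tsv": [
        "# this is not a header",
        "# just kidding yes it is",
        "eins\t0.111\t234",
        "zwei\t0.222\t234",
        "drei\t0.333\t123",
        "vier\t0.444\t123",
    ],
}

def write_example_inputs(directory):
    """Write the example TSV files into `directory` and return their paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in EXAMPLE_FILES.items():
        path = out_dir / name
        path.write_text("\n".join(rows) + "\n", encoding="utf-8")
        written.append(path)
    return written

if __name__ == "__main__":
    for path in write_example_inputs("resources_test"):
        print(f"wrote {path}")
```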

Finally, we also need to create a params.yaml file to specify the input files for the pipeline:

Contents of resources_test/params.yaml:

param_list:
  - id: file1
    input: resources_test/file1.tsv
  - id: file2
    input: resources_test/file2.tsv
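With more than a handful of inputs, writing this file by hand gets tedious. The sketch below generates the same structure from a list of paths. `build_params_yaml` is a hypothetical helper, not part of Viash or Nextflow; it assumes the file name without its extension is an acceptable id and that paths contain no characters that need YAML quoting.

```python
from pathlib import PurePosixPath

def build_params_yaml(input_paths):
    """Return params.yaml text with one param_list entry per input path."""
    lines = ["param_list:"]
    for path in input_paths:
        # Use the file name without extension as the id (an assumption that
        # matches the example above: file1.tsv -> file1).
        sample_id = PurePosixPath(path).stem
        lines.append(f"  - id: {sample_id}")
        lines.append(f"    input: {path}")
    return "\n".join(lines) + "\n"

params_text = build_params_yaml([
    "resources_test/file1.tsv",
    "resources_test/file2.tsv",
])
print(params_text, end="")
```

Redirecting the printed text to resources_test/params.yaml should give a file equivalent to the one shown above.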

Now run the pipeline:

nextflow run viash-io/viash_project_template \
  -main-script target/nextflow/template/workflow/main.nf \
  -r build/main \
  -latest \
  -profile docker \
  -params-file resources_test/params.yaml \
  --publish_dir output
Output
Nextflow 24.04.3 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Pulling viash-io/viash_project_template ...
 Fast-forward
Launching `https://github.com/viash-io/viash_project_template` [golden_kalman] DSL2 - revision: d02c1ce592 [build/main]
[fd/a3b85a] Submitted process > workflow:run_wf:remove_comments:processWf:remove_comments_process (file1)
[77/b5f28c] Submitted process > workflow:run_wf:remove_comments:processWf:remove_comments_process (file2)
[ab/cd4194] Submitted process > workflow:run_wf:take_column:processWf:take_column_process (file2)
[66/ec3197] Submitted process > workflow:run_wf:take_column:processWf:take_column_process (file1)
[f8/f6997e] Submitted process > workflow:run_wf:combine_columns:processWf:combine_columns_process (combined)
[74/1f9dde] Submitted process > workflow:publishStatesSimpleWf:publishStatesProc (combined)

If you have a Seqera Cloud compute environment already set up, you can also launch the workflow there:

cat > params.yaml <<EOF
param_list:
  - id: file1
    input: s3://my-bucket/file1.tsv
  - id: file2
    input: s3://my-bucket/file2.tsv
publish_dir: s3://my-bucket/output
EOF

tw launch viash-io/viash_project_template \
  --main-script target/nextflow/template/workflow/main.nf \
  --revision build/main \
  --pull-latest \
  --workspace 123456789 \
  --compute-env ABCDEFGHIJKLMNOP \
  --params-file params.yaml

Extending this template

This template is a great starting point for building your own Viash project. Here’s how you can extend it.

Step 1: Get the template

First create a new repository by clicking the “Use this template” button. If you can’t see the “Use this template” button, log into GitHub first.

Next, clone the repository using the following command.

git clone https://github.com/youruser/my_first_pipeline.git && cd my_first_pipeline

Your new repository should contain the following files:

tree .
Output
.
├── CHANGELOG.md
├── LICENSE.md
├── main.nf
├── nextflow.config
├── README.md
├── README.qmd
├── resources_test
│   ├── file1.tsv
│   ├── file2.tsv
│   └── params.yaml
├── src
│   └── template
│       ├── combine_columns
│       │   ├── config.vsh.yaml
│       │   ├── script.R
│       │   └── test.R
│       ├── remove_comments
│       │   ├── config.vsh.yaml
│       │   ├── script.sh
│       │   └── test.sh
│       ├── take_column
│       │   ├── config.vsh.yaml
│       │   ├── script.py
│       │   └── test.py
│       └── workflow
│           ├── config.vsh.yaml
│           └── main.nf
└── _viash.yaml

Step 2: Build the Viash components

With Viash you can turn the components in src/ into Dockerized Nextflow modules by running:

viash ns build --setup cachedbuild --parallel
Output
Exporting take_column (template) =executable=> target/executable/template/take_column
Exporting remove_comments (template) =executable=> target/executable/template/remove_comments
Exporting workflow (template) =nextflow=> target/nextflow/template/workflow
Exporting combine_columns (template) =executable=> target/executable/template/combine_columns
Exporting take_column (template) =nextflow=> target/nextflow/template/take_column
Exporting remove_comments (template) =nextflow=> target/nextflow/template/remove_comments
Exporting combine_columns (template) =nextflow=> target/nextflow/template/combine_columns
[notice] Building container 'ghcr.io/viash-io/project_template/template/combine_columns:dev' with Dockerfile
[notice] Building container 'ghcr.io/viash-io/project_template/template/remove_comments:dev' with Dockerfile
[notice] Building container 'ghcr.io/viash-io/project_template/template/take_column:dev' with Dockerfile
All 7 configs built successfully

This command not only turns the Viash components in src/ into Nextflow modules, it also builds the containers when appropriate (the cachedbuild setup strategy starts from the Docker cache when available). Once everything is built, a new target directory contains the executables and modules, grouped per platform:

tree target
Output
target
├── executable
│   └── template
│       ├── combine_columns
│       │   └── combine_columns
│       ├── remove_comments
│       │   └── remove_comments
│       └── take_column
│           └── take_column
└── nextflow
    └── template
        ├── combine_columns
        │   ├── main.nf
        │   └── nextflow.config
        ├── remove_comments
        │   ├── main.nf
        │   └── nextflow.config
        ├── take_column
        │   ├── main.nf
        │   └── nextflow.config
        └── workflow
            ├── main.nf
            └── nextflow.config

12 directories, 11 files
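A quick way to sanity-check a build is to confirm that every component in src/ ended up with a Nextflow module in target/. The sketch below assumes the layout shown in the two directory trees above, namely that src/<namespace>/<name>/config.vsh.yaml is built to target/nextflow/<namespace>/<name>/main.nf. `missing_nextflow_modules` is a hypothetical helper, and it derives the namespace from the directory name rather than from the Viash config, which may not hold for other projects.

```python
from pathlib import Path

def missing_nextflow_modules(src_dir="src", target_dir="target"):
    """Return 'namespace/name' for components without a built main.nf.

    Assumes src/<namespace>/<name>/config.vsh.yaml maps to
    target/nextflow/<namespace>/<name>/main.nf, as in this template.
    """
    src = Path(src_dir)
    target = Path(target_dir) / "nextflow"
    missing = []
    for config in sorted(src.glob("*/*/config.vsh.yaml")):
        namespace = config.parent.parent.name
        name = config.parent.name
        if not (target / namespace / name / "main.nf").is_file():
            missing.append(f"{namespace}/{name}")
    return missing

if __name__ == "__main__":
    missing = missing_nextflow_modules()
    print("missing modules:", ", ".join(missing) if missing else "none")
```

Run from the repository root after the build, an empty result suggests that all four components in this template were exported.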

Step 3: Run the pipeline

You can now run the locally built pipeline using the following command:

nextflow run . \
  -main-script target/nextflow/template/workflow/main.nf \
  -profile docker \
  -params-file resources_test/params.yaml \
  --publish_dir output
Output
Nextflow 24.04.3 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `target/nextflow/template/workflow/main.nf` [distracted_williams] DSL2 - revision: bbc6ad6ba4
[5f/28124e] Submitted process > workflow:run_wf:remove_comments:processWf:remove_comments_process (file1)
[fa/45bf29] Submitted process > workflow:run_wf:remove_comments:processWf:remove_comments_process (file2)
[e0/cf7ba0] Submitted process > workflow:run_wf:take_column:processWf:take_column_process (file1)
[1d/d36294] Submitted process > workflow:run_wf:take_column:processWf:take_column_process (file2)
[3f/d80ba4] Submitted process > workflow:run_wf:combine_columns:processWf:combine_columns_process (combined)
[43/7008a3] Submitted process > workflow:publishStatesSimpleWf:publishStatesProc (combined)

This will run the different stages of the workflow, with the final result being stored in a file named combined.workflow.output.tsv in the output directory output:

cat output/combined.workflow.output.tsv
Output
"1" 0.11    0.111
"2" 0.23    0.222
"3" 0.35    0.333
"4" 0.47    0.444

What’s next?

Congratulations, you’ve reached the end of this quickstart tutorial, and we’re excited for you to delve deeper into the world of Viash! Our comprehensive guide and reference documentation are here to help you explore further topics.

So, get ready to enhance your skills and create outstanding solutions with Viash!