Nxfvars: Parameterize Notebooks from Nextflow

Nxfvars makes it easy to parameterize Jupyter notebooks, Rmarkdown notebooks, or plain Python scripts from a Nextflow process. All variables accessible in a process's script section are made available directly in the notebook.

Using nxfvars in a Nextflow pipeline

Download nxfvars.nf and add the script to your pipeline. Import the nxfvars function and call it from the script section of your process:

nextflow.enable.dsl = 2
include { nxfvars } from "./nxfvars.nf"

process foo {
    script:
    """
    ${nxfvars(task)}

    # run script or execute notebook here
    """
}

When the process is executed, nxfvars generates a .params.yml file in the work directory. It contains all variables that can be accessed in the script section. The YAML file can be consumed by the nxfvars Python library, Papermill, or any YAML parser (see below).

Usage with the nxfvars Python library

Full examples at examples/nxfvars_python_script and examples/nxfvars_python_notebook.

The nxfvars Python library is a thin wrapper around a YAML parser. It can be used from both Jupyter notebooks and plain Python scripts. You can install it using pip:

pip install nxfvars

In Python, Nextflow variables can be accessed through the nxfvars object:

from nxfvars import nxfvars

print(nxfvars["foo"])
print(nxfvars["params"]["bar"])
print(nxfvars["task"]["cpus"])

It is common to execute notebooks interactively during development and run them later with parameters. In that case, use .get() to supply default values for when a .params.yml file is not yet present:

nxfvars.get("foo", "default value for development")
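If you prefer not to depend on the nxfvars library, the same fallback behaviour can be approximated with a generic YAML parser. The following is a minimal sketch, not the library's actual implementation: load_params is a hypothetical helper name, and it assumes PyYAML is installed and that .params.yml holds a mapping at its top level.

```python
from pathlib import Path


def load_params(path=".params.yml"):
    """Return the parsed YAML mapping, or an empty dict if the file is absent.

    Hypothetical helper, not part of the nxfvars library.
    """
    params_file = Path(path)
    if not params_file.exists():
        # Interactive development: no Nextflow run has written the file yet.
        return {}

    # PyYAML is imported lazily, so the fallback path needs no extra package.
    import yaml

    with params_file.open() as handle:
        # safe_load returns None for an empty file, so fall back to {}.
        return yaml.safe_load(handle) or {}


# "foo" mirrors the variable name used in the examples above.
variables = load_params()
foo = variables.get("foo", "default value for development")
```

Because the helper returns a plain dict, nested values can be read the same way as with the nxfvars object, for example variables["task"]["cpus"], once the file exists.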

From Nextflow, invoke the Python script directly, or use a tool such as jupyter nbconvert to execute the notebook.

nxfvars execute is a convenience wrapper around jupytext and jupyter nbconvert that executes a notebook in any jupytext format and converts it to an HTML report.

process nxfvars_python {
    script:
    """
    ${nxfvars(task)}

    # simply execute the script here
    python my_script.py
    # or execute the notebook
    nxfvars execute notebook.ipynb report.html
    """
}

Usage with Papermill

Full example at examples/papermill

Papermill is an established library for parameterizing Jupyter notebooks. It can directly consume the YAML files generated by nxfvars.

process papermill {

    output:
        file("report.html"), emit: report

    script:
    """
    ${nxfvars(task)}

    papermill some_notebook.ipynb notebook_executed.ipynb -f .params.yml -k python3
    # optional: convert to HTML report
    jupyter nbconvert --to html --output report.html notebook_executed.ipynb
    """
}

Usage with Rmarkdown

Full example at examples/rmarkdown

For now, we use the following R snippet (render.R) to parse the YAML file and render the notebook with rmarkdown. Porting the nxfvars library to R could simplify this in the future.

# USAGE: render.R notebook.Rmd report.html
args = commandArgs(trailingOnly=TRUE)
nxfvars = list(nxfvars = yaml::read_yaml('.params.yml'))
rmarkdown::render(args[1], params = nxfvars, output_file=args[2])

process rmarkdown {
    stageInMode "copy" // work around https://github.com/rstudio/rmarkdown/issues/1508
    output:
        file("report.html"), emit: report

    script:
    """
    ${nxfvars(task)}

    render.R 'notebook.Rmd' 'report.html'
    """
}

How it works

All variables in a Nextflow process (except local variables declared with def) can be accessed programmatically through Nextflow's implicit variables this and task. See also my blog post about these variables.