Nxfvars makes it easy to parameterize Jupyter notebooks, Rmarkdown notebooks, or plain
Python scripts from a Nextflow process. All variables accessible in a process's script
section are made available directly in the notebook.
Download nxfvars.nf and add the script to your pipeline. Import the nxfvars function and call it from the script section of your process:
nextflow.enable.dsl = 2
include { nxfvars } from "./nxfvars.nf"
process foo {
script:
"""
${nxfvars(task)}
# run script or execute notebook here
"""
}
When the process is executed, nxfvars generates a .params.yml file in the work
directory. It contains all variables that can be accessed in the script section.
The YAML file can be consumed by the nxfvars Python library, Papermill,
or any YAML parser (see below).
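As a rough illustration of the "any YAML parser" route, the sketch below reads a parameter file with PyYAML. It assumes PyYAML is installed (`pip install pyyaml`). The file contents are made up for the example and written to a temporary directory so that the snippet runs on its own; a real .params.yml will contain whatever variables your process defines.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML; assumed to be installed

# Made-up stand-in for a generated .params.yml. The real keys depend on
# the variables available in your process.
sample = """\
foo: some value
params:
  bar: 42
task:
  cpus: 2
"""

with tempfile.TemporaryDirectory() as work_dir:
    params_file = Path(work_dir) / ".params.yml"
    params_file.write_text(sample)

    # safe_load only builds plain dicts, lists and scalars.
    with params_file.open() as fh:
        variables = yaml.safe_load(fh)

print(variables["foo"])
print(variables["params"]["bar"])
print(variables["task"]["cpus"])
```

Inside a process's work directory, the temporary file would not be needed; opening `.params.yml` directly should be enough.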
Full examples at examples/nxfvars_python_script and examples/nxfvars_python_notebook.
The nxfvars Python library is a thin wrapper around a YAML parser. It can be used from both Jupyter notebooks and plain Python scripts. You can install it using pip:
pip install nxfvars
In Python, Nextflow variables can be accessed through the nxfvars object:
from nxfvars import nxfvars
print(nxfvars["foo"])
print(nxfvars["params"]["bar"])
print(nxfvars["task"]["cpus"])
It is common to execute notebooks interactively during development and run them later
with parameters. In that case, you can use .get() to supply default values for when
a .params.yml file is not yet present:
nxfvars.get("foo", "default value for development")
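The same fallback idea extends to nested values such as `task.cpus`. The sketch below uses plain dictionaries as stand-ins for the nxfvars object, so it only shows the lookup pattern, not the library's behavior. The names `dev_vars`, `pipeline_vars` and `resolve` are hypothetical, and the values are made up.

```python
# Stand-in for the situation during development: no .params.yml yet,
# modelled here as an empty mapping (an assumption for illustration).
dev_vars = {}

# Stand-in for the situation inside a pipeline run, with made-up values.
pipeline_vars = {"foo": "value from pipeline", "task": {"cpus": 4}}


def resolve(variables):
    """Read settings with development defaults, as a notebook might."""
    foo = variables.get("foo", "default value for development")
    # Nested lookups need a default at each level.
    cpus = variables.get("task", {}).get("cpus", 1)
    return foo, cpus


print(resolve(dev_vars))
print(resolve(pipeline_vars))
```

Keeping all defaults in one place near the top of a notebook makes it easier to see which values the pipeline is expected to override.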
From Nextflow, just invoke the Python script, or use e.g. jupyter nbconvert to
execute the notebook. nxfvars execute is a convenient wrapper around jupytext
and jupyter nbconvert that executes notebooks in arbitrary jupytext formats and
converts them to an HTML report.
process nxfvars_python {
script:
"""
${nxfvars(task)}
# simply execute the script here
python my_script.py
# or execute the notebook
nxfvars execute notebook.ipynb report.html
"""
}
Full example at examples/papermill
Papermill is an established library for parameterizing Jupyter notebooks. It can readily consume YAML files generated with nxfvars.
process papermill {
output:
file("report.html"), emit: report
script:
"""
${nxfvars(task)}
papermill some_notebook.ipynb notebook_executed.ipynb -f .params.yml -k python3
# optional: convert to HTML report
jupyter nbconvert --to html --output report.html notebook_executed.ipynb
"""
}
Full example at examples/rmarkdown
For now, we use the following R snippet (render.R) to parse the YAML file and
render the notebook with rmarkdown. This could be simplified in the future by
porting the nxfvars library to R.
#!/usr/bin/env Rscript
# USAGE: render.R notebook.Rmd report.html
args = commandArgs(trailingOnly=TRUE)
# expose the Nextflow variables to the notebook as params$nxfvars
nxfvars = list(nxfvars = yaml::read_yaml('.params.yml'))
rmarkdown::render(args[1], params = nxfvars, output_file=args[2])
process rmarkdown {
stageInMode "copy" // work around https://github.com/rstudio/rmarkdown/issues/1508
output:
file("report.html"), emit: report
script:
"""
${nxfvars(task)}
render.R 'notebook.Rmd' 'report.html'
"""
}
All variables in a Nextflow process (except local variables declared with def) can be
programmatically accessed through Nextflow's implicit variables this and task.
See also my blog post about these variables.
The nxfvars(task) function encodes all variables as YAML and injects them into the
bash script.
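To make the idea of "injecting YAML into a bash script" more concrete, here is one way it could be done, sketched in Python with a here-document. This is an illustration under assumptions, not the actual nxfvars implementation (which lives in nxfvars.nf). The helper `params_heredoc` and the `END_PARAMS` delimiter are hypothetical, and PyYAML is assumed to be installed.

```python
import yaml  # PyYAML; assumed to be installed


def params_heredoc(variables, filename=".params.yml"):
    """Return a bash snippet that writes `variables` to `filename` as YAML.

    Hypothetical helper for illustration; not part of nxfvars.
    """
    yaml_text = yaml.safe_dump(variables, default_flow_style=False)
    # Quoting the delimiter stops bash from expanding $ or backticks
    # that may appear inside the YAML body.
    return f"cat <<'END_PARAMS' > {filename}\n{yaml_text}END_PARAMS\n"


snippet = params_heredoc({"foo": "bar", "task": {"cpus": 2}})
print(snippet)
```

Placed at the top of a process script, a snippet like this would create the parameter file before the notebook or script runs, which is the behavior described above.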