Nextflow can do SO much. This cheatsheet covers only the very basics of scripting, not configuration, which is more user-specific.
Error reports and suggestions welcome!
- Nextflow patterns, official
- Gitter chat room
- Google group
- The old posts in these places are a treasure trove that answered 99% of my questions. As an example, the last function `.collect{ it[1] }` in the cheatsheet came from a post in Gitter by @Juke34.
- Each execution of a process happens in its own temporary working directory.
- The working directory is the folder named like `/path_to_tmp/4d9c3b333734a5b63d66f0bc0cfcdc` that Nextflow points you to when there is an error in execution. This folder contains the error log, which can be useful for debugging. One can find the folder path in the `.nextflow.log` or in the `report.html`.
- This folder only contains files (usually in the form of symlinks, see below) from the input channel, so it's isolated from the rest of the file system.
- This folder will also contain all output files (unless specifically directed elsewhere), and only those specified in the output channels and `publishDir` will be moved or copied to the `publishDir`.
- Be mindful that if the `""" script section """` involves changing directory, such as `cd` or `rmarkdown::render( knit_root_dir = "folder/" )`, Nextflow will still only search the working directory for output files.
- Run `nextflow clean -f` in the execution folder to clean up the working directories.
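The points above can be sketched in a small DSL2 script. This is only an illustration: the process name `MAKE_REPORT`, the `results` folder, and the file names are made up for the example and are not part of the cheatsheet.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical process; all names and paths are illustrative only.
process MAKE_REPORT {
    // Only files declared under output: are copied out of the working directory.
    publishDir "results", mode: 'copy'

    input:
    path "input.txt"        // staged as a symlink inside the working directory

    output:
    path "report.txt"       // Nextflow looks for this in the working directory

    script:
    """
    mkdir -p scratch
    cd scratch
    # Written back to the task working directory, so Nextflow can find it.
    wc -l ../input.txt > ../report.txt
    # Left in scratch/ and not declared as output, so it is never published.
    wc -l ../input.txt > temp.txt
    """
}

workflow {
    MAKE_REPORT( Channel.fromPath( "input.txt" ) )
}
```

If the first `wc` wrote to `report.txt` instead of `../report.txt`, the file would land in `scratch/` and Nextflow would report the declared output as missing.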
- Throughout Nextflow scripts, one can use
  - `${workflow.projectDir}` to refer to where the Nextflow script (usually `main.nf`) is located. For example: `publishDir "${workflow.projectDir}/output", mode: 'copy'` or `Rscript ${workflow.projectDir}/bin/task.R`.
  - `${workflow.launchDir}` to refer to where the script is called from.
- They are more reliable than `$PWD` or `$pwd` in the script section.
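A minimal sketch of both variables in use, assuming a `bin/task.R` script sits next to `main.nf` as in the example above. The process name `RUN_TASK` and the output file name are hypothetical.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical process; assumes bin/task.R exists next to main.nf.
process RUN_TASK {
    // Publish next to the pipeline script, wherever the run was launched from.
    publishDir "${workflow.projectDir}/output", mode: 'copy'

    output:
    path "task_output.txt"

    script:
    """
    Rscript ${workflow.projectDir}/bin/task.R > task_output.txt
    """
}

workflow {
    println "Script location: ${workflow.projectDir}"
    println "Launched from:   ${workflow.launchDir}"
    RUN_TASK()
}
```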
- `Channel.from( "A.txt" )` will put `A.txt` as is into the channel.
- `Channel.fromPath( "A.txt" )` will add a full path (usually current directory) and put `/path/A.txt` into the channel.
- `Channel.fromPath( "folder/A.txt" )` will add a full path (usually current directory) and put `/path/folder/A.txt` into the channel.
- `Channel.fromPath( "/path/A.txt" )` will put `/path/A.txt` into the channel.
- In other words, `Channel.fromPath` will only add a full path if there isn't already one and ensure there is always a full path in the resulting channel.
- This goes hand in hand with `input: path("A.txt")` inside the process, where Nextflow actually creates a symlink named `A.txt` (note the path from first / to last / is stripped) linking to `/path/A.txt` in the working directory, so it can be accessed within the working directory by the script `cat A.txt` without specifying a path.
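To see the full path and the symlink for yourself, something like the following should work. It assumes a file `folder/A.txt` exists relative to where the script is launched; the process name `SHOW_FILE` is made up.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical process; assumes folder/A.txt exists in the launch directory.
process SHOW_FILE {
    input:
    path "A.txt"            // symlink named A.txt is created in the working directory

    output:
    stdout

    script:
    """
    ls -l A.txt             # shows the symlink and the full path it points to
    cat A.txt               # no path needed inside the working directory
    """
}

workflow {
    // Prints the full path that fromPath puts into the channel.
    Channel.fromPath( "folder/A.txt" ).view()

    SHOW_FILE( Channel.fromPath( "folder/A.txt" ) ).view()
}
```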
- With `input: path("A.txt")` one can refer to the file in the script as `A.txt`. Side note: `A.txt` doesn't have to be the same name as in channel creation. It can be anything: `input: path("B.txt")`, `input: path("n")` etc.
- With `input: path(A)` one can refer to the file in the script as `$A`.
- `input: path("A.txt")` and `input: path "A.txt"` generally both work. I occasionally had errors that required the following (tip from @danielecook):
  - if not in a tuple, use `input: path "A.txt"`
  - if in a tuple, use `input: tuple path("A.txt"), path("B.txt")`
- (from @pditommaso): `path(A)` is almost the same as `file(A)`, however the first interprets a value of type string as the input file path (i.e. the location in the file system where it's stored), while the latter interprets a value of type string and materialises it to a temporary file. Using `path` is recommended since it's less ambiguous and fits better in most use cases.
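The three input styles above, side by side, as a rough sketch. The process names, the `sample1` label and `reads.txt` are invented for illustration, and the script assumes an `A.txt` in the launch directory.

```nextflow
nextflow.enable.dsl = 2

// All process names and values below are hypothetical.
process FIXED_NAME {
    input:
    path "B.txt"            // staged as B.txt whatever the original file name was

    output:
    stdout

    script:
    """
    head -n 1 B.txt
    """
}

process VARIABLE_NAME {
    input:
    path A                  // keeps the original file name, available as $A

    output:
    stdout

    script:
    """
    head -n 1 $A
    """
}

process IN_TUPLE {
    input:
    tuple val(sample), path("reads.txt")   // parentheses are used inside a tuple

    output:
    stdout

    script:
    """
    echo $sample
    head -n 1 reads.txt
    """
}

workflow {
    FIXED_NAME( Channel.fromPath( "A.txt" ) ).view()
    VARIABLE_NAME( Channel.fromPath( "A.txt" ) ).view()
    IN_TUPLE( Channel.of( "sample1" ).combine( Channel.fromPath( "A.txt" ) ) ).view()
}
```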
- Non exhaustive list.
| New version | Old version | Where it is used |
|---|---|---|
| `Channel.of( )` | `Channel.from( )` | channel creation |
| `Channel.fromList( )` | `Channel.from( )` | channel creation |
| `tuple` | `set` | input/output declaration inside of process |
| `.view( )` | `.print( )` | channel operation |
| `.combine( )` | `.spread( )` | channel operation |
| NA | `.merge( )` | channel operation |
| `.groupTuple( )` | `.groupBy( )` | channel operation |
| `.join( )` | `.phase( )` | channel operation |
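A short sketch using only the newer names from the table. The values are arbitrary, and the comments point to the older name each call stands in for according to the table; the old and new operators are not guaranteed to behave identically.

```nextflow
nextflow.enable.dsl = 2

workflow {
    // Channel.of( ) in place of Channel.from( ) for individual values
    Channel.of( 1, 2, 3 ).view()

    // Channel.fromList( ) in place of Channel.from( ) for an existing list
    Channel.fromList( [ "a", "b" ] ).view()

    // .join( ) in place of .phase( ): match items on their first element
    left  = Channel.of( [ "s1", "x" ], [ "s2", "y" ] )
    right = Channel.of( [ "s1", 10 ], [ "s2", 20 ] )
    left.join( right ).view()

    // .groupTuple( ) in place of .groupBy( ): group values sharing a key
    Channel.of( [ "s1", 1 ], [ "s1", 2 ], [ "s2", 3 ] ).groupTuple().view()

    // .combine( ) in place of .spread( ): all pairings of the two channels
    Channel.of( "A", "B" ).combine( Channel.of( 1, 2 ) ).view()
}
```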
- Moving to DSL2 is a one-way street. It's so intuitive with clean and readable code.
- In DSL1, each queue channel can only be used once.
- In DSL2, a channel can be fed into multiple processes.
- In DSL2, each process can only be called once. The solution is either to `.concat()` the input channels so they run as parallel processes, or to put the process in a module and import it multiple times from the module.
- DSL2 also enforces that all inputs need to be combined into one channel before going into a process. See the cheatsheet for useful operators.
- Simple steps to convert from original syntax to DSL2
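As a rough illustration of the DSL2 points above, the sketch below concatenates two channels so a process is called only once, then reuses the same channel for a second process. The process names and the `a/` and `b/` folders are hypothetical.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical processes; assumes text files exist under a/ and b/.
process COUNT_LINES {
    input:
    path txt

    output:
    stdout

    script:
    """
    wc -l $txt
    """
}

process FIRST_LINE {
    input:
    path txt

    output:
    stdout

    script:
    """
    head -n 1 $txt
    """
}

workflow {
    ch_a = Channel.fromPath( "a/*.txt" )
    ch_b = Channel.fromPath( "b/*.txt" )

    // One call of COUNT_LINES handles both sets of files, one task per file.
    all_txt = ch_a.concat( ch_b )
    COUNT_LINES( all_txt ).view()

    // In DSL2 the same channel can also feed a second process.
    FIRST_LINE( all_txt ).view()
}

// Alternative, shown as a comment only: with COUNT_LINES kept in a module file
// (path is hypothetical), it can be imported twice under different names:
// include { COUNT_LINES as COUNT_A; COUNT_LINES as COUNT_B } from './modules/count'
```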
- @danielecook for offering lots of help and advice.