Project links: Documentation & Main Website | Issue Tracker | Mailing List
- NEW blog post: Provenance reports in Scientific Workflows - going into details about how SciPipe is addressing provenance
- NEW blog post: First production workflow run with SciPipe
- NEW video: Watch a screencast on how to write a Hello World workflow in SciPipe [15:28]
SciPipe is a library for writing Scientific Workflows, sometimes also called "pipelines", in the Go programming language.
When you need to run many command-line programs that depend on each other in complex ways, SciPipe helps by making the process of running these programs flexible, robust and reproducible. SciPipe also lets you restart an interrupted run without overwriting already produced output, and produces an audit report of what was run, among many other things.
SciPipe is built on the proven principles of Flow-Based Programming (FBP) to achieve maximum flexibility, productivity and agility when designing workflows. Compared to plain dataflow, FBP has the benefit that processes are fully self-contained, so that a library of reusable components can be created and plugged into new workflows ad hoc.
Similar to other FBP systems, SciPipe workflows can be likened to a network of assembly lines in a factory, where items (files) are flowing through a network of conveyor belts, stopping at different independently running stations (processes) for processing, as depicted in the picture above.
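To make the assembly-line picture concrete, here is a minimal sketch in plain Go, without SciPipe. Each station is a goroutine and each conveyor belt is a buffered channel. The names station and runLine are hypothetical and chosen for illustration only; this is not how SciPipe itself is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// station starts an independently running process (a goroutine) that reads
// items from its in-port, applies work to each one, and sends the result on
// its out-port. The out-port is a bounded buffer holding up to bufSize items.
// (Hypothetical helper for illustration, not part of SciPipe.)
func station(in <-chan string, bufSize int, work func(string) string) <-chan string {
	out := make(chan string, bufSize)
	go func() {
		defer close(out) // tell downstream stations that no more items follow
		for item := range in {
			out <- work(item)
		}
	}()
	return out
}

// runLine connects two stations into a line and sends the items through it.
func runLine(items []string) []string {
	source := make(chan string, len(items))
	for _, it := range items {
		source <- it
	}
	close(source)

	// The network definition: which belt feeds which station.
	upper := station(source, 2, strings.ToUpper)
	tagged := station(upper, 2, func(s string) string { return s + ".done" })

	var results []string
	for r := range tagged {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(runLine([]string{"a.txt", "b.txt"}))
}
```

Because both stations run at the same time, the second can start on the first item while the first station is already busy with the next one.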
SciPipe was initially created for problems in bioinformatics and cheminformatics, but works equally well for any problem involving pipelines of command-line applications.
Project status: SciPipe is still alpha software, and minor breaking API changes still happen as we try to streamline the process of writing workflows. If you have code already written in SciPipe, please follow the commit history closely for any API updates (let us know if you need any help in migrating code to the latest API).
Some key benefits of SciPipe that are not always found in similar systems:
- Intuitive behaviour: SciPipe operates by flowing data (files) through a network of channels and processes, not unlike the conveyor belts and stations in a factory.
- Flexible: Processes that wrap command-line programs or scripts can be combined with processes coded directly in Go.
- Custom file naming: SciPipe gives you full control over how files are named, making it easy to find your way among the output files of your workflow.
- Portable: Workflows can be distributed either as Go code to be run with go run, or as stand-alone executable files that run on almost any UNIX-like operating system.
- Easy to debug: As everything in SciPipe is just Go code, you can use some of the available debugging tools, or just println() statements, to debug your workflow.
- Supports streaming: Can stream outputs via UNIX FIFO files, to avoid temporary storage.
- Efficient and parallel: Workflows are statically compiled into executables that run fast. SciPipe also leverages pipeline parallelism between processes as well as task parallelism when there are multiple inputs to a process, making efficient use of multiple CPU cores.
Known limitations:
- There are still a number of missing good-to-have features for workflow design. See the issue tracker for details.
- There is not (yet) support for the Common Workflow Language.
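The task parallelism listed among the benefits above (several inputs to one process being handled at the same time) can be sketched in plain Go as follows. This is an illustration under assumptions, not SciPipe's actual scheduler: the function runTasks and its maxTasks limit are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// runTasks applies work to every input, running at most maxTasks of them at
// the same time, and returns the results in the same order as the inputs.
// (Hypothetical helper for illustration, not part of SciPipe.)
func runTasks(inputs []string, maxTasks int, work func(string) string) []string {
	if maxTasks < 1 {
		maxTasks = 1
	}
	results := make([]string, len(inputs))
	sem := make(chan struct{}, maxTasks) // counting semaphore
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxTasks tasks are already running
		go func(i int, in string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this task finishes
			results[i] = work(in)    // each task writes only its own slot
		}(i, in)
	}
	wg.Wait()
	return results
}

func main() {
	files := []string{"a.txt", "b.txt", "c.txt"}
	fmt.Println(runTasks(files, runtime.NumCPU(), strings.ToUpper))
}
```

Limiting the number of simultaneous tasks to the number of CPU cores is one common choice; the right limit depends on whether the wrapped programs are CPU-bound or I/O-bound.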
Let's look at an example workflow to get a feel for what writing workflows in SciPipe looks like:
package main

import (
	// Import SciPipe into the main namespace (generally frowned upon but could
	// be argued to be reasonable for short-lived workflow scripts like this)
	. "github.com/scipipe/scipipe"
)

func main() {
	// Init workflow
	wf := NewWorkflow("hello_world")

	// Initialize processes and set output file paths
	hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
	hello.SetPathStatic("out", "hello.txt")

	world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
	world.SetPathReplace("in", "out", ".txt", "_world.txt")

	// Connect network
	world.In("in").Connect(hello.Out("out"))

	// Run workflow
	wf.Run()
}
Let's put the code in a file named scipipe_helloworld.go and run it:
$ go run scipipe_helloworld.go
AUDIT 2017/05/04 17:05:15 Task:hello Executing command: echo 'Hello ' > hello.txt.tmp
AUDIT 2017/05/04 17:05:15 Task:world Executing command: echo $(cat hello.txt) World >> hello_world.txt.tmp
Let's check which files SciPipe has generated:
$ ls -1tr hello*
hello.txt.audit.json
hello.txt
hello_world.txt
hello_world.txt.audit.json
As you can see, it has created the files hello.txt and hello_world.txt, and an accompanying .audit.json file for each of them.
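To see how the file names and logged commands above relate to the workflow code, the following plain-Go sketch reproduces them by simple string replacement. It assumes that the path replacement substitutes ".txt" with "_world.txt" in the input path, and that the {i:...} and {o:...} placeholders are filled in with the input path and the output path plus a ".tmp" suffix, as the log lines suggest. The helpers replacePath and formatCommand are hypothetical and are not SciPipe functions.

```go
package main

import (
	"fmt"
	"strings"
)

// replacePath derives an output path from an input path by replacing every
// occurrence of old with repl. (Hypothetical helper, not SciPipe API.)
func replacePath(inPath, old, repl string) string {
	return strings.ReplaceAll(inPath, old, repl)
}

// formatCommand fills in the {i:PORT} and {o:PORT} placeholders of a command
// pattern. Output paths get a ".tmp" suffix, mirroring the log lines above.
// (Hypothetical helper, not SciPipe API.)
func formatCommand(pattern string, inPaths, outPaths map[string]string) string {
	cmd := pattern
	for port, path := range inPaths {
		cmd = strings.ReplaceAll(cmd, "{i:"+port+"}", path)
	}
	for port, path := range outPaths {
		cmd = strings.ReplaceAll(cmd, "{o:"+port+"}", path+".tmp")
	}
	return cmd
}

func main() {
	in := "hello.txt"
	out := replacePath(in, ".txt", "_world.txt")
	cmd := formatCommand(
		"echo $(cat {i:in}) World >> {o:out}",
		map[string]string{"in": in},
		map[string]string{"out": out},
	)
	fmt.Println(out) // hello_world.txt
	fmt.Println(cmd) // echo $(cat hello.txt) World >> hello_world.txt.tmp
}
```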
Now, let's check the output of the final resulting file:
$ cat hello_world.txt
Hello World
Now we can rejoice that it contains the text "Hello World", exactly as a proper Hello World example should :)
You can find many more examples in the examples folder in the GitHub repo.
For more information about how to write workflows using SciPipe, and much more, see the SciPipe website (scipipe.org)!
- See a poster on SciPipe, presented at the e-Science Academy in Lund, on Oct 12-13 2016.
- See slides from a recent presentation of SciPipe for use in a Bioinformatics setting.
- The architecture of SciPipe is based on a flow-based-programming-like pattern in pure Go, presented in this and this blog post on Gopher Academy.
- SciPipe is very heavily dependent on the proven principles from Flow-Based Programming (FBP), as invented by John Paul Morrison. From FBP, SciPipe uses the ideas of separate network (workflow dependency graph) definition, named in- and out-ports, sub-networks/sub-workflows and bounded buffers (already available in Go's channels) to make writing workflows as easy as possible.
- This library has also been much influenced and inspired by the GoFlow library by Vladimir Sibirov.
- Thanks to Egon Elbre for helpful input on the design of the internals of the pipeline, and processes, which greatly simplified the implementation.
- This work is financed by faculty grants and other financing for the Pharmaceutical Bioinformatics group of the Dept. of Pharmaceutical Biosciences at Uppsala University, and by the Swedish Research Council through the National Bioinformatics Infrastructure Sweden.
- Supervisor for the project is Ola Spjuth.
Find below a few tools that are more or less similar to SciPipe and that are worth checking out before deciding which tool fits you best (in approximate order of similarity to SciPipe):