DiagrammeR: An R repository from jjallaire

With the DiagrammeR package, you can create diagrams and flowcharts using R. Markdown-like text is used to describe a diagram and, by doing this in R, we can also add some R code into the mix and integrate these diagrams in the R console, through R Markdown, and in shiny apps. Want a more visual intro? View a really sweet video by following the link below to Dailymotion (click/tap the image to undergo transport).

Introduction

The package leverages the infrastructure provided by htmlwidgets to bridge R and both mermaid.js and viz.js.

Installation

Install the development version of DiagrammeR from GitHub using the devtools package.

devtools::install_github('rich-iannone/DiagrammeR')

Graphviz Graphs

It's possible to make diagrams using the Graphviz support included in the DiagrammeR package. The processing function is called grViz. What you pass into grViz is a valid graph in the DOT language. The text can either exist in the form of a string, a reference to a Graphviz file (with a .gv file extension), or as a text connection.

Defining a Graphviz Graph

The Graphviz graph specification must begin with a directive stating whether a directed graph (digraph) or an undirected graph (graph) is desired. Semantically, this indicates whether or not there is a natural direction from one of the edge's nodes to the other. An optional graph ID follows this and paired curly braces denotes the body of the statement list (stmt_list).

Optionally, A graph may also be described as strict. This forbids the creation of multi-edges, i.e., there can be at most one edge with a given tail node and head node in the directed case. For undirected graphs, there can be at most one edge connected to the same two nodes. Subsequent edge statements using the same two nodes will identify the edge with the previously defined one and apply any attributes given in the edge statement.

Here is the basic structure:

[strict] (graph | digraph) [ID] '{' stmt_list '}'

Statements

The graph statement (graph_stmt), the node statement (node_stmt), and the edge statement (edge_stmt) are the three most commonly used statements in the Graphviz DOT language. Graph statements allow for attributes to be set for all components of the graph. Node statements define and provide attributes for graph nodes. Edge statements specify the edge operations between nodes and they supply attributes to the edges. For the edge operations, a directed graph must specify an edge using the edge operator -> while a undirected graph must use the -- operator.

Within these statements follow statement lists. Thus for a node statement, a list of nodes is expected. For an edge statement, a list of edge operations. Any of the list item can optionally have an attribute list (attr_list) which modify the attributes of either the node or edge.

Comments may be placed within the statement list. These can be marked using // or a /* */ structure. Comment lines are denoted by a # character. Multiple statements within a statement list can be separated by linebreaks or ; characters between multiple statements.

Here is an example where nodes (in this case styled as boxes and circles) can be easily defined along with their connections:

boxes_and_circles <- "
digraph boxes_and_circles {
  
  # several 'node' statements
  node [shape = box]
    A; B; C; D; E; F
  
  node [shape = circle,
        fixedsize = true,
        width = 0.9] // sets as circles
    1; 2; 3; 4; 5; 6; 7; 8

  # several 'edge' statements
    A->1; B->2; B->3; B->4; C->A
    1->D; E->A; 2->4; 1->5; 1->F
    E->6; 4->6; 5->7; 6->7; 3->8

  # a 'graph' statement
  graph [overlap = true, fontsize = 10]
}
"

grViz(boxes_and_circles)

The attributes of the nodes and the edges can be easily modified. In the following, colors can be selectively changed in attribute lists.

boxes_and_circles <- "
digraph boxes_and_circles {
  
  # several 'node' statements
  node [shape = box,
        color = blue] // for the letter nodes, use box shapes
    A; B; C; D; E
    F [color = black]
  
  node [shape = circle,
        fixedsize = true,
        width = 0.9] // sets as circles
    1; 2; 3; 4; 5; 6; 7; 8

  # several 'edge' statements
  edge [color = gray] // this sets all edges to be gray (unless overridden)
    A->1; B->2
    B->3 [color = red]
    B->4
    C->A [color = green]
    1->D; E->A; 2->4; 1->5; 1->F
    E->6; 4->6; 5->7; 6->7
    3->8 [color = blue]

  # a 'graph' statement
  graph [overlap = true, fontsize = 10]
}
"

grViz(boxes_and_circles)

There are many more attributes. Here are the principal node attributes:

Node Attribute	Description	Default
`color`	the node shape color	`black`
`colorscheme`	the scheme for interpreting color names
`distortion`	node distortion for any `shape = polygon`	`0.0`
`fillcolor`	node fill color	`lightgrey/black`
`fixedsize`	label text has no affect on node size	`false`
`fontcolor`	the font color	`black`
`fontname`	the font family	`Times-Roman`
`fontsize`	the point size of the label	`14`
`group`	the name of the node's horizontal alignment group
`height`	the minimum height in inches	`0.5`
`image`	the image file name
`labelloc`	the node label vertical alignment	`c`
`margin`	the space around a label	`0.11, 0.55`
`orientation`	the node rotation angle	`0.0`
`penwidth`	the width of the pen (in point size) for drawing boundaries	`1.0`
`peripheries`	the number of node boundaries
`shape`	the shape of the node	`ellipse`
`sides`	the number of sides for `shape = polygon`	`4`
`skew`	the skewing of the node for `shape = polygon`	`0.0`
`style`	graphics options for the node
`tooltip`	the tooltip annotation for the node	[node label]
`width`	the minimum width in inches	`0.75`

The edge attributes:

Edge Attribute	Description	Default
`arrowhead`	style of arrowhead at head end	normal
`arrowsize`	scaling factor for arrowheads	`1.0`
`arrowtail`	sytle of arrowhead at tail end	normal
`color`	edge stroke color	`black`
`colorscheme`	the scheme for interpreting color names
`constraint`	whether edge should affect node ranking	true
`decorate`	setting this draws line between labels with their edges
`dir`	direction; either `forward`, `back`, `both`, or `none`	`forward`
`edgeURL`	URL attached to non-label part of edge
`edgehref`	same as `edgeURL` attribute
`edgetarget`	if an URL is set, this determines the browser window for URL
`edgetooltip`	a tooltip annotation for the non-label part of edge	label
`fontcolor`	the font color	`black`
`fontname`	the font family	`Times-Roman`
`fontsize`	the point size of the label	`14`
`headclip`	if false, edge is not clipped to head node boundary	true
`headhref`	same as `headURL`
`headlabel`	label placed near head of edge
`headport`	can be either: `n`, `ne`, `e`, `se`, `s`, `sw`, `w`, `nw`
`headtarget`	if `headURL` is set, determines the browser window for URL
`headtooltip`	a tooltip annotation near head of edge	label
`headURL`	URL attached to head label
`href`	alias for URL
`id`	any string (user-defined output object tags)
`label`	edge label
`labelangle`	angle in degrees which head or tail label is rotated off edge	`-25.0`
`labeldistance`	scaling factor for distance of head or tail label from node	`1.0`
`labelfloat`	lessen constraints on edge label placement	false
`labelfontcolor`	typeface color for head and tail labels	`black`
`labelfontname`	font family for head and tail labels	`Times-Roman`
`labelfontsize`	point size for head and tail labels	`14`
`labelhref`	same as `labelURL`
`labelURL`	URL for label, overrides `edgeURL`
`labeltarget`	if `URL` or `labelURL` set, determines browser window for URL
`labeltooltip`	tooltip annotation near label	label
`layer`	`all`, id or id:id, or a comma-separated list	overlay range
`lhead`	name of cluster to use as head of edge
`ltail`	name of cluster to use as tail of edge
`minlen`	minimum rank distance between head and tail	`1`
`penwidth`	width of pen for drawing edge stroke, in points	`1.0`
`samehead`	tag for head node; edge heads with the same tag are merged onto the same port
`sametail`	tag for tail node; edge tails with the same tag are merged onto the same port
`style`	graphics options
`tailclip`	if false, edge is not clipped to tail node boundary	true
`tailhref`	same as `tailURL`
`taillabel`	label placed near tail of edge
`tailport`	can be either: `n`, `ne`, `e`, `se`, `s`, `sw`, `w`, `nw`
`tailtarget`	if `tailURL` is set, determines browser window for URL
`tailtooltip`	tooltip annotation near tail of edge	label
`tailURL`	URL attached to tail label
`target`	if `URL` is set, determines browser window for URL
`tooltip`	tooltip annotation	label
`weight`	integer cost of stretching an edge	`1`

The graph attributes:

Graph Attribute	Description	Default
`aspect`	controls aspect ratio adjustment
`bgcolor`	background color for drawing and initial fill color
`center`	center drawing	false
`clusterrank`	`local` but optionally `global` or `none`	`local`
`color`	the color for clusters, outline color, and fill color	`black`
`colorscheme`	the scheme for interpreting color names
`compound`	allow edges between clusters	false
`concentrate`	enables edge concentrators	false
`dpi`	dpi for image output	96
`fillcolor`	cluster fill color	`black`
`fontcolor`	typeface color	`black`
`fontname`	font family	`Times-Roman`
`fontpath`	list of directories to search for paths
`fontsize`	point size of label	`14`
`id`	any string (user-defined output object tags)
`label`	any string
`labeljust`	label justification; `l` or `r` for left or right	centered
`labelloc`	label location; `t` or `b` for top or bottom	top
`landscape`	graph orientation; `true` for landscape
`layers`	id:id:id...
`layersep`	specifies separator character to split `layers`	`:`
`margin`	margin (in inches) included in `page`	`0.5`
`mindist`	minimum separation (in inches) between all nodes	`1.0`
`nodesep`	separation (in inches) between nodes	`0.25`
`nojustify`	justify to label if set as true	false
`ordering`	if `out` edge order is preserved
`orientation`	if `rotate` is not used and the value is `landscape`, then landscape	`portrait`
`outputorder`	or `nodesfirst`, `edgesfirst`	breadthfirst
`page`	unit of pagination (e.g., "`8.5,11`")
`pagedir`	traversal order of pages	`BL`
`pencolor`	color for drawing cluster boundaries	`black`
`penwidth`	width of pen, in points, for drawing boundaries	`1.0`
`peripheries`	number of cluster boundaries	`1`
`rank`	choices are: `same`, `min`, `max`, `source` or `sink`
`rankdir`	choices are: `LR` (left to right) or `TB` (top to bottom)	`TB`
`ranksep`	separation between ranks, in inches	`0.75`
`ratio`	approximate aspect ratio desired: `fill` or `auto`
`rotate`	if set to `90`, set orientation to landscape
`samplepoints`	number of points used to represent ellipses and circles on output	`8`
`searchsize`	maximum edges with negative cut values to check when looking for a minimum one during network simplex	`30`
`size`	maximum drawing size, in inches
`splines`	draw edges as splines, polylines, lines
`style`	graphics options for clusters (e.g., `filled`)
`stylesheet`	pathname or URL to XML style sheet for SVG
`target`	if `URL` is set, determines browser window for URL
`tooltip`	tooltip annotation for cluster	label
`truecolor`	if set, force 24-bit or indexed color in image output
`URL`	URL associated with graph (format-dependent)
`viewport`	clipping window on output

Graphviz Engines

Several Graphviz engines are available with DiagrammeR for rendering graphs. By default, the grViz function renders graphs using the standard dot engine. However, the neato, twopi, and circo engines are selectable by supplying those names to the engine argument. The neato engine provides spring model layouts. This is a suitable engine if the graph is not too large (<100 nodes) and you don't know anything else about it. The neato engine attempts to minimize a global energy function, which is equivalent to statistical multi-dimensional scaling. The twopi engine provides radial layouts. Nodes are placed on concentric circles depending their distance from a given root node. The circo engine provide circular layouts. This is suitable for certain diagrams of multiple cyclic structures, such as certain telecommunications networks.

Here is how the 'boxes_and_circles' graph is rendered with the neato engine:

grViz(boxes_and_circles, engine = "neato")

grViz(boxes_and_circles, engine = "twopi")

grViz(boxes_and_circles, engine = "circo")

Mixing R and Graphviz DOT

Possibilities are interesting when combining R functions with DiagrammeR and the grViz function. Here's an example of how the rvest package and piping with pipeR can yield multiple graphs:

library(rvest)
library(XML)
library(pipeR)

# Generate all the examples from viz.js GitHub repo
html("https://raw.githubusercontent.com/mdaines/viz.js/gh-pages/example.html") %>>%
  html_nodes("script[type='text/vnd.graphviz']") %>>%
  lapply(
    function(x){
      xmlValue(x) %>>% (~ htmltools::html_print(grViz(.)) ) %>>% grViz
    }
  )

Isn't this great? Let's take in some examples straight from the Graphviz gallery:

readLines("http://www.graphviz.org/Gallery/directed/fsm.gv.txt") %>>%
  grViz

readLines("http://www.graphviz.org/Gallery/directed/Genetic_Programming.gv.txt") %>>%
  grViz

readLines("http://www.graphviz.org/Gallery/directed/unix.gv.txt") %>>%
  grViz

You get some nice figures as a result. Try 'em, you'll see.

For much more information on the DOT language, see the excellent drawing graphs with dot manual.

Mermaid Graphs

The mermaid function processes the specification of a diagram and then renders the diagram. This diagram spec can either exist in the form of a string, a reference to a mermaid file (with a .mmd file extension), or as a connection.

The mermaid-style graph specification begins with a declaration of graph followed by the graph direction. The directions can be:

LR left to right
RL right to left
TB top to bottom
BT bottom to top
TD top down (same as TB)

Nodes can be given arbitrary ID values and those IDs are displayed as text within their respective boxes. Connections between nodes are denoted by:

--> arrow connection
--- line connection

Simply joining up a series of nodes in a left-to-right graph can be done in a few lines:

diagram <- "
graph LR
  A-->B
  A-->C
  C-->E
  B-->D
  C-->D
  D-->F
  E-->F
"

mermaid(diagram)

This renders the following image:

The same result can be achieved in a more succinct manner with this R statement (using semicolons between statements in the mermaid diagram spec):

mermaid("graph LR; A-->B; A-->C; C-->E; B-->D; C-->D; D-->F; E-->F")

Alternatively, here is the result of using the statement graph TB in place of graph LR:

Keep in mind that external files can also be called by the mermaid function. The file graph.mmd can contain the text of the diagram spec as follows

graph LR
  A-->B
  A-->C
  C-->E
  B-->D
  C-->D
  D-->F
  E-->F

and be rendered through:

mermaid("graph.mmd")

Alright, here's another example. This one places some text inside the diagram objects. Also, there are some CSS styles to add a color fill to each of the diagram objects:

diagram <- "
graph LR
A(Rounded)-->B[Squared]
B-->C{A Decision}
C-->D[Square One]
C-->E[Square Two]
    
style A fill:#DCEBE3
style B fill:#77DFC9
style C fill:#DEDBBA
style D fill:#F8F0CC
style E fill:#FCFCF2
"
    
mermaid(diagram)

What you get is this:

Here's an example with line text (that is, text appearing on connecting lines). Simply place text between pipe characters, just after the arrow, right before the node identifier. There are few more CSS properties for the boxes included in this example (stroke, stroke-width, and stroke-dasharray).

diagram <- "
graph LR
A(Start)-->|Line Text|B(Keep Going)
B-->|More Line Text|C(Stop)
    
style A fill:#A2EB86, stroke:#04C4AB, stroke-width:2px
style B fill:#FFF289, stroke:#FCFCFF, stroke-width:2px, stroke-dasharray: 4, 4
style C fill:#FFA070, stroke:#FF5E5E, stroke-width:2px
"

mermaid(diagram)

The resultant graphic:

Let's include the values of some R objects into a fresh diagram. The mtcars dataset is something I go to again and again, so, I'm going to load it up.

data(mtcars)

When you call the R summary function on this data frame, you obtain this:

     mpg             cyl             disp             hp             drat      
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0   Median :3.695  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7   Mean   :3.597  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930  
       wt             qsec             vs               am              gear      
 Min.   :1.513   Min.   :14.50   Min.   :0.0000   Min.   :0.0000   Min.   :3.000  
 1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000  
 Median :3.325   Median :17.71   Median :0.0000   Median :0.0000   Median :4.000  
 Mean   :3.217   Mean   :17.85   Mean   :0.4375   Mean   :0.4062   Mean   :3.688  
 3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000  
 Max.   :5.424   Max.   :22.90   Max.   :1.0000   Max.   :1.0000   Max.   :5.000  
      carb      
 Min.   :1.000  
 1st Qu.:2.000  
 Median :2.000  
 Mean   :2.812  
 3rd Qu.:4.000  
 Max.   :8.000

That information can placed into a diagram. First, we'll get a vector object for strings that specify each of the connections and the text inside the boxes (one for each mtcars dataset column). These strings will contain each of the statistics provided by the summary function (minimum, 1st quartile, median, mean, 3rd quartile, and maximum). We'll use a sapply to loop through each column.

connections <- sapply(
  1:ncol(mtcars)
  , function(i){
    paste0(
      i
      , "(", colnames(mtcars)[i], ")---"
      , i, "-stats("
      , paste0(
        names(summary(mtcars[,i]))
        , ": "
        , unname(summary(mtcars[,i]))
        , collapse="<br/>"
      )
      , ")"
    )
  }
)

This generates all of the syntax required for connections between column names to the statistical summary text in each of the adjoining boxes. Notice the use of the <br/> tag that terminates each of the stats inside the paste0 statement. They provide the necessary linebreaks for text within each diagram object.

Now, to generate the code for the summary diagram, one can use a paste0 statement and then a separate paste statement for the connection text (with the collapse argument set to \n to specify a linebreak for the output text). Note that within the paste0 statement, there is a \n linebreak wherever you would need one. Finally, to style multiple objects, a classDef statement was used. Here, a class of type column was provided with values for certain CSS properties. On the final line, the class statement applied the class definition to nodes 1 through 11 (a comma-separated list generated by the paste0 statement).

diagram <-
paste0(
"graph TD;", "\n",
paste(connections, collapse = "\n"), "\n",
"classDef column fill:#0001CC, stroke:#0D3FF3, stroke-width:1px;" ,"\n",
"class ", paste0(1:length(connections), collapse = ","), " column;
")

mermaid(diagram)

This is part of the resulting graphic (it's quite wide so I'm displaying just 8 of the 11 columns):

The mermaid.js library also supports sequence diagrams. The "How to Draw Sequence Diagrams" report by Poranen, Makinen, and Nummenmaa offers a good introduction to sequence diagrams. Let's replicate the ticket-buying example from Figure 1 of this report and add in some conditionals.

# Using this "How to Draw a Sequence Diagram" 
# http://www.cs.uku.fi/research/publications/reports/A-2003-1/page91.pdf
# draw some sequence diagrams with DiagrammeR

mermaid("
sequenceDiagram
  customer->>ticket seller: ask ticket
  ticket seller->>database: seats
  alt tickets available
    database->>ticket seller: ok
    ticket seller->>customer: confirm
    customer->>ticket seller: ok
    ticket seller->>database: book a seat
    ticket seller->>printer: print ticket
  else sold out
    database->>ticket seller: none left
    ticket seller->>customer: sorry
  end
")

For more examples and additional documentation, see the mermaid.js Wiki.

DiagrammeR + shiny

As with other htmlwidgets, we can easily dynamically bind DiagrammeR in R with shiny. Here is a quick example where we can provide a diagram spec in a textInput.

library(shiny)

ui = shinyUI(fluidPage(
  textInput('spec', 'Diagram Spec', value = ""),
  DiagrammeROutput('diagram')
))

server = function(input, output){
  output$diagram <- renderDiagrammeR(DiagrammeR(
    input$spec
  ))
}

shinyApp(ui = ui, server = server)

jjallaire/DiagrammeR