This package allows us to draw Sankey plots in Stata. It is based on the Sankey Guide (October 2021).
The package can be installed via SSC or GitHub. The GitHub version might be more recent due to bug fixes and feature updates, and may contain syntax improvements and changes in default values. See the version numbers below. The GitHub version is eventually published on SSC.
SSC (v1.1):
ssc install sankey, replace
GitHub (v1.2):
net install sankey, from("https://raw.githubusercontent.com/asjadnaqvi/stata-sankey/main/installation/") replace
The palettes package and its dependency colrspace are required to run this command:
ssc install palettes, replace
ssc install colrspace, replace
Even if you have these packages installed, please check for updates with: ado update, update
If you want to make a clean figure, it is advisable to load a clean scheme. Several are available; I personally use the following:
ssc install schemepack, replace
set scheme white_tableau
You can also pass the scheme directly to the graph using the scheme(schemename) option. See the help file for details.
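As a minimal sketch of the scheme() option, the following applies a scheme to a single graph without changing the global setting. It assumes the example dataset used elsewhere in this document (with variables value, source, destination, and layer) and that schemepack is installed so that white_tableau is available.

```stata
* Load the example data used elsewhere in this document
use "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey2.dta?raw=true", clear

* Apply the scheme to this graph only (assumes schemepack is installed)
sankey value, from(source) to(destination) by(layer) scheme(white_tableau)
```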
I also prefer narrow fonts in figures with long labels. You can change this as follows:
graph set window fontface "Arial Narrow"
The syntax for v1.2 is as follows:
sankey value [if] [in], from(var) to(var) by(var)
[
palette(str) colorby(layer|level) smooth(1-8) gap(num) recenter(mid|bot|top)
labangle(str) labsize(str) labposition(str) labgap(str) showtotal
valsize(str) valcondition(str) format(str) valgap(str) novalues
lwidth(str) lcolor(str) alpha(num) offset(num)
title(str) subtitle(str) note(str) scheme(str) name(str) xsize(num) ysize(num)
]
See the help file (help sankey) for details.
The most basic use is as follows:
sankey value, from(var1) to(var2) by(level variable)
where var1 and var2 are the string source and destination variables, respectively, against which the value variable is plotted. The by() variable defines the levels.
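To make the expected data layout concrete, here is a small sketch with hypothetical toy data. It assumes one row per flow, string source and destination variables, a numeric value, and a numeric layer variable. The variable names and numbers are illustrative only and are not taken from the package's own datasets.

```stata
* Hypothetical toy data: names and values are illustrative only
clear
input str8 source str8 destination float value byte layer
"A" "C" 10 1
"B" "C"  5 1
"C" "D"  9 2
"C" "E"  6 2
end

* Requires the sankey package and its dependencies to be installed
sankey value, from(source) to(destination) by(layer)
```

In this sketch, category C receives 15 units in the first layer and sends out 15 units in the second, so the flows are balanced.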
Get the example data from GitHub:
use "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey2.dta?raw=true", clear
Let's test the sankey command:
sankey value, from(source) to(destination) by(layer)
sankey value, from(source) to(destination) by(layer) smooth(2)
sankey value, from(source) to(destination) by(layer) smooth(8)
sankey value, from(source) to(destination) by(layer) recenter(bot)
sankey value, from(source) to(destination) by(layer) recenter(top)
sankey value, from(source) to(destination) by(layer) gap(0)
sankey value, from(source) to(destination) by(layer) gap(20)
sankey value, from(source) to(destination) by(layer) noval showtot
sankey value, from(source) to(destination) by(layer) palette(CET C7)
sankey value, from(source) to(destination) by(layer) colorby(level)
sankey value, from(source) to(destination) by(layer) noval showtot palette(CET C6) ///
laba(0) labpos(3) labg(-1) offset(10)
import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_simple.xlsx?raw=true", clear firstrow
sankey value, from(source) to(destination) by(layer) showtot
sankey value, from(source) to(destination) by(layer) palette(CET C7) ///
valcond(>100) valsize(1.6) showtotal ///
xsize(2) ysize(1) lc(white) lw(0.1)
Please open an issue to report errors, feature enhancements, and/or other requests.
v1.2 (02 Feb 2023)
- Unbalanced Sankeys are now allowed. This means that incoming and outgoing layers do not necessarily have to be equal. Outgoing can be larger than incoming.
- A category can now also start in the middle.
- Various bug fixes.
v1.1 (13 Dec 2022)
- Option valformat() renamed to just format(). This aligns it with standard Stata usage.
- A new option offset() added to displace the x-axis on the right-hand side. Offset is given as a percentage share of the x-axis range. This allows rotated labels to be displaced properly.
- Checks for missing bilateral flow combinations. Hitting a non-flow combination was causing the code to crash.
v1.0 (08 Dec 2022)
- Public release.