ccdr

ccdr implements the CCDr structure learning algorithm described in [1]. Based on observational data, this algorithm estimates the structure of a Bayesian network (aka edges in a DAG) using penalized maximum likelihood based on L1 or concave (MCP) regularization.

Important note: This legacy package consists of a single method that implements the main algorithm and is not actively maintained. The main purpose of this repo is to serve as a reproducible snapshot for the simulations in [1].

Installation

Please note that this package is currently in a development state. In order to install the package, you will need both devtools and Rcpp. We are still working out the kinks of getting the package to compile on different systems, so please let us know if you have any issues.

If you have never installed packages using Rcpp before, we recommend checking out the following resources which should get you started:

Rcpp for Seamless R and C++ Integration
Rcpp: Seamless R and C++ Integration, Dirk Eddelbuettel, Romain Francois, Journal of Statistical Software, Vol. 40, Issue 8, Apr 2011.
High performance functions with Rcpp

The code is hosted on github and can be installed using devtools:

if (packageVersion("devtools") < 1.6) {
  install.packages("devtools")
}
if(devtools::find_rtools()) devtools::install_github("itsrainingdata/ccdr")

NOTE: Windows users will need to make sure that Rtools is installed in order to build this package. To check if you have Rtools installed, you can use devtools::find_rtools(). For more details: http://cran.r-project.org/bin/windows/Rtools/.

If you find any bugs, please report them here on github.

Examples

The basic usage is as follows:

### Specify dimensions
nn <- 20
pp <- 100

### Generate a random Gaussian matrix
dat <- matrix(rnorm(nn * pp), nrow = nn)

### Run the ccdr algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20)

### Display the results
print(ccdr.path)

The output of ccdr.run is an S3 object ccdrPath, which is essentially a list of estimates, one for each value of lambda in the solution path. Each estimate is an S3 object ccdrFit. The DAG itself is stored as an edge list (see documentation for ccdrFit-class and edgeList-class for more details).

This trivial example uses uncorrelated normal data, which is not very interesting. In order to do some interesting calculations, we first need to generate data according to some pre-specified DAG structure. The ccdr package does not provide this functionality: To generate data from a given Bayesian network and/or simulate random networks, the following R packages are recommended:

bnlearn: bnlearn on CRAN, www.bnlearn.com
pcalg: pcalg on CRAN
igraph: igraph on CRAN, http://igraph.org/r/

Example using `pcalg`

The pcalg package provides two useful functions: randomDAG for generating a random ordered DAG and rmvDAG for generating random data according to the structural equation model implied by the DAG.

NOTE: Since randomDAG produces an ordered DAG, we should permute the columns in the simulated dataset in order to obfuscate this information. If we know this ordering in advance, there are much better methods for estimating the DAG structure using autoregressive models.

library("pcalg")

### Set up the model parameters
nn <- 20                # How many samples to draw?
pp <- 100               # How many nodes in the DAG?
num.edges <- 100        # How many *expected* edges in the DAG?
ss <- num.edges / pp    # This is the expected number of parents *per node*

### Generate a random DAG using the pcalg method randomDAG
beta.min <- 0.5
beta.max <- 2
edge.pr <- 2 * ss / (pp - 1)
g <- randomDAG(n = pp, prob = edge.pr, lB = beta.min, uB = beta.max) # Note that the edge weights are selected at random here!

### Generate random data according to this DAG using the method rmvDAG
dat <- rmvDAG(n = nn, dag = g, errDist = "normal")
dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering

### Run the algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)

Example using `bnlearn`

The bnlearn package provides methods for reading in data from the Bayesian Network Repository. We can use these methods along with the pcalg function rmvDAG to generate random data from these structures:

library("bnlearn")
library("graph")
library("pcalg")

### Download the RDA file containing bnlearn-compatible data from the repository
con <- url("http://www.bnlearn.com/bnrepository/hailfinder/hailfinder.rda")
load(con)
close(con)

### Convert the bnlearn data to a graph object
this.adj <- amat(bn)
this.graph <- as(this.adj, "graphNEL")

### Generate random data according to this DAG using the method rmvDAG
pp <- numNodes(this.graph)
nn <- 100
dat <- rmvDAG(n = nn, dag = this.graph, errDist = "normal")
dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering

### Run the algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)

References

[1] B. Aragam and Q. Zhou. Concave penalized estimation of sparse Gaussian Bayesian networks. The Journal of Machine Learning Research, In press, 2015.

itsrainingdata/ccdr