ccdr
implements the CCDr structure learning algorithm described in [1]. Based on observational data, this algorithm estimates the structure of a Bayesian network (aka edges in a DAG) using penalized maximum likelihood based on L1 or concave (MCP) regularization.
Important note: This legacy package consists of a single method that implements the main algorithm and is not actively maintained. The main purpose of this repo is to serve as a reproducible snapshot for the simulations in [1].
Please note that this package is currently in a development state. In order to install the package, you will need both devtools
and Rcpp
. We are still working out the kinks of getting the package to compile on different systems, so please let us know if you have any issues.
If you have never installed packages using Rcpp
before, we recommend checking out the following resources which should get you started:
- Rcpp for Seamless R and C++ Integration
- Rcpp: Seamless R and C++ Integration, Dirk Eddelbuettel, Romain Francois, Journal of Statistical Software, Vol. 40, Issue 8, Apr 2011.
- High performance functions with Rcpp
The code is hosted on github and can be installed using devtools
:
if (packageVersion("devtools") < 1.6) {
install.packages("devtools")
}
if(devtools::find_rtools()) devtools::install_github("itsrainingdata/ccdr")
NOTE: Windows users will need to make sure that Rtools is installed in order to build this package. To check if you have Rtools installed, you can use devtools::find_rtools()
. For more details: http://cran.r-project.org/bin/windows/Rtools/.
If you find any bugs, please report them here on github.
The basic usage is as follows:
### Specify dimensions
nn <- 20
pp <- 100
### Generate a random Gaussian matrix
dat <- matrix(rnorm(nn * pp), nrow = nn)
### Run the ccdr algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20)
### Display the results
print(ccdr.path)
The output of ccdr.run
is an S3 object ccdrPath
, which is essentially a list of estimates, one for each value of lambda in the solution path. Each estimate is an S3 object ccdrFit
. The DAG itself is stored as an edge list (see documentation for ccdrFit-class
and edgeList-class
for more details).
This trivial example uses uncorrelated normal data, which is not very interesting. In order to do some interesting calculations, we first need to generate data according to some pre-specified DAG structure. The ccdr
package does not provide this functionality: To generate data from a given Bayesian network and/or simulate random networks, the following R packages are recommended:
bnlearn
: bnlearn on CRAN, www.bnlearn.compcalg
: pcalg on CRANigraph
: igraph on CRAN, http://igraph.org/r/
The pcalg
package provides two useful functions: randomDAG
for generating a random ordered DAG and rmvDAG
for generating random data according to the structural equation model implied by the DAG.
NOTE: Since randomDAG
produces an ordered DAG, we should permute the columns in the simulated dataset in order to obfuscate this information. If we know this ordering in advance, there are much better methods for estimating the DAG structure using autoregressive models.
library("pcalg")
### Set up the model parameters
nn <- 20 # How many samples to draw?
pp <- 100 # How many nodes in the DAG?
num.edges <- 100 # How many *expected* edges in the DAG?
ss <- num.edges / pp # This is the expected number of parents *per node*
### Generate a random DAG using the pcalg method randomDAG
beta.min <- 0.5
beta.max <- 2
edge.pr <- 2 * ss / (pp - 1)
g <- randomDAG(n = pp, prob = edge.pr, lB = beta.min, uB = beta.max) # Note that the edge weights are selected at random here!
### Generate random data according to this DAG using the method rmvDAG
dat <- rmvDAG(n = nn, dag = g, errDist = "normal")
dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering
### Run the algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)
The bnlearn
package provides methods for reading in data from the Bayesian Network Repository. We can use these methods along with the pcalg
function rmvDAG
to generate random data from these structures:
library("bnlearn")
library("graph")
library("pcalg")
### Download the RDA file containing bnlearn-compatible data from the repository
con <- url("http://www.bnlearn.com/bnrepository/hailfinder/hailfinder.rda")
load(con)
close(con)
### Convert the bnlearn data to a graph object
this.adj <- amat(bn)
this.graph <- as(this.adj, "graphNEL")
### Generate random data according to this DAG using the method rmvDAG
pp <- numNodes(this.graph)
nn <- 100
dat <- rmvDAG(n = nn, dag = this.graph, errDist = "normal")
dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering
### Run the algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)
[1] B. Aragam and Q. Zhou. Concave penalized estimation of sparse Gaussian Bayesian networks. The Journal of Machine Learning Research, In press, 2015.