Fit the Complex Lasso and Complex Graphical Lasso

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

---
title: "The `classo` Package"
author: "Michael Weylandt"
date: "March 16, 2020"
output:
  md_document:
    variant: gfm
---

<!--  -*- coding: utf-8 -*- -->
<!-- README.md is generated from README.Rmd. Do not edit this file directly -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
  echo=TRUE,
  collapse=TRUE,
  comment="#>", 
  fig.path="man/figures/"
)
```

[![GitHub Actions Build Status](https://github.com/michaelweylandt/classo/workflows/R-CMD-check%20and%20Deploy/badge.svg)](https://github.com/michaelweylandt/classo/actions?query=workflow%3A%22R-CMD-check+and+Deploy%22)
[![Travis-CI Build Status](https://api.travis-ci.com/michaelweylandt/classo.svg?branch=develop)](https://travis-ci.com/michaelweylandt/classo)
[![codecov](https://codecov.io/gh/michaelweylandt/classo/branch/develop/graph/badge.svg)](https://codecov.io/gh/michaelweylandt/classo)
[![License: GPL v2](https://img.shields.io/badge/License-GPL%20v2-blue.svg)](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/classo)](https://cran.r-project.org/package=classo)
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)

The `classo` package provides methods for fitting the complex lasso and the complex graphical lasso.

## The Complex Lasso

This package implements the exclusive lasso penalty of Zhou, Jin, and Hoi (2010)
and Campbell and Allen (2017) for generalized linear models.

\[\text{arg min}_{\beta} \frac{1}{2n} \|y - X\beta\|_2^2 + \lambda \sum_{g \in \mathcal{G}} \frac{\|\beta_g\|_1^2}{2}\]

This penalty is the "converse" of the group lasso, encouraging selection of a single
variable in each group. See Campbell and Allen (2017) for a thorough discussion of
this estimator and its properties.
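
To make the penalty term concrete, the following base-R sketch evaluates it for a given coefficient vector and group assignment. The helper name `exclusive_penalty` is hypothetical and is not claimed to be part of the package's API.

```r
# Exclusive lasso penalty: sum over groups of (L1 norm of the group)^2 / 2.
# `exclusive_penalty` is a hypothetical helper for illustration only.
exclusive_penalty <- function(beta, groups) {
  stopifnot(length(beta) == length(groups))
  # abs() returns the modulus for complex input, so this also covers complex beta
  group_l1 <- tapply(abs(beta), groups, sum)
  sum(group_l1^2) / 2
}

beta   <- c(1, -2, 0, 3)
groups <- c(1, 1, 2, 2)
exclusive_penalty(beta, groups)  # (3^2 + 3^2) / 2 = 9
```

Multiplying this value by $\lambda$ gives the penalty contribution to the objective above.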

The package provides efficient inexact proximal gradient and coordinate descent
schemes to solve exclusive lasso problems. The interface is similar to that of
the popular [`glmnet`](https://cran.r-project.org/web/packages/glmnet/index.html),
[`ncvreg`](http://pbreheny.github.io/ncvreg/), and [`grpreg`](http://pbreheny.github.io/grpreg/)
packages.
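
As a rough illustration of the coordinate descent idea, here is a minimal base-R sketch for the real-valued, squared-error case. It is not the package's implementation: the function names are hypothetical, there is no convergence check or intercept, and it runs a fixed number of sweeps. The update follows from the first-order conditions of the objective in a single coordinate: with $z_j = x_j^T r_{-j} / n$ and $c_j$ the sum of absolute values of the other coefficients in the same group, the minimizer is $S(z_j, \lambda c_j) / (\|x_j\|_2^2 / n + \lambda)$, where $S$ is soft-thresholding.

```r
# Soft-thresholding operator S(z, t)
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Objective: (1 / 2n) * ||y - X beta||^2 + lambda * sum_g ||beta_g||_1^2 / 2
exclusive_lasso_objective <- function(X, y, beta, groups, lambda) {
  n <- nrow(X)
  loss <- sum((y - X %*% beta)^2) / (2 * n)
  penalty <- sum(tapply(abs(beta), groups, sum)^2) / 2
  loss + lambda * penalty
}

# Hypothetical sketch of cyclic coordinate descent (real-valued data only)
exclusive_lasso_cd <- function(X, y, groups, lambda, n_iter = 100) {
  n <- nrow(X)
  p <- ncol(X)
  beta <- numeric(p)
  resid <- as.numeric(y)            # residual y - X beta, with beta = 0 initially
  col_scale <- colSums(X^2) / n     # ||x_j||^2 / n for each column
  for (iter in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      # Add feature j back to obtain the partial residual
      resid <- resid + X[, j] * beta[j]
      z <- sum(X[, j] * resid) / n
      # L1 mass of the other coefficients in the same group
      c_j <- sum(abs(beta[groups == groups[j]])) - abs(beta[j])
      beta[j] <- soft_threshold(z, lambda * c_j) / (col_scale[j] + lambda)
      resid <- resid - X[, j] * beta[j]
    }
  }
  beta
}

# Small synthetic example
set.seed(1)
n <- 50
p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] + rnorm(n)
groups <- c(1, 1, 1, 2, 2, 2)
beta_hat <- exclusive_lasso_cd(X, y, groups, lambda = 0.5)
```

Because each coordinate update exactly minimizes the objective along that coordinate, the objective value never increases from one update to the next.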

## Installation

The current development version of the package can be installed from GitHub:

```{r, eval=FALSE}
library(devtools)
install_github("michaelweylandt/classo")
```

## Usage

We begin by simulating a small data set with simple structure:

```{r}
library(classo)
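n <- 200  # number of observations
p <- 500  # number of features

# Sparse complex coefficient vector: only the first 10 entries are non-zero
beta <- complex(p)
beta[1:10] <- 3

# Complex Gaussian design matrix and noisy response
X <- matrix(rcnorm(n * p), ncol = p)
y <- X %*% beta + rcnorm(n)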
```
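
The simulation relies on `rcnorm`, which is not defined in this README. Assuming it draws standard circularly symmetric complex normal variates (an assumption, not a documented fact about the package), a base-R stand-in could look like the sketch below. The name `rcnorm_sketch` is hypothetical.

```r
# Hypothetical stand-in for a standard complex normal sampler.
# Real and imaginary parts are independent N(0, 1/2), so that E|z|^2 = 1.
rcnorm_sketch <- function(n) {
  complex(
    real      = rnorm(n, sd = sqrt(0.5)),
    imaginary = rnorm(n, sd = sqrt(0.5))
  )
}

set.seed(42)
z <- rcnorm_sketch(1000)
mean(Mod(z)^2)  # should be close to 1 under this parameterization
```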

We fit the exclusive lasso to this data set, using a user-specified group structure:

```{r, fig.width = 4, fig.height = 4}
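# `groups` is not defined earlier in this README. As an illustrative assumption,
# use 10 interleaved groups so that each of the 10 true variables is in its own group.
groups <- rep(1:10, length.out = p)
exfit <- exclusive_lasso(X, y, groups)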
print(exfit)
plot(exfit)
```

As we can see, for this very simple problem, the exclusive lasso picked out the 
true variables (though the standard lasso would have done as well here).

The `cv.exclusive_lasso` function can be used to select the tuning parameter $\lambda$,
though as Campbell and Allen (2017) note, standard cross-validation does not perform
particularly well for this problem, and model selection according to BIC / EBIC
with a group-thresholding step yields superior results. To facilitate model selection
by BIC / EBIC, an unbiased estimate of the degrees of freedom is calculated.
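
For intuition, the following sketch shows how BIC and EBIC could be computed from a residual sum of squares and a degrees-of-freedom estimate in the Gaussian case. The helper names are hypothetical, the EBIC term uses the common `2 * gamma * df * log(p)` approximation, and the inputs are made-up toy values. The package's own criteria and the group-thresholding step may differ.

```r
# Gaussian BIC up to an additive constant: n * log(RSS / n) + df * log(n)
gaussian_bic <- function(rss, n, df) {
  n * log(rss / n) + df * log(n)
}

# EBIC with the approximation log(choose(p, df)) ~ df * log(p);
# gamma = 0 recovers BIC
gaussian_ebic <- function(rss, n, df, p, gamma = 0.5) {
  gaussian_bic(rss, n, df) + 2 * gamma * df * log(p)
}

# Toy values along a hypothetical lambda path (not real results)
n   <- 200
p   <- 500
rss <- c(400, 220, 210, 205)
df  <- c(1, 5, 12, 30)

bic_path  <- gaussian_bic(rss, n, df)
ebic_path <- gaussian_ebic(rss, n, df, p)
which.min(ebic_path)  # index of the selected model along the path
```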

In addition to standard linear regression, the `classo` package also implements
logistic and Poisson regression. See the package vignette for details.

## Authors

* [Michael Weylandt](http://github.com/michaelweylandt)

    Department of Statistics, Rice University

## Acknowledgements

* MW was supported by the NSF Graduate Research Fellowship Program under grant
  number 1842494.