classo

An R package that implements Classifier-Lasso


Classifier-Lasso

This package implements in R the Classifier-Lasso method of

Su, L., Shi, Z., & Phillips, P. C. (2016): "Identifying latent structures in panel data", Econometrica, 84(6), 2215-2264.
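For orientation, the criterion behind the method can be sketched as follows. This is a summary in the spirit of the paper, written up to scaling constants and not copied from the package source: the individual slopes \beta_i and the K group-specific slopes \alpha_k are chosen jointly to minimize a least-squares fit plus a penalty that vanishes for unit i as soon as \beta_i coincides with any one of the \alpha_k. For panels with fixed effects, y_{it} and x_{it} should be read as within-demeaned data.

```latex
Q_{\lambda}(\beta, \alpha)
  = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T}
      \left( y_{it} - \beta_i^{\prime} x_{it} \right)^{2}
  + \frac{\lambda}{N} \sum_{i=1}^{N} \prod_{k=1}^{K}
      \left\lVert \beta_i - \alpha_k \right\rVert
```

Here N, T, K and \lambda correspond to the arguments n, tt, K and lambda of the estimation functions used in the examples.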

This package is under active development...

The Classifier-Lasso code was originally developed in MATLAB, using CVX as the modeling language and MOSEK as the convex solver. The replicable empirical examples from the paper are available here.

By default, the package uses the open-source solver ECOS via CVXR. We skip the Disciplined Convex Programming (DCP) check to speed up the optimization.

To further speed up the computation, an R version that uses Rmosek to invoke MOSEK directly is described in "Implementing Convex Optimization in R: Two Econometric Examples", together with demonstration code. In our experiments, this R+Rmosek implementation often solves the optimization problem in at most 1/3 of the time taken by the MATLAB+CVX+MOSEK implementation and at most 2/3 of the time taken by the CVXR+ECOS implementation without the DCP check.

Installation

The current beta version can be installed from GitHub with:

library(devtools)
devtools::install_github("zhan-gao/classo", INSTALL_opts=c("--no-multiarch"))
library(classo)

Although Rmosek is not required for installation or use, it is highly recommended. In our experience, estimation with Rmosek is often much faster than with the other solvers available in R.

An installation gist for MOSEK can be found here. The installation of the latest version, MOSEK 9.0, includes Rmosek, which can be invoked in R by following this instruction (tested successfully).

Alternatively, Rmosek can be downloaded from CRAN. We have successfully tested the following lines on R 3.6.3:

install.packages("Rmosek")
library(Rmosek)
mosek_attachbuilder("path_to_the_bin_folder_of_MOSEK")
install.rmosek()

Please make sure that Rmosek is successfully installed and activated before using the PLS.mosek() function for estimation.
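A script can guard against a missing Rmosek installation before choosing the estimation function. The following is a minimal sketch using only base R; the variable names has_rmosek and pls_fun_name are illustrative and not part of the package. Note that requireNamespace() only confirms that the R package is installed. It does not verify that MOSEK itself is activated with a valid license.

```r
# Check whether the Rmosek package is installed, without attaching it.
has_rmosek <- requireNamespace("Rmosek", quietly = TRUE)

# Pick the name of the estimation function; PLS.cvxr() is the fallback
# because it relies on the default CVXR + ECOS setup.
pls_fun_name <- if (has_rmosek) "PLS.mosek" else "PLS.cvxr"

message("Estimation function to use: ", pls_fun_name, "()")
```

With classo attached, the chosen function could then be retrieved with match.fun(pls_fun_name) and called with the same arguments as in the examples.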

Examples

The sample data is generated by DGP 1 described in Su, Shi and Phillips (2016) with N = 200 and T = 25.

data("sample_data")
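# CAVEAT: convert the data.frame to matrices before proceeding.
y <- as.matrix(sample_data[, 1])   # dependent variable
x <- as.matrix(sample_data[, -1])  # regressors
n <- 200   # number of individuals (N)
tt <- 25   # number of time periods (T)
# var(y) is a 1 x 1 matrix here, so as.numeric() turns lambda into a scalar.
lambda <- as.numeric(0.5 * var(y) / (tt^(1/3)))
pls_out <- PLS.cvxr(n, tt, y, x, K = 3, lambda = lambda)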

# Use Rmosek if it is successfully installed
# pls_out <- PLS.mosek(n, tt, y, x, K = 3, lambda = lambda)

# Estimated slopes, one row per group. True coefficients: [1,1; 0.4,1.6; 1.6,0.4]
pls_out$a.out 
          [,1]      [,2]
[1,] 1.0387521 0.9986867
[2,] 0.4017041 1.6014119
[3,] 1.6197497 0.3614408
# Estimated group structure
# True group structure:
# 	group 2: 1 - 60
# 	group 1: 61 - 120
# 	group 3: 121 - 200
pls_out$group.est
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [33] 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1
 [65] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1
 [97] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3
[129] 3 3 3 3 3 1 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[161] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[193] 3 3 3 3 3 3 3 3
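The quality of the estimated group structure can be summarized by the share of correctly classified units. The sketch below is self-contained: instead of calling the package, it rebuilds the estimated labels by transcribing the printed output above, which differs from the true structure at units 37, 87, 101, 134 and 136. The names group_true, group_est, confusion and accuracy are illustrative.

```r
# True group labels for units 1..200, as listed in the comments above.
group_true <- c(rep(2, 60), rep(1, 60), rep(3, 80))

# Estimated labels transcribed from the printed output: identical to the
# truth except at five units.
group_est <- group_true
group_est[c(37, 134, 136)] <- 1
group_est[101] <- 2
group_est[87] <- 3

# Cross-tabulate truth against estimate; off-diagonal cells are errors.
confusion <- table(true = group_true, estimated = group_est)
print(confusion)

# Share of units assigned to their true group.
accuracy <- mean(group_est == group_true)
print(accuracy)  # 0.975
```

With an actual fit, the same figure would come from mean(pls_out$group.est == group_true). Group labels are in general identified only up to a permutation, so estimated and true labels may need to be matched first; in this example they already agree.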