/sam

Sparse Additive Modelling

Primary LanguageR

SAM

R Package for Sparse Additive Modelling

The package SAM targets at high dimensional predictive modeling (regression and classification) for complex data analysis. SAM is short for sparse additive modeling, and adopts the computationally efficient basis spline technique. We solve the optimization problems by various computational algorithms including the block coordinate descent algorithm, fast iterative soft-thresholding algorithm, and newton method. The computation is further accelerated by warm-start and active-set tricks.

Installation

Prerequisites

SAM uses OpenMP to enables faster matrix multiplication. So, to use SAM, you must correctly enables OpenMP for the compiler.

For Windows and Linux users, newest version of GCC has fully support of OpenMP.

But for MAC OS users, things are a little tricky since the default llvm on MAC OS does not support OpenMP. But the solution is easy. You can simply install llvm with full OpenMP support and direct R using this version of llvm.

First, install llvm with OpenMP support by typing

brew install llvm

Then append the following lines into ~/.R/Makevars to enable llvm with OpenMP support to be the compiler for R packages.

CC = /usr/local/bin/clang-omp
CXX = /usr/local/bin/clang-omp++
CXX98 = /usr/local/bin/clang-omp++
CXX11 = /usr/local/bin/clang-omp++
CXX14 = /usr/local/bin/clang-omp++
CXX17 = /usr/local/bin/clang-omp++
OBJC = /usr/local/bin/clang-omp
OBJCXX = /usr/local/bin/clang-omp++

Installing from GitHub

First, you need to install the devtools package. You can do this from CRAN. Invoke R and then type

install.packages(devtools)

Then load the devtools package and install SAM

library(devtools)
install_github("HMJiangGatech/sam")
library(SAM)

Windows User: If you encounter a Rtools version issue: 1. make sure you install the latest Rtools; 2. try the following code

assignInNamespace("version_info", c(devtools:::version_info, list("3.5" = list(version_min = "3.3.0", version_max = "99.99.99", path = "bin"))), "devtools")

Installing from CRAN

Ideally you can just install and enable SAM using with the help of CRAN on an R console.

install.packages("SAM")
library(SAM)

Usage

With SAM, you can run linear regression, logistic regression and poisson regression.

Example of linear regression

## generating training data
n = 100
d = 500
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)

## generating response
y = -2*sin(X[,1]) + X[,2]^2-1/3 + X[,3]-1/2 + exp(-X[,4])+exp(-1)-1

## Training
out.trn = samQL(X,y)
out.trn

## plotting solution path
plot(out.trn)

## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)

yt = -2*sin(Xt[,1]) + Xt[,2]^2-1/3 + Xt[,3]-1/2 + exp(-Xt[,4])+exp(-1)-1

## predicting response
out.tst = predict(out.trn,Xt)

Example of logistic regression

## generating training data
n = 200
d = 100
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
y = sign(((X[,1]-0.5)^2 + (X[,2]-0.5)^2)-0.06)

## flipping about 5 percent of y
y = y*sign(runif(n)-0.05)
y = sign(y==1)

## Training
out.trn = samLL(X,y)
out.trn

## plotting solution path
plot(out.trn)

## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)

yt = sign(((Xt[,1]-0.5)^2 + (Xt[,2]-0.5)^2)-0.06)

## flipping about 5 percent of y
yt = yt*sign(runif(nt)-0.05)
yt = sign(yt==1)

## predicting response
out.tst = predict(out.trn,Xt)

Example of Poisson Regression

## generating training data
n = 200
d = 100
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
u = exp(-2*sin(X[,1]) + X[,2]^2-1/3 + X[,3]-1/2 + exp(-X[,4])+exp(-1)-1+1)
y = rep(0,n)
for(i in 1:n) y[i] = rpois(1,u[i])

## Training
out.trn = samEL(X,y)
out.trn

## plotting solution path
plot(out.trn)

## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
ut = exp(-2*sin(Xt[,1]) + Xt[,2]^2-1/3 + Xt[,3]-1/2 + exp(-Xt[,4])+exp(-1)-1+1)
yt = rep(0,nt)
for(i in 1:nt) yt[i] = rpois(1,ut[i])

## predicting response
out.tst = predict(out.trn,Xt)

To get complete documentation of SAM, please type ?SAM in an R terminal.

Experiments

The scripts used for experiments are in the folder tests/, to run the experiments, you should open the R terminal in the root of the project folder, and type

source("tests/test_linear.R")

or

source("tests/test_logis.R")

The machine we ran experiments on is a PC with

RAM: 31.3GB
Processor: Intel Core I7-6700T @ 2.80GHZ x8
Operating System: Ubuntu 16.04 LTS
R version: 3.2.3
GCC version: 5.4.0

We compared our results on linear regression and logistic regression with other packages, namely, grplasso, grpreg and gglasso. Also, for SAM and grpreg which support MCP regularization function, we run tests on them both with MCP regularizer and with L1 regularizer.

Linear Regression

SAM with L1 SAM with MCP previous SAM grplasso grpreg with L1 grpreg with MCP gglasso
time/s 13.458 13.115 14.523 42.346 34.081 35.460 22.191
loss 2.08e-5 1.66e-5 2.06e-5 2.26e-4 5.18e-4 2.12e-5 2.86e-3

Logistic Regression

SAM with L1 SAM with MCP previous SAM grplasso grpreg with L1 grpreg with MCP
time/s 7.758 9.384 96.361 18.700 92.922 28.333
accuracy 0.9054 0.9226 0.9109 0.9102 0.9052 0.8843

Reference

[1] Tuo Zhao, and Han Liu, Sparse Additive Machine, 2012.
[2] Pradeep Ravikumar, John Lafferty, Han Liu, Larry Wasserman, Pradeep, et al. Sparse additive models, 2009
[3] Xingguo Li, Jason Ge, Haoming Jiang, Mingyi Hong, Mengdi Wang, and Tuo Zhao, Boosting Pathwise Coordiante Optimization: Sequential Screening and Proximal Subsampled Newton Subroutine, 2016