
NCML for Non-convex Machine Learning Problems

NCML is a numerical package for solving large-scale non-convex optimization problems arising in machine learning and statistics. The solver provides several popular algorithms under a variable metric scheme that incorporates quasi-Newton updates. It is written in Python and performs well on many non-convex machine learning problems. NCML is developed by Yilin Wang (SHUFE). Contact: ylwang228@hotmail.com.

Highlights

  • follows the scikit-learn API conventions
  • supports both dense and sparse data representations
  • implements the quasi-Newton module in Cython
  • applies multiple quasi-Newton update criteria (SR1, BFGS); the standard forms of these updates are sketched below
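
For reference, the SR1 and BFGS criteria mentioned above are the classical quasi-Newton updates of a Hessian approximation $B_k$, with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. These are the textbook formulas, not a description of NCML's Cython internals:

$$B_{k+1}^{\mathrm{BFGS}} = B_k - \frac{B_k s_k s_k^\top B_k}{s_k^\top B_k s_k} + \frac{y_k y_k^\top}{y_k^\top s_k}, \qquad B_{k+1}^{\mathrm{SR1}} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^\top}{(y_k - B_k s_k)^\top s_k}.$$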

Solvers supported

  • Barzilai-Borwein method based algorithms (GIST, BBMPG, BBMPG_DCA); a sketch of the BB step sizes follows this list
  • quasi-Newton based algorithm (GDVMPG)
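
The Barzilai-Borwein (BB) methods above are built on the classical spectral step sizes. The sketch below shows the two standard BB formulas for illustration only; it is not NCML's internal implementation, and the function name is ours.

import numpy as np

def bb_step_sizes(x_prev, x_curr, g_prev, g_curr, eps=1e-12):
    # Classical Barzilai-Borwein step sizes, with
    # s_k = x_{k+1} - x_k and y_k = grad f(x_{k+1}) - grad f(x_k).
    # The eps guard is a simple safeguard against tiny or non-positive curvature.
    # Illustrative only; not NCML's internal implementation.
    s = x_curr - x_prev
    y = g_curr - g_prev
    sy = float(s @ y)
    bb1 = float(s @ s) / max(sy, eps)   # alpha_BB1 = s^T s / s^T y
    bb2 = sy / max(float(y @ y), eps)   # alpha_BB2 = s^T y / y^T y
    return bb1, bb2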

DC Programming

In NCML, we mainly consider a family of non-convex, possibly non-smooth optimization problems of the following form:

$$\min_{x \in \mathcal{C}} \; F(x) := f_1(x) - f_2(x) + r(x),$$

where $x$ is the optimization parameter, $\mathcal{C}$ is a closed and convex set in $\mathbb{R}^d$, $f_1$ and $f_2$ are real-valued convex functions, and $r$ is a proper, lower-semicontinuous, extended-real-valued function. The component function $r$ is a non-smooth regularizer that promotes sparsity, e.g., the convex $\ell_1$ norm or the non-convex $\ell_p$ norm with $0 < p < 1$.
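
When $r$ is convex (e.g., the $\ell_1$ norm), this is a classical DC program with convex part $f_1 + r$ and concave part $-f_2$. As a point of reference, the basic DCA iteration linearizes $f_2$ at the current iterate; this is the textbook scheme, not necessarily the exact update rule used by BBMPG_DCA:

$$x^{k+1} \in \arg\min_{x \in \mathcal{C}} \; f_1(x) + r(x) - \langle v^k,\, x - x^k \rangle, \qquad v^k \in \partial f_2(x^k).$$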

Example

We consider the following regularized risk minimization problem:

$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(y_i, w^\top x_i\big) + r(w),$$

where $\ell$ denotes the loss function, e.g., the logistic loss for classification problems and the quadratic loss for regression problems, and $r$ is a non-convex regularizer such as MCP or SCAD.
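
For concreteness, the MCP regularizer used in the Usage section below is commonly defined coordinate-wise with parameters $\lambda > 0$ and $\gamma > 1$, and admits a standard DC splitting into a convex $\ell_1$ term minus a convex smooth term. This is the usual textbook form, which may differ in parameterization from NCML's implementation:

$$P_{\lambda,\gamma}(t) = \begin{cases} \lambda|t| - \dfrac{t^2}{2\gamma}, & |t| \le \gamma\lambda,\\[4pt] \dfrac{\gamma\lambda^2}{2}, & |t| > \gamma\lambda, \end{cases} \qquad P_{\lambda,\gamma}(t) = \lambda|t| - h_{\lambda,\gamma}(t), \quad h_{\lambda,\gamma}(t) = \begin{cases} \dfrac{t^2}{2\gamma}, & |t| \le \gamma\lambda,\\[4pt] \lambda|t| - \dfrac{\gamma\lambda^2}{2}, & |t| > \gamma\lambda. \end{cases}$$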

Usage

We show how to solve a binary classification problem with the MCP penalty on the news20.binary dataset.

from sklearn.model_selection import train_test_split

from ncml.datasets.loaders import load_dataset
from ncml.impl.bbmpg_dca import BBMPG_DCAClassifier

# Load a dataset; datasets that ship without a separate test set are split here.
def load_data(name):
    X_tr_clf, y_tr_clf, X_te_clf, y_te_clf = load_dataset(name)
    datasets_without_test = ('madelon', 'real-sim', 'news20.binary')
    if name in datasets_without_test:
        X_tr_clf, X_te_clf, y_tr_clf, y_te_clf = train_test_split(
            X_tr_clf, y_tr_clf, test_size=0.33, random_state=40)
    return X_tr_clf, y_tr_clf, X_te_clf, y_te_clf

X_tr, y_tr, X_te, y_te = load_data('news20.binary')

# Set classifier options (the tolerance value here is chosen for illustration)
bbmpgdca_tol = 1e-4
bbmpgdca = BBMPG_DCAClassifier(loss='logistic',
                               penalty='mcp',
                               scale_choice='diagonal_bb',
                               linesearch_choice='nonmonotonic',
                               momentum_flag=False,
                               tol=bbmpgdca_tol)
# Train the model
bbmpgdca.fit(X_tr, y_tr)

# Accuracy
bbmpgdca_tr_acu = bbmpgdca.score(X_tr, y_tr)
bbmpgdca_te_acu = bbmpgdca.score(X_te, y_te)
print('tr_acu', bbmpgdca_tr_acu)
print('te_acu', bbmpgdca_te_acu)
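
Since the estimators follow scikit-learn API conventions (see Highlights), a fitted classifier should also support the usual prediction workflow. The snippet below is a sketch under that assumption; the predict method is implied by the conventions but not shown elsewhere in this README.

# Sketch only: assumes BBMPG_DCAClassifier exposes the standard scikit-learn
# predict method, as implied by the Highlights above.
from sklearn.metrics import accuracy_score

y_pred = bbmpgdca.predict(X_te)
print('te_acu (recomputed)', accuracy_score(y_te, y_pred))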