[Documentation (stable version)] [Documentation (development version)]
- Pyglmnet provides a wide range of noise models (and paired canonical link functions): 'gaussian', 'binomial', 'probit', 'gamma', 'poisson', and 'softplus'.
- It supports a wide range of regularizers: ridge, lasso, elastic net, group lasso, and Tikhonov regularization (see the brief usage sketch after this list).
- We have implemented a cyclical coordinate descent optimizer with Newton update, active sets, update caching, and warm restarts. This optimization approach is identical to the one used in the glmnet R package.
- A number of Python wrappers exist for the R glmnet package (e.g. here and here), but in contrast to these, Pyglmnet is a pure Python implementation. Therefore, it is easy to modify and introduce additional noise models and regularizers in the future.
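As a quick illustration of how these options map onto the estimator, here is a minimal sketch. The distr and reg_lambda arguments also appear in the full example below, while alpha (elastic net mixing weight) and group (per-feature group labels for group lasso) are assumed parameter names of the GLM constructor; the values are illustrative, not recommended settings.

import numpy as np
from pyglmnet import GLM

# elastic net on binomial data: alpha is assumed to mix the lasso (L1) and
# ridge (L2) penalties; reg_lambda sets the overall penalty strength
glm_enet = GLM(distr='binomial', alpha=0.5, reg_lambda=0.05)

# group lasso: group is assumed to be an array assigning each feature to a
# group, so that whole groups of coefficients are shrunk together
group_ids = np.repeat(np.arange(5), 4)  # 20 features in 5 groups of 4
glm_group = GLM(distr='gaussian', group=group_ids, reg_lambda=0.05)

Either estimator is then fit and evaluated with the same fit/predict/score calls used in the example below.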
Install the stable PyPI version with pip:
$ pip install pyglmnet
For the bleeding edge development version, clone the repository, or install it directly from GitHub with pip:
$ pip install https://api.github.com/repos/glm-tools/pyglmnet/zipball/master
Here is an example of how to use the GLM estimator.
import numpy as np
import scipy.sparse as sps
import matplotlib.pyplot as plt
from pyglmnet import GLM, simulate_glm
n_samples, n_features = 1000, 100
distr = 'poisson'
# sample a sparse model
np.random.seed(42)
beta0 = np.random.rand()
beta = sps.random(1, n_features, density=0.2).toarray()[0]
# simulate data
Xtrain = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytrain = simulate_glm(distr, beta0, beta, Xtrain)
Xtest = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytest = simulate_glm(distr, beta0, beta, Xtest)
# create an instance of the GLM class
glm = GLM(distr=distr, score_metric='pseudo_R2', reg_lambda=0.01)
# fit the model on the training data
glm.fit(Xtrain, ytrain)
# predict using fitted model on the test data
yhat = glm.predict(Xtest)
# score the model on test data
pseudo_R2 = glm.score(Xtest, ytest)
print('Pseudo R^2 is %.3f' % pseudo_R2)
# plot the true coefficients and the estimated ones
plt.stem(beta, markerfmt='r.', label='True coefficients')
plt.stem(glm.beta_, markerfmt='b.', label='Estimated coefficients')
plt.ylabel(r'$\beta$')
plt.legend(loc='upper right')
# plot the true vs predicted label
plt.figure()
plt.plot(ytest, yhat, '.')
plt.xlabel('True labels')
plt.ylabel('Predicted labels')
plt.plot([0, ytest.max()], [0, ytest.max()], 'r--')
plt.show()
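Since reg_lambda controls the penalty strength, a natural follow-up is to sweep a few values and compare held-out scores. The loop below is a minimal sketch that reuses distr, Xtrain, ytrain, Xtest, and ytest from the example above; the grid of lambdas is arbitrary and only for illustration.

# compare a few penalty strengths on the held-out data
for reg_lambda in [0.1, 0.01, 0.001]:
    model = GLM(distr=distr, score_metric='pseudo_R2',
                reg_lambda=reg_lambda)
    model.fit(Xtrain, ytrain)
    print('reg_lambda = %.3f: pseudo R^2 = %.3f'
          % (reg_lambda, model.score(Xtest, ytest)))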
More pyglmnet examples and use cases.
Here is an extensive tutorial on GLMs, optimization, and pseudo-code.
Here are slides from a talk at PyData Chicago 2016, the corresponding tutorial notebooks, and a video.
We welcome pull requests. Please see our developer documentation page for more details.
- Konrad Kording for funding and support
- Sara Solla for masterful GLM lectures
MIT License Copyright (c) 2016-2019 Pavan Ramkumar