glm-tools/pyglmnet

how to compute psquared in GLM estimator

jpainam opened this issue · 7 comments

Hi, I'm comparing several estimators and I want to get the psquared for the GLM estimator using the negative binomial, Gaussian and Poisson distributions. I'm able to get the psquared for the other models, but not for GLM.
Please help.

Hi, sorry, I don't follow your question. Can you explain it a bit more and maybe illustrate with code? Thanks.

Thank you. I mean: how do I compute the pseudo R^2 for GLM estimators? How can I get its value after running a GLM estimator on my dataset?

@jasmainak Hi, here is some sample code. How do I get the pseudo R^2?

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

glm = GLM(distr='poisson')
n_samples, n_features = 10000, 100
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

Xtrain = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytrain = glm.simulate(beta0, beta, Xtrain)

Xtest = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytest = glm.simulate(beta0, beta, Xtest)
# fit the model on the training data
scaler = StandardScaler().fit(Xtrain)
glm.fit(scaler.transform(Xtrain), ytrain)

# predict on the held-out test data
yhat = glm.predict(scaler.transform(Xtest))

# score the held-out test data
deviance = glm.score(scaler.transform(Xtest), ytest)

You need to instantiate your GLM object with the scoring metric.

glm = GLM(distr='poisson', score_metric='pseudo_R2')
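For readers wondering what such a score measures: one common definition of pseudo R^2 is one minus the ratio of the model deviance to the deviance of a constant-mean (null) model. The sketch below works this out for the Poisson case in plain NumPy. The function names `poisson_deviance` and `pseudo_r2` are hypothetical, the data are simulated, and pyglmnet's own implementation may differ in detail.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu))."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Where y == 0 the term y * log(y / mu) is taken as 0, so use a ratio of 1.
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def pseudo_r2(y, yhat, ynull):
    """1 - D(model) / D(null); 0 means no better than the null model."""
    return 1.0 - poisson_deviance(y, yhat) / poisson_deviance(y, ynull)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
mu_true = np.exp(0.5 + 0.8 * x)
y = rng.poisson(mu_true)

# Null model: a constant prediction (in practice, the training-set mean).
ynull = np.full(y.shape, y.mean())

r2 = pseudo_r2(y, mu_true, ynull)
print(r2)
```

A value near 1 means the model explains most of the deviance left by the null model; a value at or below 0 means it does no better than predicting the mean.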

Thank you, this is the result when I used score_metric='pseudo_R2':

[0.38726042 0.62702618 0.74175937 0.79341947 0.82873337 0.83989782
              0.84622304 0.85146645 0.8528544  0.85402913]

I was expecting a single value. Which one is the pseudo R2 value?

I also get this error when I use my own data set. Since this issue is resolved, I will close it and open a new issue for the problem I experienced with my own data.
Thank you again.

C:\Users\Paul\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py:475: DataConversionWarning: Data with input dtype object was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)
Traceback (most recent call last):
  File "gpyglmnet.py", line 26, in <module>
    glm.fit(scaler.transform(Xtrain), ytrain)
  File "Cxxxx\lib\site-packages\pyglmnet\pyglmnet.py", line 634, in fit
    beta[0], beta[1:], rl, X, y)
  File "xxxxxxxx\Python36\lib\site-packages\pyglmnet\pyglmnet.py", line 389, in _grad_L2loss
    X[selector, :]))
TypeError: ufunc 'add' output (typecode 'O') could not be coerced to provided output parameter (typecode 'd') according to the casting rule ''same_kind''
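The TypeError above is what NumPy raises when an object-dtype array is added in place into a float64 array. The warning shows that StandardScaler already converted X to float64, so one plausible (unconfirmed) cause is that ytrain still has dtype object. The sketch below reproduces the failure with a made-up array and shows the usual remedy of casting to float before fitting; `y_obj` and `grad` are hypothetical stand-ins, not names from pyglmnet.

```python
import numpy as np

# Hypothetical stand-in for a target column loaded with dtype=object,
# for example from a CSV or spreadsheet with mixed-type cells.
y_obj = np.array([1, 0, 3, 2], dtype=object)
grad = np.zeros(4)  # float64 buffer, like an internal gradient

try:
    grad += y_obj  # in-place add of an object array into a float64 array
    raised = False
except TypeError:
    raised = True
print("in-place add failed:", raised)

# Casting up front gives NumPy a numeric dtype, so the same operation works.
y_float = np.asarray(y_obj, dtype=np.float64)
grad += y_float
print(grad, y_float.dtype)
```

If the cast itself fails, the column contains values that are not numeric (for example empty strings), and those need cleaning first.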

pseudo R2 value

Are you running this on the regularization path? That's probably why you have 10 scores.
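To make that concrete: if the model is fit over a grid of regularization strengths, each score corresponds to one penalty value. The sketch below pairs the ten scores posted above with a log-spaced grid and picks the best one. The grid (0.5 down to 0.01) is an assumption for illustration; check the `reg_lambda` of your own GLM object for the actual values.

```python
import numpy as np

# Scores copied from the output posted earlier in this thread.
scores = np.array([0.38726042, 0.62702618, 0.74175937, 0.79341947, 0.82873337,
                   0.83989782, 0.84622304, 0.85146645, 0.8528544, 0.85402913])

# Assumption: 10 log-spaced penalties from 0.5 (strongest) to 0.01 (weakest).
reg_lambda = np.logspace(np.log(0.5), np.log(0.01), 10, base=np.e)

# Pair each score with its penalty and report the best one.
for lam, score in zip(reg_lambda, scores):
    print(f"reg_lambda={lam:.4f}  pseudo_R2={score:.4f}")

best = int(np.argmax(scores))
print("best index:", best)
```

Picking the penalty on the test set gives an optimistic score, so a separate validation split is the safer choice. If you only want a single number, fitting with a single `reg_lambda` value should give one score, provided your pyglmnet version accepts that.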