Gamma model gives wrong results
bbayukari opened this issue · 4 comments
Describe the bug
The Gamma model returns wrong results. With approximate Newton, the estimate is always the all-zero vector; with exact Newton, the estimate is completely wrong.
Code for Reproduction
Here is the R code:
library(abess)
n <- 10000
p <- 5
support.size <- 3
dataset <- generate.data(n, p, support.size, family = "gamma", seed = 1)
approx_fit <- abess(
dataset[["x"]],
dataset[["y"]],
family = "gamma",
newton = "approx"
)
exact_fit <- abess(
dataset[["x"]],
dataset[["y"]],
family = "gamma",
newton = "exact"
)
print("true_coef: ")
print(dataset$beta)
print("approx newton est_coef: ")
print(approx_fit$beta[,support.size])
print("exact newton est_coef: ")
print(exact_fit$beta[,support.size])
Result:
[1] "true_coef: "
[1] 0.000000 0.000000 3.069073 6.725235 7.974553
[1] "approx newton est_coef: "
x1 x2 x3 x4 x5
0 0 0 0 0
[1] "exact newton est_coef: "
x1 x2 x3 x4 x5
2.159370e+01 0.000000e+00 5.188777e-13 0.000000e+00 0.000000e+00
Here is the Python code:
import abess
import numpy as np
np.random.seed(1)
data = abess.make_glm_data(n=10000, p=5, k=3, family="gamma")
model1 = abess.GammaRegression(support_size=3, approximate_Newton=False)
model1.fit(data.x, data.y)
model2 = abess.GammaRegression(support_size=3, approximate_Newton=True)
model2.fit(data.x, data.y)
print("true_coef: ", data.coef_)
print("approx newton est_coef: ", model2.coef_)
print("exact newton est_coef: ", model1.coef_)
Results:
true_coef: [ 1.47594114 6.66687502 -2.85407881 0. 0. ]
approx newton est_coef: [0. 0. 0. 0. 0.]
exact newton est_coef: [ 0.00000000e+00 0.00000000e+00 -2.10497802e-34 -1.93607800e-35 1.82703568e-34]
Desktop (please complete the following information):
- OS: Platform Version: Linux-4.15.0-189-generic-x86_64-with-glibc2.27, 64bit
- Python Version: 3.10.4
- Package Version: 0.4.6
The pytest for the Gamma model only checks for invalid results; it does not compare the estimates with the true coefficients.
That is quite an oversight... I'll fix it soon.
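A minimal sketch of the kind of check the test could add: compare the estimated support with the true one. The helper names `support` and `recovers_support` and the tolerance `tol` are hypothetical, not part of abess; the coefficient values are copied from the Python results reported above.

```python
def support(coef, tol=1e-8):
    """Return the indices of coefficients whose magnitude exceeds tol."""
    return {i for i, c in enumerate(coef) if abs(c) > tol}


def recovers_support(true_coef, est_coef, tol=1e-8):
    """True if the estimate selects exactly the truly nonzero coefficients."""
    return support(true_coef, tol) == support(est_coef, tol)


# Coefficients copied from the Python results in this issue.
true_coef = [1.47594114, 6.66687502, -2.85407881, 0.0, 0.0]
approx_est = [0.0, 0.0, 0.0, 0.0, 0.0]
exact_est = [0.0, 0.0, -2.10497802e-34, -1.93607800e-35, 1.82703568e-34]

print(recovers_support(true_coef, approx_est))  # False
print(recovers_support(true_coef, exact_est))   # False
```

A check like this would have failed on both fits, since neither estimate has any coefficient above the tolerance. In a real test the lists would be replaced by `data.coef_` and `model.coef_`.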
As we discussed, Gamma regression is much harder. You may want to start with a large-sample setting. If it works well there, I think it is probably OK. @oooo26
Sure, and I think I have figured out the bug: I forgot to change the starting point for the Gamma fit.
(In the general GLM fitting process, all coefficients start at 0, which leaves the Gamma fit unable to update.)
I will update both the algorithm and the test file soon.
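To illustrate why an all-zero start can stall a Gamma fit, here is a rough sketch. It assumes the inverse link (mean = 1 / linear predictor), which this thread does not confirm for abess. Under that assumption the loss is not finite when every coefficient and the intercept are 0. The function names, the toy data, and the `1 / mean(y)` starting intercept are all invented for illustration; this is not the fix applied in the library.

```python
import math


def gamma_loss(eta, y):
    """Gamma negative log-likelihood for one observation, up to constants.

    Assumes the inverse link mu = 1 / eta, so the loss is y * eta - log(eta)
    and is only defined for eta > 0.
    """
    if eta <= 0.0:
        return math.inf
    return y * eta - math.log(eta)


def total_loss(X, y, coef, intercept):
    """Sum the per-observation loss over the data set."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = intercept + sum(x * b for x, b in zip(row, coef))
        total += gamma_loss(eta, yi)
    return total


# Toy values invented for illustration only.
X = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
y = [2.0, 0.5, 1.0]
zero_coef = [0.0, 0.0]

# All-zero start: every linear predictor is 0, so the loss is infinite.
loss_at_zero = total_loss(X, y, zero_coef, intercept=0.0)

# Hypothetical alternative start: intercept 1 / mean(y), the null-model fit.
null_intercept = len(y) / sum(y)
loss_at_null = total_loss(X, y, zero_coef, intercept=null_intercept)

print(math.isinf(loss_at_zero), math.isfinite(loss_at_null))
```

With a finite loss and gradient at the starting point, a Newton step has something to work with; at the all-zero point it does not.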