abess-team/abess

Gamma model gives wrong results

bbayukari opened this issue · 4 comments

Describe the bug

The Gamma model returns wrong results. With the approximate Newton method, the estimate is always the all-zero vector; with the exact Newton method, the estimate is completely wrong.

Code for Reproduction

Here is the R code:

  library(abess)

  n <- 10000
  p <- 5
  support.size <- 3
  dataset <- generate.data(n, p, support.size, family = "gamma", seed = 1)

  # Fit with the approximate Newton method
  approx_fit <- abess(
    dataset[["x"]],
    dataset[["y"]],
    family = "gamma",
    newton = "approx"
  )
  # Fit with the exact Newton method
  exact_fit <- abess(
    dataset[["x"]],
    dataset[["y"]],
    family = "gamma",
    newton = "exact"
  )
  print("true_coef: ")
  print(dataset$beta)
  print("approx newton est_coef: ")
  print(approx_fit$beta[, support.size])
  print("exact newton est_coef: ")
  print(exact_fit$beta[, support.size])

Result:

[1] "true_coef: "
[1] 0.000000 0.000000 3.069073 6.725235 7.974553
[1] "approx newton est_coef: "
x1 x2 x3 x4 x5 
 0  0  0  0  0 
[1] "exact newton est_coef: "
          x1           x2           x3           x4           x5 
2.159370e+01 0.000000e+00 5.188777e-13 0.000000e+00 0.000000e+00 

Here is the Python code:

import abess
import numpy as np

np.random.seed(1)
data = abess.make_glm_data(n=10000, p=5, k=3, family="gamma")

# model1: exact Newton method
model1 = abess.GammaRegression(support_size=3, approximate_Newton=False)
model1.fit(data.x, data.y)

# model2: approximate Newton method
model2 = abess.GammaRegression(support_size=3, approximate_Newton=True)
model2.fit(data.x, data.y)

print("true_coef: ", data.coef_)
print("approx newton est_coef: ", model2.coef_)
print("exact newton est_coef: ", model1.coef_)

Results:

true_coef:  [ 1.47594114  6.66687502 -2.85407881  0.          0.        ]
approx newton est_coef:  [0. 0. 0. 0. 0.]
exact newton est_coef:  [ 0.00000000e+00  0.00000000e+00 -2.10497802e-34 -1.93607800e-35 1.82703568e-34]

Desktop (please complete the following information):

  • OS: Platform Version: Linux-4.15.0-189-generic-x86_64-with-glibc2.27, 64bit
  • Python Version: 3.10.4
  • Package Version: 0.4.6

Is gamma regression covered in the Python tests? @oooo26

The pytest for Gamma only checks for invalid results; it does not compare the estimates with the true coefficients.
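A check of the missing kind could look roughly like the sketch below. It compares the set of nonzero positions in the estimate with that of the true coefficients, using the Python numbers reported above. The helpers `support` and `same_support` and the tolerance `tol` are hypothetical names for illustration only; they are not part of abess or its test suite.

```python
def support(coef, tol=1e-8):
    """Return the indices of coefficients whose magnitude exceeds tol."""
    return {i for i, c in enumerate(coef) if abs(c) > tol}


def same_support(true_coef, est_coef, tol=1e-8):
    """True if both coefficient vectors are nonzero at the same positions."""
    return support(true_coef, tol) == support(est_coef, tol)


# Values copied from the Python results in this issue.
true_coef = [1.47594114, 6.66687502, -2.85407881, 0.0, 0.0]
approx_est = [0.0, 0.0, 0.0, 0.0, 0.0]
exact_est = [0.0, 0.0, -2.10497802e-34, -1.93607800e-35, 1.82703568e-34]

print(same_support(true_coef, approx_est))  # False
print(same_support(true_coef, exact_est))   # False
```

Both fits fail this check, so a test of this shape would have caught the problem. The helpers iterate over plain sequences, so NumPy arrays such as `model.coef_` can be passed in directly.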

That is quite an oversight... I'll fix it soon.

As we have discussed, Gamma regression is much harder. You may first consider a large-sample setting. If it works well in that setting, I think it is probably OK. @oooo26

Sure, and I think I have figured out the bug: I forgot to modify the starting point for Gamma's fitting.

(In the general GLM fitting process, all coefficients start at 0, which leaves Gamma unable to update.)

I will update both the algorithm and the test file soon.
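One plausible way to see why an all-zero start can stall a Gamma fit is sketched below. It assumes a reciprocal link, mu = 1 / eta, and unit shape; the thread does not say which link abess uses, so this illustrates the general issue and does not describe the library's code. The function names are hypothetical.

```python
import math


def gamma_nll(eta, y):
    """Per-sample Gamma negative log-likelihood, constants dropped.

    Assumes unit shape and the reciprocal link mu = 1 / eta, which
    requires eta > 0.
    """
    if eta <= 0:
        return math.inf  # outside the valid domain
    return y * eta - math.log(eta)


def gamma_nll_grad(eta, y):
    """Derivative of gamma_nll with respect to eta (undefined for eta <= 0)."""
    if eta <= 0:
        return math.nan
    return y - 1.0 / eta


y = 2.0
# All-zero coefficients and a zero intercept give eta = 0.
print(gamma_nll(0.0, y), gamma_nll_grad(0.0, y))
# A strictly positive start gives finite values that a Newton step can use.
print(gamma_nll(0.5, y), gamma_nll_grad(0.5, y))
```

Under this assumption, the loss is infinite and the gradient undefined at eta = 0, so no update step can be computed from there, while any start with eta > 0 gives usable values.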