Gamma model's results are wrong

Question

bbayukari opened this issue 2 years ago · 4 comments

Describe the bug

The Gamma model returns wrong results. With the approximate Newton method, the estimate is always the all-zero vector; with the exact Newton method, the estimate is completely wrong.

Code for Reproduction

Here is the R code:

  library(abess)

  # Simulate a large-sample Gamma regression problem with 3 true signals.
  n <- 10000
  p <- 5
  support.size <- 3
  dataset <- generate.data(n, p, support.size, family = "gamma", seed = 1)

  # Fit with the approximate Newton method.
  approx_fit <- abess(
    dataset[["x"]],
    dataset[["y"]],
    family = "gamma",
    newton = "approx"
  )
  # Fit with the exact Newton method.
  exact_fit <- abess(
    dataset[["x"]],
    dataset[["y"]],
    family = "gamma",
    newton = "exact"
  )
  print("true_coef: ")
  print(dataset$beta)
  print("approx newton est_coef: ")
  print(approx_fit$beta[, support.size])
  print("exact newton est_coef: ")
  print(exact_fit$beta[, support.size])

Result:

[1] "true_coef: "
[1] 0.000000 0.000000 3.069073 6.725235 7.974553
[1] "approx newton est_coef: "
x1 x2 x3 x4 x5 
 0  0  0  0  0 
[1] "exact newton est_coef: "
          x1           x2           x3           x4           x5 
2.159370e+01 0.000000e+00 5.188777e-13 0.000000e+00 0.000000e+00

Here is the Python code:

import abess
import numpy as np

np.random.seed(1)
data = abess.make_glm_data(n=10000, p=5, k=3, family="gamma")

model1 = abess.GammaRegression(support_size = 3, approximate_Newton = False)
model1.fit(data.x, data.y)

model2 = abess.GammaRegression(support_size = 3, approximate_Newton = True)
model2.fit(data.x, data.y)

print("true_coef: ",data.coef_)
print("approx newton est_coef: ",model2.coef_)
print("exact newton est_coef: ",model1.coef_)

Results:

true_coef:  [ 1.47594114  6.66687502 -2.85407881  0.          0.        ]
approx newton est_coef:  [0. 0. 0. 0. 0.]
exact newton est_coef:  [ 0.00000000e+00  0.00000000e+00 -2.10497802e-34 -1.93607800e-35 1.82703568e-34]

Desktop:

OS: Platform Version: Linux-4.15.0-189-generic-x86_64-with-glibc2.27, 64bit
Python Version: 3.10.4
Package Version: 0.4.6

Answer 1 · 2023-02-10T02:34:07.000Z

Is gamma regression covered in the Python tests? @oooo26

Answer 2 · 2023-02-10T04:16:02.000Z

The pytest for Gamma only checks for invalid results; it does not compare the estimates with the true coefficients.

That is quite an oversight... I'll fix it soon.
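To illustrate the gap described above, here is a minimal sketch of the two kinds of check, applied to the Python output reported in this issue. The helper names `is_valid` and `recovers_truth` and the tolerance are hypothetical and are not taken from the abess test suite.

```python
import numpy as np

# Coefficients copied from the Python output reported in this issue.
true_coef = np.array([1.47594114, 6.66687502, -2.85407881, 0.0, 0.0])
approx_est = np.zeros(5)
exact_est = np.array([0.0, 0.0, -2.10497802e-34, -1.93607800e-35, 1.82703568e-34])


def is_valid(coef):
    """Validity-only check: every coefficient is a finite number."""
    return bool(np.all(np.isfinite(coef)))


def recovers_truth(coef, true_coef, rel_tol=0.5):
    """Stricter check: same support and a small relative L2 error.

    rel_tol is an arbitrary illustrative tolerance.
    """
    coef = np.asarray(coef, dtype=float)
    true_coef = np.asarray(true_coef, dtype=float)
    same_support = np.array_equal(coef != 0, true_coef != 0)
    rel_err = np.linalg.norm(coef - true_coef) / np.linalg.norm(true_coef)
    return bool(same_support and rel_err <= rel_tol)


for name, est in [("approx", approx_est), ("exact", exact_est)]:
    print(name, "valid:", is_valid(est), "recovers truth:", recovers_truth(est, true_coef))
```

Both reported estimates pass the validity-only check but fail the comparison with the true coefficients, which is consistent with the bug going unnoticed by a test of the first kind.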

Answer 3 · 2023-02-10T07:22:17.000Z

As we have discussed, Gamma regression is much harder. You may first consider a large-sample setting. If it works well in that setting, I think it is probably OK. @oooo26

Answer 4 · 2023-02-10T09:26:28.000Z

Sure, and I think I have figured out the bug: I forgot to change the starting point for the Gamma fit.

(In the general GLM fitting process, all coefficients start at 0, which leaves Gamma unable to update.)

Will update both the algorithm and test file soon.
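One way to see why an all-zero start can stall a Gamma fit is sketched below. It assumes the canonical inverse link (mean = 1/eta) and unit shape; that parameterization is an assumption for illustration and is not confirmed in this thread to be what abess uses internally. The function `gamma_nll` and the simulated data are hypothetical.

```python
import numpy as np


def gamma_nll(X, y, beta, intercept):
    """Gamma negative log-likelihood up to constants (unit shape).

    Assumes the inverse link: eta = X @ beta + intercept, mean = 1 / eta.
    """
    eta = X @ beta + intercept
    # Silence the log(0) warning so the non-finite value can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.sum(y * eta - np.log(eta)))


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
eta_true = X @ np.array([1.0, 2.0, 0.5]) + 1.0
y = rng.gamma(shape=1.0, scale=1.0 / eta_true)  # mean is 1 / eta_true

# All-zero start: eta = 0, so the implied mean 1 / eta is undefined.
zero_start = gamma_nll(X, y, np.zeros(3), 0.0)
# Positive intercept start chosen so the implied mean equals mean(y).
warm_start = gamma_nll(X, y, np.zeros(3), 1.0 / y.mean())

print("loss at all-zero start:", zero_start)
print("loss at intercept = 1 / mean(y):", warm_start)
```

Under this assumption the loss is not finite at the all-zero point, so a Newton step has nothing usable to work with, whereas a positive intercept gives a finite loss to start from.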