wittawatj/kernel-gof

Gamma Distribution when k=1

michelleowen opened this issue · 9 comments

Hi, I tried to test a gamma distribution with your test.
I specified p through the gradient of its log density, as below:

# k (shape) and theta (scale) of the model p must be defined before this is called.
def gamma_grad_log(X):
    return ((k-1)/X - 1/theta)*(X>0)
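As a sanity check on the formula above, here is a minimal sketch that compares the analytic gradient of the log density with a central finite difference on x > 0. The helper names (`gamma_log_density`, `max_score_error`) and the explicit `k`, `theta` arguments are hypothetical and only for illustration; they are not part of kernel-gof.

```python
import math
import numpy as np

def gamma_log_density(x, k, theta):
    # log of the Gamma(shape=k, scale=theta) density, valid for x > 0
    return (k - 1) * np.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)

def gamma_grad_log(x, k, theta):
    # analytic derivative of the log density with respect to x, for x > 0
    return (k - 1) / x - 1.0 / theta

def max_score_error(k, theta, h=1e-5):
    # largest gap between the analytic score and a central finite difference
    x = np.linspace(0.5, 5.0, 50)
    numeric = (gamma_log_density(x + h, k, theta)
               - gamma_log_density(x - h, k, theta)) / (2 * h)
    return float(np.max(np.abs(numeric - gamma_grad_log(x, k, theta))))

for k, theta in [(2.0, 2.0), (1.0, 2.0)]:
    print(k, theta, max_score_error(k, theta))
```

The check only covers the interior of the support, so it says nothing about what happens at the boundary x = 0.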

With k=2 and theta=2, I drew a sample from the same gamma distribution, as below:

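import numpy as np

k_t = 2        # shape of the data-generating gamma distribution
theta_t = 2    # scale of the data-generating gamma distribution
n = 500
seed = 4
np.random.seed(seed)
X = np.random.gamma(k_t, theta_t, (n, 1))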

In this case the test works fine: it does not reject H_0.
{'alpha': 0.01,
'h0_rejected': False,
'n_simulate': 3000,
'pvalue': 0.31133333333333335,
'test_stat': 0.0003191977834490073,
'time_secs': 0.001219034194946289}

But if k=1, the test rejects H_0:
{'alpha': 0.01,
'h0_rejected': True,
'n_simulate': 3000,
'pvalue': 0.0,
'test_stat': 100.18839042139058,
'time_secs': 0.0012459754943847656}

I believe this is expected behavior, since the current version only supports distributions with full support. For the Gamma distribution, the support is the positive reals, so one has to modify the Stein operator to handle this case. An explanation is given at the bottom of this notebook: https://github.com/wittawatj/kernel-gof/blob/master/ipynb/demo_kgof.ipynb

P.S. I am not sure that *(X>0) is correct. Technically, (X>0) should multiply the density, not the log density, but log 0 = -inf.
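One way to see the difference between k=2 and k=1 is to check the Stein identity E_p[s_p(x) f(x) + f'(x)] = 0 numerically, where s_p is the gradient of log p. Integrating by parts over the positive reals leaves a boundary term -p(0) f(0), which vanishes for k=2 (where p(0) = 0) but equals -f(0)/theta for k=1 (where p(0) = 1/theta). The sketch below is only an illustration under stated assumptions: the function name, the Gaussian-bump test function f, and the midpoint-rule quadrature are hypothetical choices, not the FSSD implementation.

```python
import math
import numpy as np

def stein_expectation(k, theta, v=1.0, sigma=1.0, upper=60.0, m=600_000):
    """Midpoint-rule approximation of E_p[s_p(x) f(x) + f'(x)] for Gamma(k, theta)."""
    h = upper / m
    x = (np.arange(m) + 0.5) * h                      # midpoints, strictly positive
    p = x ** (k - 1) * np.exp(-x / theta) / (math.gamma(k) * theta ** k)
    score = (k - 1) / x - 1.0 / theta                 # d/dx log p(x) on x > 0
    f = np.exp(-(x - v) ** 2 / (2 * sigma ** 2))      # smooth bounded test function
    df = -(x - v) / sigma ** 2 * f                    # its derivative
    return float(np.sum(p * (score * f + df)) * h)

for k in (2, 1):
    # compare with the boundary term -p(0) f(0): zero for k=2, -f(0)/theta for k=1
    print(k, stein_expectation(k, theta=2.0))
```

With theta=2 and the default test function, the k=1 value should be close to -exp(-0.5)/2 rather than zero, so a statistic built on this operator would not have mean zero under H_0 even when the sample really comes from p.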

Please feel free to let me know if you have other questions.

@wittawatj, thanks for the prompt response. I tried implementing the density too, but that didn't work well either. Is there a way to implement the gamma distribution correctly? Or do you plan to extend the test to a broader class of distributions, including those without full support? That would be super helpful, as many real-world datasets do not have full support.

I agree that not all distributions have full support. At the moment, this is not a problem with the code implementation, but with the formulation of the Stein operator used. The Kernel Stein Discrepancy (https://arxiv.org/abs/1602.03253, https://arxiv.org/abs/1602.02964) suffers from the same problem because it uses the same Stein operator. I am developing a simple way to extend both FSSD and KSD to handle non-full support. Stay tuned!

By the way, I guess the Gamma distribution you implemented there is only meant to test the code? If your model really is Gamma, there are other tests designed specifically for that case.

Hi @wittawatj, I have read your latest paper, "Kernel Stein Tests for Multiple Model Comparison". So even with this extension to model comparison, if one wants to use KSD as the underlying metric, the data to which the different models are fitted should have full support, right? Also, I checked the GitHub repo provided in the paper, and it seems the code for reproducing the results is not there yet. Do you have an ETA for that?

That is correct. The extension is in the sense that it extends existing tests for comparing 2 models to more than 2 models. Underlying the new test is still the original KSD, so it inherits this issue and still requires full support.

Hi @michelleowen, we are hoping to get the code for "Kernel Stein Tests for Multiple Model Comparison" out by January.

Hi @michelleowen, the code for reproducing the paper can now be found in the repository.

@jenninglim thank you for letting me know.