Ciela-Institute/tarp

Unable to reproduce figure 2

Closed this issue · 1 comments

Hi,
as I am trying to port this method to pytorch, I stumbled upon a curious error. I was using the gaussian toy model from the paper section 4.1. I first had a look at the unit tests of this package here. They appear to work great, but I found it curious that the absolute difference of ecp and alpha was set to a high value of (0.25).

So I added two tests which try to reproduce the overconfident and underconfident cases.

def get_test_data(scalefactor=1.0):
    num_samples = 100
    num_sims = 100
    num_dims = 5
    theta = np.random.uniform(low=-5, high=5, size=(num_sims, num_dims))
    log_sigma = np.random.uniform(low=-5, high=-1, size=(num_sims, num_dims))
    sigma = scalefactor * np.exp(log_sigma)
    samples = np.random.normal(
        loc=theta, scale=sigma, size=(num_samples, num_sims, num_dims)
    )
    theta = np.random.normal(loc=theta, scale=sigma, size=(num_sims, num_dims))
    return samples, theta

# ...

class ConfidenceTest(unittest.TestCase):
    def test_underdispersed(self):
        samples, theta = get_test_data(0.05)
        ecp, alpha = get_tarp_coverage(
            samples,
            theta,
            references="random",
            metric="euclidean",
            norm=False,
            bootstrap=False,
        )
        print("test_underdispersed", np.max(np.abs(ecp - alpha)))
        self.assertAlmostEqual(np.max(np.abs(ecp - alpha)), 0.0, delta=0.25)

    def test_overdispersed(self):
        samples, theta = get_test_data(20.0)
        ecp, alpha = get_tarp_coverage(
            samples,
            theta,
            references="random",
            metric="euclidean",
            norm=False,
            bootstrap=False,
        )
        print("test_overdispersed", np.max(np.abs(ecp - alpha)))
        self.assertAlmostEqual(np.max(np.abs(ecp - alpha)), 0.0, delta=0.25)

To my surprise, the numerical value of np.max(np.abs(ecp - alpha)) didn't change much. So I went and gave the notebook a spin. I observe that the notebook uses a different way to generate the Gaussian toy data. I was able to reproduce the correct case.
image
Then, I changed the data generating function slightly and produced overconfident and underconfident data:

def generate_correlated_samples(num_samples, num_sims, num_dims, covfactor=1.):
    """ Generate samples and true parameter values """
    theta = np.random.uniform(low=-5, high=5, size=(num_sims, num_dims))
    cov = [generate_psd_matrix(num_dims) for _ in range(num_sims)]
    cov = covfactor*np.concatenate(cov).reshape(num_sims, num_dims, num_dims)
    samples = [np.random.multivariate_normal(mean=theta[i], cov=cov[i], size=num_samples) for i in range(num_sims)]
    samples = np.stack(samples)
    samples = samples.transpose(1, 0, 2)
    theta = [np.random.multivariate_normal(mean=theta[i], cov=cov[i], size=1) for i in range(num_sims)]
    theta = np.stack(theta)[:,0]
    return samples, theta

I was again surprised to see that the tarp coverage is unable to differentiate correct case from underdispersed or overdispersed.

image
image

Could you please double check where the problem is? I added the notebook which produced the plots above for convenience as a gist here.

Stupid me, I put the covfactor in the wrong spot. This dispersed both theta and samples - which is incorrect. It works like a charm now, once I only disperse samples as it should be.