devmotion/pycalibration

Running calibration metrics in python


Hi David,

I was trying to run a range of metrics from pycalibration, yet I seem unable to get some of them to work because the Julia syntax is so different.

So I start by generating synthetic predicted probabilities ($p$) and ground truths ($y$) for a simple multi-class problem:
These will become the (perfect accuracy) inputs to the calibration metrics.
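In code, the generation looks roughly like this (the same Dirichlet sampling as in the unit test further down):

import numpy as np

np.random.seed(42)
# three classes; each block of Dirichlet samples is biased towards its own class
p = np.concatenate((np.random.dirichlet([6, 2, 3], 100),
                    np.random.dirichlet([5, 12, 5], 100),
                    np.random.dirichlet([2, 3, 5], 100)))
y = np.concatenate((np.zeros(100), np.ones(100), 2 * np.ones(100)))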

I take a set of metrics (from Julia) that I would like to apply to these pythonic numpy arrays:

ece = ECE(UniformBinning(10), (μ, y) -> kl_divergence(y, μ))
ece_bin = ece(p, y)
My Python setup and example data look like this:

from julia.api import Julia
jl = Julia(compiled_modules=False)
import pycalibration
from pycalibration import calerrors as ce

p = [[0.6461553  0.1392794  0.21456529]
     [0.36878229 0.33442643 0.29679128]
     ...]
y = [0 0 ...]

For SKCE and for ece = ce.ECE(ce.UniformBinning(10), ce.SqEuclidean()), I can get things to work if I format and type the arrays correctly:

import numpy as np
p2 = [np.array(a).astype(np.float32) for a in p]
skce = ce.UnbiasedSKCE(ce.tensor(ce.ExponentialKernel(), ce.WhiteKernel()))
skce(p2, y.astype(np.int64))
> 0.19480350613594055

However, there is some syntax that I don't know how to translate into Python, like the kl_divergence with the swapped argument order:

ece = ce.ECE(ce.UniformBinning(10), ce.kl_divergence)
ece_bin_l2 = ece(p, y)
> nan  # gives NaN every time
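The closest I got to translating the anonymous function was to build it on the Julia side with pyjulia and pass the resulting Julia function through (an untested sketch; it assumes Distances.jl, which exports kl_divergence, can be loaded in Main):

from julia import Main

# hypothetical workaround: construct the swapped-argument distance in Julia
Main.eval("using Distances")  # kl_divergence is exported by Distances.jl
swapped_kl = Main.eval("(mu, y) -> kl_divergence(y, mu)")

# then hand the Julia function to ECE like any other distance
ece = ce.ECE(ce.UniformBinning(10), swapped_kl)

I am not sure this is the intended way, though.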

Any advice here?

Thanks!

Jordy

To follow up on this, I wrote a unit test in Python for some of the metrics; the ones I cannot get to work are commented out:

from julia.api import Julia
jl = Julia(compiled_modules=False)
import pycalibration
from pycalibration import calerrors as ce
import numpy as np

def sample_multiclass_data(N=100):
    np.random.seed(42)

    n_c1 = n_c2 = n_c3 = N
    p = np.concatenate((np.random.dirichlet([6, 2, 3], n_c1),
                        np.random.dirichlet([5, 12, 5], n_c2),
                        np.random.dirichlet([2, 3, 5], n_c3)))

    y = np.concatenate((np.zeros(n_c1), np.ones(n_c2), np.ones(n_c3) * 2))
    return p, y

def build_rbf_kernel(p_hat):
    # pairwise squared Euclidean distances between predictions (an N x N matrix)
    distances = np.asarray(ce.pairwise(ce.SqEuclidean(), p_hat))
    n = distances.shape[0]

    # median heuristic: median over the upper-triangular (i < j) entries
    λ = np.sqrt(np.median(distances[np.triu_indices(n, k=1)]))

    kernel = ce.tensor(ce.compose(ce.GaussianKernel(), ce.ScaleTransform(ce.inv(λ))), ce.WhiteKernel())
    return kernel
    
def kernel_calibration_error(p_hat, y_true, biased=False, blocks=False):
    kernel = build_rbf_kernel(p_hat)
    if biased:
        skce = ce.BiasedSKCE(kernel)  # if not blocks else ce.BlockBiasedSKCE(kernel, 2)
    else:
        skce = ce.UnbiasedSKCE(kernel) if not blocks else ce.BlockUnbiasedSKCE(kernel, 2)
    return skce(p_hat, y_true)

def calibration_metrics(p_hat, y_true):
    p_hat = [np.array(a).astype(np.float32) for a in p_hat]
    y_true = np.array(y_true).astype(np.int64)

    d = {}
    
    # ECE
    # does not work:
    # ece = ce.ECE(ce.UniformBinning(10), ce.kl_divergence)
    # d["ece_bin"] = ece(p_hat, y_true)

    ece = ce.ECE(ce.UniformBinning(10), ce.SqEuclidean())
    d["ece_bin-l2"] = ece(p_hat, y_true)

    # does not work:
    # ece = ce.ECE(ce.MedianVarianceBinning(5), ce.kl_divergence)
    # d["ece_medvar"] = ece(p_hat, y_true)

    # does not work:
    # ece = ce.ECE(ce.MedianVarianceBinning(5), ce.SqEuclidean())
    # d["ece_medvar-l2"] = ece(p_hat, y_true)

    # KCE
    for biased in [True, False]:
        b = "biased" if biased else "unbiased"
        for blocks in [True, False]:
            if biased and blocks:
                continue  # there is no block version of the biased estimator
            bb = "block" if blocks else "full"
            d[f"{bb}-{b}-skce"] = kernel_calibration_error(p_hat, y_true, biased=biased, blocks=blocks)
    return d

p, y = sample_multiclass_data(N=100) 
calibration_metrics(p, y)

Could you check what the problem could be with kl_divergence and MedianVarianceBinning? :)

Appreciate it!

Jordy

Hi @devmotion, you might have missed the previous questions, so here is a reminder in case you have some time to spare ;)

Hi @Jordy-VL,
I am very sorry for the long delay! I came back from vacation just recently and I still have to catch up with all the messages and issues that piled up on GitHub 🙂

I'll have a closer look at your issue tomorrow (thanks for the detailed instructions for how to reproduce it!), but a major problem is probably that you used classes 0, 1, and 2. Julia uses 1-based indexing, and hence CalibrationErrors.jl also expects classes 1, 2, and 3.
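For example, shifting the labels by one before passing them over should already help (a quick sketch):

import numpy as np

# CalibrationErrors.jl expects classes 1, ..., K, so shift 0-based labels by one
y_julia = np.asarray(y_true).astype(np.int64) + 1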

I am also a bit surprised that you had to convert the Python arrays to specific types, it was not necessary when I ran the examples in the README. I'll check this as well.

Another high-level comment is that you seem to build lists of predictions. It should be possible to use regular numpy arrays as well if you wrap them in ce.RowVecs (or ce.ColVecs, depending on whether the predictions are the rows or the columns of a matrix). There's an example in the README; the design of RowVecs and ColVecs is discussed in more detail at https://juliagaussianprocesses.github.io/KernelFunctions.jl/dev/design/#why_abstract_vectors
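Roughly like this, following the pattern from the README (a sketch, not tested against your data):

import numpy as np

# an (N, K) matrix with one prediction per row; RowVecs tells Julia to treat
# each row as one probability vector
predictions = ce.RowVecs(np.asarray(p_hat))

skce = ce.UnbiasedSKCE(ce.tensor(ce.ExponentialKernel(), ce.WhiteKernel()))
value = skce(predictions, np.asarray(y_true).astype(np.int64) + 1)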

With more recent versions of KernelFunctions.jl it is also easier to construct kernels with a specific lengthscale: https://juliagaussianprocesses.github.io/KernelFunctions.jl/dev/userguide/
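For example, something along these lines should work, assuming with_lengthscale is forwarded through the calerrors module (a sketch, not tested):

# with_lengthscale(k, l) is equivalent to compose(k, ScaleTransform(1 / l))
kernel = ce.tensor(
    ce.with_lengthscale(ce.GaussianKernel(), 2.5),  # lengthscale 2.5, as an example
    ce.WhiteKernel(),
)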