Running calibration metrics in python
Hi David,
I was trying to run a range of metrics from pycalibration, but I can't seem to get some of them to work because the Julia syntax is quite different.
So I start by generating synthetic predicted probabilities.
These will become the (perfect accuracy) inputs to the calibration metrics.
I take a set of metrics (from Julia) that I would like to apply to these pythonic numpy arrays:
ece = ECE(UniformBinning(10), (μ, y) -> kl_divergence(y, μ))
ece_bin = ece(p, y)
from julia.api import Julia
jl = Julia(compiled_modules=False)
import pycalibration
from pycalibration import calerrors as ce
p = [[0.6461553, 0.1392794, 0.21456529],
     [0.36878229, 0.33442643, 0.29679128], ...]
y = [0, 0, ...]
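As an aside, a quick way to check that such predictions are valid probability vectors before passing them to the metrics (pure numpy, using the two rows shown above):

```python
import numpy as np

# The two example predictions from above, as a proper numpy array
p = np.array([[0.6461553, 0.1392794, 0.21456529],
              [0.36878229, 0.33442643, 0.29679128]])

# Every row should be non-negative and sum to 1
assert np.all(p >= 0)
assert np.allclose(p.sum(axis=1), 1.0)
```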
For skce below and for ece = ce.ECE(ce.UniformBinning(10), ce.SqEuclidean()), I can get them to work if I correctly format and type the arrays:
import numpy as np
p2 = [np.array(a).astype(np.float32) for a in p]
skce = ce.UnbiasedSKCE(ce.tensor(ce.ExponentialKernel(), ce.WhiteKernel()))
skce(p2, np.asarray(y, dtype=np.int64))  # np.int is deprecated; use an explicit integer dtype
> 0.19480350613594055
However, some syntax I don't know how to translate to Python, like the anonymous function (μ, y) -> kl_divergence(y, μ) with its swapped argument order:
ece = ce.ECE(ce.UniformBinning(10), ce.kl_divergence)
ece_bin_l2 = ece(p, y)
> np.nan #gives nan all the time
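For what it's worth, the NaN may not be wrapper-specific: a naive KL divergence between a one-hot label distribution and a prediction hits 0 · log(0) terms, which numpy evaluates to NaN. A pure-numpy sketch (kl and kl_safe are illustrative helpers, not pycalibration functions):

```python
import numpy as np

def kl(q, p):
    # Naive KL divergence sum(q * log(q / p)): the 0 * log(0) terms become NaN
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.sum(q * np.log(q / p))

def kl_safe(q, p):
    # Convention 0 * log(0) = 0: only sum over entries where q > 0
    mask = q > 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

y_onehot = np.array([1.0, 0.0, 0.0])  # label as a point mass
p_hat = np.array([0.7, 0.2, 0.1])

print(kl(y_onehot, p_hat))       # nan
print(kl_safe(y_onehot, p_hat))  # ~0.357 (= log(1 / 0.7))
```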
Any advice here?
Thanks!
Jordy
To follow up on this, I wrote a unit test in Python for some of the metrics; the ones I cannot get to work are commented out:
from julia.api import Julia
jl = Julia(compiled_modules=False)
import pycalibration
from pycalibration import calerrors as ce
def sample_multiclass_data(N=100):
    import numpy as np
    np.random.seed(42)
    n_c1 = n_c2 = n_c3 = N
    p = np.concatenate((np.random.dirichlet([6, 2, 3], n_c1),
                        np.random.dirichlet([5, 12, 5], n_c2),
                        np.random.dirichlet([2, 3, 5], n_c3)))
    y = np.concatenate((np.zeros(n_c1), np.ones(n_c2), np.ones(n_c3) * 2))
    return p, y
def build_rbf_kernel(p_hat):
    distances = ce.pairwise(ce.SqEuclidean(), p_hat)  # N x N matrix of pairwise distances
    n = len(distances)  # N (the distance matrix is N x N, not N x K)
    # Median heuristic: median over the strictly upper-triangular entries
    # (each pair of observations counted once, diagonal excluded)
    upper = [distances[i, j] for i in range(n) for j in range(i + 1, n)]
    λ = ce.sqrt(ce.median(upper))
    kernel = ce.tensor(ce.compose(ce.GaussianKernel(), ce.ScaleTransform(ce.inv(λ))),
                       ce.WhiteKernel())
    return kernel
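The median heuristic in build_rbf_kernel can also be computed entirely on the Python side before constructing the kernel; a pure-numpy sketch (median_heuristic_lengthscale is an illustrative name, not part of pycalibration):

```python
import numpy as np

def median_heuristic_lengthscale(p_hat):
    # Pairwise squared Euclidean distances between all N predictions -> (N, N)
    diffs = p_hat[:, None, :] - p_hat[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Median over the strictly upper triangle: each pair counted once, no diagonal
    iu = np.triu_indices(len(p_hat), k=1)
    return np.sqrt(np.median(sq_dists[iu]))

p_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
print(median_heuristic_lengthscale(p_hat))  # sqrt(median of {0.72, 0.42, 0.42}) ≈ 0.648
```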
def kernel_calibration_error(p_hat, y_true, biased=False, blocks=False):
    kernel = build_rbf_kernel(p_hat)
    if biased:
        skce = ce.BiasedSKCE(kernel)  # if not blocks else ce.BlockBiasedSKCE(kernel, 2)
    else:
        skce = ce.UnbiasedSKCE(kernel) if not blocks else ce.BlockUnbiasedSKCE(kernel, 2)
    return skce(p_hat, y_true)
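For intuition (and to debug independently of the Julia bridge), the unbiased SKCE for a Gaussian kernel on the predictions tensored with a white kernel on the labels can be sketched in plain numpy. This is my reading of the estimator in Widmann et al. (2019), not the library's implementation; unbiased_skce and lengthscale are illustrative names:

```python
import numpy as np

def unbiased_skce(p_hat, y, lengthscale=1.0):
    # Sketch of the unbiased SKCE estimator for the kernel
    # k((p, y), (p', y')) = exp(-||p - p'||^2 / (2 * lengthscale^2)) * <e_y - p, e_y' - p'>
    n, k = p_hat.shape
    resid = np.eye(k)[y] - p_hat  # residuals e_y - p for every observation
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            sq = np.sum((p_hat[i] - p_hat[j]) ** 2)
            total += np.exp(-sq / (2 * lengthscale ** 2)) * (resid[i] @ resid[j])
    return 2.0 * total / (n * (n - 1))

rng = np.random.default_rng(0)
p_hat = rng.dirichlet([2.0, 3.0, 5.0], size=50)
# Sample labels from the predicted distributions -> perfectly calibrated predictions
y = np.array([rng.choice(3, p=row) for row in p_hat])
print(unbiased_skce(p_hat, y))  # close to 0 for calibrated predictions
```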
def calibration_metrics(p_hat, y_true):
    p_hat = [np.array(a).astype(np.float32) for a in p_hat]
    y_true = np.array(y_true).astype(np.int64)  # np.int is deprecated in recent numpy
    d = {}
    # ECE
    # does not work
    # ece = ce.ECE(ce.UniformBinning(10), ce.kl_divergence())
    # d["ece_bin"] = ece(p_hat, y_true)
    ece = ce.ECE(ce.UniformBinning(10), ce.SqEuclidean())
    d["ece_bin-l2"] = ece(p_hat, y_true)
    # does not work
    # ece = ce.ECE(ce.MedianVarianceBinning(5), ce.kl_divergence())
    # d["ece_medvar"] = ece(p_hat, y_true)
    # does not work
    # ece = ce.ECE(ce.MedianVarianceBinning(5), ce.SqEuclidean())
    # d["ece_medvar-l2"] = ece(p_hat, y_true)
    # KCE
    for biased in [True, False]:
        b = "biased" if biased else "unbiased"
        for blocks in [True, False]:
            if biased and blocks:
                pass  # BlockBiasedSKCE is not used
            else:
                bb = "block" if blocks else "full"  # was "block" in both branches, silently overwriting keys
                d[f"{bb}-{b}-skce"] = kernel_calibration_error(p_hat, y_true, biased=biased, blocks=blocks)
    return d
p, y = sample_multiclass_data(N=100)
calibration_metrics(p, y)
Could you check what the problem might be with kl_divergence and MedianVarianceBinning? :)
Appreciate it!
Jordy
Hi @devmotion, you might have missed the previous questions, so a gentle reminder in case you have some time to spare ;)
Hi @Jordy-VL,
I am very sorry for the long delay! I came back from vacation just recently and I still have to catch up with all the messages and issues that piled up on GitHub 🙂
I'll have a closer look at your issue tomorrow (thanks for the detailed instructions for how to reproduce it!) but a major problem is probably that you used classes 0, 1, and 2. Julia uses 1-based indexing and hence also CalibrationErrors.jl expects classes 1, 2, and 3.
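If 0-based labels are indeed the issue, the fix on the Python side is a one-line shift before calling into Julia (sketch; y mirrors the labels generated in the test above):

```python
import numpy as np

y = np.array([0, 0, 1, 2, 2])     # 0-based labels, as produced in Python
y_julia = y.astype(np.int64) + 1  # CalibrationErrors.jl expects classes 1..K
print(y_julia)  # [1 1 2 3 3]
```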
I am also a bit surprised that you had to convert the Python arrays to specific types, it was not necessary when I ran the examples in the README. I'll check this as well.
Another high-level comment is that you seem to build lists of predictions. It should be possible to use regular numpy arrays as well if you wrap them in ce.RowVecs (or ce.ColVecs, depending on whether the predictions are the rows or the columns of the matrix). There's an example in the README, and the design of RowVecs and ColVecs is discussed in more detail in https://juliagaussianprocesses.github.io/KernelFunctions.jl/dev/design/#why_abstract_vectors
With more recent versions of KernelFunctions.jl it is also easier to construct kernels with a specific lengthscale: https://juliagaussianprocesses.github.io/KernelFunctions.jl/dev/userguide/