Discrepancy in optimal threshold calculation between sklearn and torchmetrics ROC implementations
vitalwarley opened this issue · 2 comments
Bug description
There's a noticeable difference in the optimal thresholds computed from the ROC curve implementations `sklearn.metrics.roc_curve` and `torchmetrics.functional.roc`. Given the same similarity scores and labels, `sklearn` produces a much lower optimal threshold than `torchmetrics`.
What version are you seeing the problem on?
v2.2
How to reproduce the bug
```python
import numpy as np
import torch
from sklearn.metrics import roc_curve
import torchmetrics.functional as tm

# Given values
similarities = torch.tensor([0.0938, 0.0041, -0.1011, 0.0182, 0.0932, -0.0269, -0.0266, -0.0298,
                             -0.0200, 0.0816, -0.0122, -0.0026, 0.1237, -0.0149, 0.0840, -0.0192,
                             -0.0488, 0.0114, -0.0076, -0.0583])
is_kin_labels = torch.tensor([1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0])

# Move data to CPU NumPy arrays for sklearn
similarities_ = similarities.cpu().numpy()
is_kin_labels_ = is_kin_labels.cpu().numpy()

# Sklearn calculation: threshold that maximises tpr - fpr (Youden's J)
fpr_, tpr_, thresholds_ = roc_curve(is_kin_labels_, similarities_)
maxindex_ = (tpr_ - fpr_).argmax()
best_threshold_sklearn = thresholds_[maxindex_]

# Torchmetrics calculation (works with CPU or CUDA tensors)
fpr, tpr, thresholds = tm.roc(similarities, is_kin_labels, task='binary')
maxindex = (tpr - fpr).argmax()
best_threshold_torchmetrics = thresholds[maxindex].item()

# Output comparison
print(f"Best threshold sklearn: {best_threshold_sklearn:.6f} @ {maxindex_} index of {len(thresholds_)} (fpr={fpr_[maxindex_]:.6f}, tpr={tpr_[maxindex_]:.6f})")
print(f"Best threshold torchmetrics: {best_threshold_torchmetrics:.6f} @ {maxindex} index of {len(thresholds)} (fpr={fpr[maxindex]:.6f}, tpr={tpr[maxindex]:.6f})")
# Best threshold sklearn: 0.093200 @ 2 index of 10 (fpr=0.000000, tpr=0.428571)
# Best threshold torchmetrics: 0.523283 @ 3 index of 21 (fpr=0.000000, tpr=0.428571)
```
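Both calls land on the same operating point (fpr=0, tpr≈0.4286); only the reported threshold differs. A quick check using just the two numbers printed above suggests the torchmetrics value is the logistic sigmoid of the sklearn value (a minimal sketch; `sigmoid` is a local helper, not a library function):

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


best_threshold_sklearn = 0.0932         # printed by the sklearn branch above
best_threshold_torchmetrics = 0.523283  # printed by the torchmetrics branch above

# Map the raw similarity threshold onto the probability scale
print(f"sigmoid({best_threshold_sklearn}) = {sigmoid(best_threshold_sklearn):.6f}")
# → sigmoid(0.0932) = 0.523283
```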
Error messages and logs
No response
Environment
Current environment
- CUDA:
- GPU:
- NVIDIA GeForce RTX 3070 Laptop GPU
- available: True
- version: 12.1
- GPU:
- Lightning:
- lightning: 2.2.1
- lightning-utilities: 0.10.1
- pytorch-lightning: 2.2.1
- torch: 2.2.1
- torchmetrics: 1.3.1
- torchvision: 0.17.1
- Packages:
- absl-py: 2.1.0
- aiohttp: 3.9.3
- aiosignal: 1.3.1
- asttokens: 2.4.1
- attrs: 23.2.0
- beautifulsoup4: 4.12.3
- certifi: 2024.2.2
- cfgv: 3.4.0
- chardet: 5.2.0
- charset-normalizer: 3.3.2
- click: 8.1.7
- contourpy: 1.2.0
- cycler: 0.12.1
- daemonize: 2.5.0
- debugpy: 1.8.1
- decorator: 5.1.1
- distlib: 0.3.8
- docstring-parser: 0.16
- executing: 2.0.1
- filelock: 3.13.1
- fonttools: 4.50.0
- frozenlist: 1.4.1
- fsspec: 2023.12.2
- gdown: 5.1.0
- grpcio: 1.62.1
- guildai: 0.9.0
- identify: 2.5.35
- idna: 3.6
- importlib-resources: 6.3.2
- ipython: 8.20.0
- jedi: 0.19.1
- jinja2: 3.1.3
- joblib: 1.3.2
- jsonargparse: 4.27.6
- kiwisolver: 1.4.5
- lightning: 2.2.1
- lightning-utilities: 0.10.1
- markdown: 3.6
- markupsafe: 2.1.3
- matplotlib: 3.8.3
- matplotlib-inline: 0.1.6
- mpmath: 1.3.0
- multidict: 6.0.5
- natsort: 8.4.0
- networkx: 3.2.1
- nodeenv: 1.8.0
- numpy: 1.26.4
- nvidia-cublas-cu12: 12.1.3.1
- nvidia-cuda-cupti-cu12: 12.1.105
- nvidia-cuda-nvrtc-cu12: 12.1.105
- nvidia-cuda-runtime-cu12: 12.1.105
- nvidia-cudnn-cu12: 8.9.2.26
- nvidia-cufft-cu12: 11.0.2.54
- nvidia-curand-cu12: 10.3.2.106
- nvidia-cusolver-cu12: 11.4.5.107
- nvidia-cusparse-cu12: 12.1.0.106
- nvidia-nccl-cu12: 2.19.3
- nvidia-nvjitlink-cu12: 12.3.101
- nvidia-nvtx-cu12: 12.1.105
- opencv-python: 4.9.0.80
- packaging: 24.0
- parso: 0.8.3
- pexpect: 4.9.0
- pillow: 10.2.0
- pip: 24.0
- pkginfo: 1.10.0
- platformdirs: 4.2.0
- pre-commit: 3.6.2
- prompt-toolkit: 3.0.43
- protobuf: 4.25.3
- psutil: 5.9.8
- ptyprocess: 0.7.0
- pure-eval: 0.2.2
- pygments: 2.17.2
- pyparsing: 3.1.2
- pysocks: 1.7.1
- python-dateutil: 2.9.0.post0
- pytorch-lightning: 2.2.1
- pyyaml: 6.0.1
- requests: 2.31.0
- scikit-learn: 1.4.1.post1
- scipy: 1.12.0
- setuptools: 69.0.3
- six: 1.16.0
- soupsieve: 2.5
- stack-data: 0.6.3
- sympy: 1.12
- tabview: 1.4.4
- tensorboard: 2.16.2
- tensorboard-data-server: 0.7.2
- threadpoolctl: 3.3.0
- torch: 2.2.1
- torchmetrics: 1.3.1
- torchvision: 0.17.1
- tqdm: 4.66.2
- traitlets: 5.14.1
- triton: 2.2.0
- typeshed-client: 2.5.1
- typing-extensions: 4.9.0
- urllib3: 2.2.1
- virtualenv: 20.25.1
- wcwidth: 0.2.13
- werkzeug: 3.0.1
- wheel: 0.42.0
- yarl: 1.9.4
- System:
- OS: Linux
- architecture:
- 64bit
- ELF
- processor:
- python: 3.11.8
- release: 6.7.9-arch1-1
- version: #1 SMP PREEMPT_DYNAMIC Fri, 08 Mar 2024 01:59:01 +0000
More info
The `thresholds_` array (from `sklearn`) and the `thresholds` tensor (from `torchmetrics`) differ substantially in both the range and the granularity of their values:
```
[ins] In [6]: thresholds_
Out[6]:
array([ inf, 0.1237, 0.0932, 0.0114, -0.0026, -0.0149, -0.0192,
       -0.02 , -0.0266, -0.1011], dtype=float32)

[ins] In [7]: thresholds
Out[7]:
tensor([1.0000, 0.5309, 0.5234, 0.5233, 0.5210, 0.5204, 0.5045, 0.5028, 0.5010,
        0.4993, 0.4981, 0.4970, 0.4963, 0.4952, 0.4950, 0.4934, 0.4933, 0.4926,
        0.4878, 0.4854, 0.4747])
```
Hi! Thanks for your contribution! Great first issue!
I think I found the problem. The returned thresholds are probabilities, because the documentation for `preds` says:

> preds (float tensor): (N, ...). Preds should be a tensor containing probabilities or logits for each observation. If preds has values outside [0,1] range we consider the input to be logits and will auto apply sigmoid per element.

So it makes sense; my fault. However, I didn't find this very clear at first, since the returned thresholds are only described as:

> thresholds: an 1d tensor of size (n_thresholds, ) with decreasing threshold values