Add an option to use Huggingface metrics
yuvalkirstain opened this issue · 7 comments
🚀 Feature
Support Huggingface metrics.
Motivation
Torchmetrics is great, but there are many metrics that it does not provide. Luckily, Huggingface has implemented lots of them. Can you please add an easy way to use metrics from Huggingface?
Pitch
Specifying a metric from Huggingface and making sure I give it the correct arguments, without needing to implement it on my own.
@yuvalkirstain I think this is a great idea! But after adding a metric to torchmetrics (link), I'll be the first to say that this might be a little tricky. There's a lot of refactoring going on. @Borda @SeanNaren Do y'all know how torchmetrics and datasets.load_metric are supposed to differ? Or are they rival libraries?
There's a hook here to configure metrics, and a hook here that may be generalized to all models. If I'm assigned to this, I should be able to tackle it.
How did we want to swap metrics? I'm imagining that in the yaml, it looks like:
```yaml
metrics:
  accuracy:
    path: torchmetrics.Accuracy
  recall:
    path: torchmetrics.Recall
```
And we can swap it by switching it to:
```yaml
metrics:
  accuracy:
    path: datasets.metrics.Accuracy
  recall:
    path: datasets.metrics.Recall
```
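The dotted paths above could be resolved with a small helper along these lines; `resolve_metric` is a hypothetical name, and the demo resolves a stdlib class so the sketch runs without torchmetrics or datasets installed:

```python
import importlib


def resolve_metric(path):
    # split a dotted path like "torchmetrics.Accuracy" into
    # the module to import and the class to pull out of it
    module_name, _, class_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


# demo with a stdlib class so this runs stand-alone
cls = resolve_metric("fractions.Fraction")
print(cls(1, 2))  # prints 1/2
```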
If there's a custom metric, we can format it like:
```yaml
metrics:
  MyCustomMetric:
    path: abs/path/to/metric.py
```
where it tries to import MyCustomMetric from abs/path/to/metric.py.
Also, with a few metrics, the targets/preds argument order is switched or is in the process of being switched. So it might be good to generalize the handling of outputs (assuming the model returns them as a dictionary), like this:
```python
# load model outputs into the metric as keyword arguments
metrics_output = metric(**model_output)
```
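One way to make argument order irrelevant is to match the output dictionary against the metric's signature; `call_metric` and the toy `accuracy` function below are made-up names for illustration:

```python
import inspect


def call_metric(metric_fn, model_output):
    # pass along only the keys the metric actually accepts,
    # so extra outputs (e.g. the loss) are silently ignored
    accepted = inspect.signature(metric_fn).parameters
    kwargs = {k: v for k, v in model_output.items() if k in accepted}
    return metric_fn(**kwargs)


def accuracy(preds, targets):
    return sum(p == t for p, t in zip(preds, targets)) / len(preds)


model_output = {"preds": [1, 0, 1], "targets": [1, 1, 1], "loss": 0.3}
score = call_metric(accuracy, model_output)  # the "loss" key is dropped
```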
All comments on this are welcome!
@mathemusician having only looked at the huggingface metrics API, I think it should be relatively easy to provide an adapter for it, like this:
```python
import torchmetrics


class HFMetricAdapter(torchmetrics.Metric):
    """Wrap a Huggingface metric so it can be used as a torchmetrics.Metric."""

    def __init__(self, hf_metric):
        super().__init__()
        self.hf_metric = hf_metric

    def update(self, *args, **kwargs):
        self.hf_metric.add_batch(*args, **kwargs)

    def compute(self):
        return self.hf_metric.compute()
```
That being said, I don't know much about huggingface metrics, so I am not sure whether all the magic torchmetrics does in the background to properly sync and accumulate the metric state across different GPUs (which is required for correctness when training on multiple GPUs) will work. For that to work, huggingface would have to take care of the synchronization itself.
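One possible way around this would be to accumulate the raw predictions/references as torchmetrics states (which torchmetrics knows how to gather across GPUs) and only hand them to the Huggingface metric inside compute(). The sketch below uses a plain class and a fake accuracy metric so it runs stand-alone; in a real implementation the two lists would instead be registered with self.add_state(..., dist_reduce_fx="cat") on a torchmetrics.Metric subclass:

```python
# Pure-Python sketch of a sync-friendly adapter. In a real implementation the
# class would subclass torchmetrics.Metric and register the two lists as
# states so torchmetrics gathers them across GPUs before compute() runs.
class SyncFriendlyHFAdapter:
    def __init__(self, hf_metric):
        self.hf_metric = hf_metric
        self.preds = []    # would be a torchmetrics state
        self.targets = []  # would be a torchmetrics state

    def update(self, predictions, references):
        # only accumulate locally; no Huggingface call yet
        self.preds.extend(predictions)
        self.targets.extend(references)

    def compute(self):
        # after the states have been synced, feed everything to HF once
        self.hf_metric.add_batch(predictions=self.preds, references=self.targets)
        return self.hf_metric.compute()


# stand-in for a Huggingface accuracy metric, for demonstration only
class _FakeAccuracy:
    def __init__(self):
        self._p, self._r = [], []

    def add_batch(self, predictions, references):
        self._p.extend(predictions)
        self._r.extend(references)

    def compute(self):
        correct = sum(p == r for p, r in zip(self._p, self._r))
        return {"accuracy": correct / len(self._p)}


metric = SyncFriendlyHFAdapter(_FakeAccuracy())
metric.update([1, 0], [1, 1])
metric.update([1], [1])
result = metric.compute()  # {'accuracy': 0.666...}
```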
@yuvalkirstain @mathemusician
What metrics specifically would you like to see? Maybe we can add more of them to torchmetrics for a more native experience?
@justusschock Good point. It might actually be easier to migrate as many metrics as possible from HuggingFace and then use our hooks, rather than writing workarounds for using someone else's hooks. @yuvalkirstain Any thoughts?
@mathemusician @justusschock
Huggingface metrics is very popular and may have metrics that cannot be found in TorchMetrics, or implementations that differ slightly from those in TorchMetrics. Additionally, people will probably keep adding new metrics to Huggingface metrics. Therefore, I really like the HFMetricAdapter workaround. I find this to be an orthogonal effort to adding more metrics to TorchMetrics.
@yuvalkirstain Understood. As stated, though, with this adapter we cannot guarantee that everything works correctly regarding synchronization.
And if we start implementing new torchmetrics Metrics now, I think there is a high chance that new metrics can be added there quickly as well. I see a huge benefit there, because torchmetrics is not tied to one ecosystem.
@justusschock Yes, I understand. Thank you so much for the help.