Lightning-Universe/lightning-transformers

Add an option to use Huggingface metrics

yuvalkirstain opened this issue · 7 comments

🚀 Feature

Support Huggingface metrics.

Motivation

Torchmetrics is great, but there are many metrics it does not yet cover. Luckily, Huggingface has implemented lots of them. Can you please add an easy way to use metrics from Huggingface?

Pitch

The ability to specify a metric from Huggingface and be sure I pass it the correct arguments, without needing to implement it on my own.

@yuvalkirstain I think this is a great idea! But after adding a metric to torchmetrics (link), I'll be the first to say that this might be a little tricky. There's a lot of refactoring going on. @Borda @SeanNaren Do y'all know how torchmetrics and datasets.load_metric are supposed to differ? Or are they rival libraries?

There's a hook here to configure metrics, and a hook here that may be generalized to all models. If I'm assigned to this, I should be able to tackle it.

How do we want to swap metrics? I'm imagining something like this in the yaml:

metrics:
  accuracy:
    path: torchmetrics.Accuracy
  recall:
    path: torchmetrics.Recall

And we could swap implementations by changing it to:

metrics:
  accuracy:
    path: datasets.metrics.Accuracy
  recall:
    path: datasets.metrics.Recall
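A resolver for those dotted paths could be a thin importlib wrapper. Rough sketch; the helper name is my own, not something in the codebase:

import importlib

def resolve_metric(path: str):
    # split "torchmetrics.Accuracy" into module and class name
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# e.g. resolve_metric("torchmetrics.Accuracy")() -> an Accuracy instance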

If there's a custom metric, we can format it like:

metrics:
  MyCustomMetric:
    path: abs/path/to/metric.py

where it tries to import MyCustomMetric from abs/path/to/metric.py
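That file-based import could be done with importlib as well. Untested sketch; the helper name is made up:

import importlib.util

def load_metric_from_file(file_path: str, class_name: str):
    # import class_name from a .py file that is not on sys.path
    spec = importlib.util.spec_from_file_location("custom_metric", file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)

# e.g. load_metric_from_file("abs/path/to/metric.py", "MyCustomMetric")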

Also, with a few metrics, the targets/preds order is switched or is in the process of being switched. So it might be good to generalize the handling of outputs (assuming they come as a dictionary that can be unpacked into keyword arguments), like this:

# loading outputs into metric
metrics_output = metric(**model_output)
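Concretely, with a torchmetrics metric that would look like this (minimal sketch; assumes a torchmetrics version where Accuracy() takes no required arguments):

import torch
import torchmetrics

metric = torchmetrics.Accuracy()
# hypothetical model output whose keys match the metric's update() signature
model_output = {"preds": torch.tensor([0, 1, 1]), "target": torch.tensor([0, 1, 0])}
metrics_output = metric(**model_output)  # tensor(0.6667)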

All comments on this are welcome!

@mathemusician From just looking at the huggingface metrics API, I think it should be relatively easy to write an adapter for it, something like this:

import torchmetrics


class HFMetricAdapter(torchmetrics.Metric):
    """Wrap a huggingface metric so it behaves like a torchmetrics.Metric."""

    def __init__(self, hf_metric):
        super().__init__()
        self.hf_metric = hf_metric

    def update(self, *args, **kwargs):
        # hand each batch over to the wrapped huggingface metric
        self.hf_metric.add_batch(*args, **kwargs)

    def compute(self):
        # huggingface does the aggregation and returns the final result
        return self.hf_metric.compute()
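Usage would presumably look like this (untested; assumes the datasets.load_metric API):

import datasets

metric = HFMetricAdapter(datasets.load_metric("accuracy"))
metric.update(predictions=[0, 1, 1], references=[0, 1, 0])
print(metric.compute())  # {'accuracy': 0.666...}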

That being said, I don't know much about huggingface metrics, and therefore I am not sure whether all the magic torchmetrics does in the background to properly sync and accumulate the metric state across different GPUs (which would be required for correctness when training on multiple GPUs) will work here. For that to work, huggingface would have to take care of the synchronization itself.
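If huggingface does not handle that, one option would be to buffer the inputs as torchmetrics states, so that torchmetrics itself performs the cross-process gather before the huggingface metric runs. Rough, untested sketch; the class and argument names are made up:

import torch
import torchmetrics
from torchmetrics.utilities import dim_zero_cat


class SyncedHFMetricAdapter(torchmetrics.Metric):
    def __init__(self, hf_metric):
        super().__init__()
        self.hf_metric = hf_metric
        # dist_reduce_fx="cat" makes torchmetrics concatenate the buffered
        # tensors from all processes before compute() runs
        self.add_state("preds", default=[], dist_reduce_fx="cat")
        self.add_state("target", default=[], dist_reduce_fx="cat")

    def update(self, preds: torch.Tensor, target: torch.Tensor):
        self.preds.append(preds)
        self.target.append(target)

    def compute(self):
        # after syncing, the states may be a tensor or a list of tensors
        preds = dim_zero_cat(self.preds)
        target = dim_zero_cat(self.target)
        return self.hf_metric.compute(predictions=preds.tolist(), references=target.tolist())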

@yuvalkirstain @mathemusician
What Metrics specifically would you like to see? Maybe we can add more of them to torchmetrics for a more native experience?

@justusschock Good point. It might actually be easier to migrate as many metrics as possible from HuggingFace and then use our own hooks, rather than writing workarounds for someone else's. @yuvalkirstain Any thoughts?

@mathemusician @justusschock
Huggingface metrics is very popular and may have metrics that cannot be found in TorchMetrics, or implementations that differ slightly from those in TorchMetrics. Additionally, people will probably keep adding new metrics to Huggingface metrics. Therefore, I really like the HFMetricAdapter workaround. I find this to be an orthogonal effort to adding more metrics to TorchMetrics.

@yuvalkirstain Understood. As stated, though, with this adapter we cannot guarantee that everything works correctly with regard to synchronization.

And if we start implementing the missing metrics in torchmetrics now, I think there is a high chance that new metrics can be added there quickly as well. I see a huge benefit in that, because torchmetrics is not tied to one ecosystem.

@justusschock Yes, I understand. Thank you so much for the help.