Add cosine similarity metric
kzkadc opened this issue · 4 comments
🚀 Feature
It would be nice to add cosine similarity as a Metric.
Cosine similarity between feature vectors is often used in representation learning.
@kzkadc thanks for the suggestion! Can you please detail the computational part a bit:
- which inputs: dtype and typical shapes for the metric
- what the overall formula would be, e.g. the average of all cosine similarities over all predictions vs ground truths
If you would like to provide a draft PR on this feature, you are very welcome!
By the way, today this metric is already possible to implement using built-in features:
```python
import torch

from ignite.engine import Engine, Events
from ignite.metrics import Average

batch_size = 4
num_features = 10

def eval_step(engine, _):
    y_pred = torch.rand(batch_size, num_features)
    y_true = torch.rand(batch_size, num_features)
    return y_pred, y_true

evaluator = Engine(eval_step)

# Average the per-sample cosine similarities over all batches
avg_cosine_similarity = Average(
    output_transform=lambda output: torch.cosine_similarity(output[0], output[1]).mean()
)
avg_cosine_similarity.attach(evaluator, "avg_cosine_similarity")

fake_eval_data = range(10)
state = evaluator.run(fake_eval_data)
print(state.metrics)
```
Output:
```
{'avg_cosine_similarity': 0.7671696498990059}
```
@vfdev-5
Thank you for your suggestion! That's exactly what I intended to do.
But adding a cosine similarity metric would still be nice, because implementing it via `Average` with an `output_transform` seems a bit technical.
Here are the details:
- Inputs: two float tensors in the shape of [batch size, num_features].
- Overall formula: the average of all cosine similarities, i.e., $\frac{1}{N}\sum_{i=1}^{N} \frac{\mathbf{z}_i^1 \cdot \mathbf{z}_i^2}{\|\mathbf{z}_i^1\|\,\|\mathbf{z}_i^2\|}$, given two batches $[\mathbf{z}_1^1, \ldots, \mathbf{z}_N^1] \in \mathbb{R}^{N\times D}$ and $[\mathbf{z}_1^2, \ldots, \mathbf{z}_N^2] \in \mathbb{R}^{N\times D}$.
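This per-sample formula matches what PyTorch's built-in `torch.cosine_similarity` computes along `dim=1` (up to a small `eps` guard in the built-in); a quick sanity check, with illustrative shapes:

```python
import torch

torch.manual_seed(0)
N, D = 8, 16
z1 = torch.rand(N, D)
z2 = torch.rand(N, D)

# Manual computation from the formula above
dots = (z1 * z2).sum(dim=1)              # z_i^1 . z_i^2, shape (N,)
norms = z1.norm(dim=1) * z2.norm(dim=1)  # ||z_i^1|| ||z_i^2||, shape (N,)
avg_manual = (dots / norms).mean()       # (1/N) sum_i ...

# Built-in equivalent
avg_builtin = torch.cosine_similarity(z1, z2).mean()

assert torch.allclose(avg_manual, avg_builtin)
```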
With the cosine similarity metric, the above code would be:
```python
import torch

from ignite.engine import Engine, Events
from ignite.metrics import CosineSimilarity

batch_size = 4
num_features = 10

def eval_step(engine, _):
    y_pred = torch.rand(batch_size, num_features)
    y_true = torch.rand(batch_size, num_features)
    return y_pred, y_true

evaluator = Engine(eval_step)

CosineSimilarity().attach(evaluator, "avg_cosine_similarity")

fake_eval_data = range(10)
state = evaluator.run(fake_eval_data)
print(state.metrics)
```
Thank you.
Hey 👋, I've just created a thread for this issue on PyTorch-Ignite Discord where you can quickly talk to the community on the topic.
🤖 This comment was automatically posted by Discuss on Discord
Yes, sounds good to add a `CosineSimilarity` class implementing the average cosine similarity.
If you would like to help with implementing it, that would be great!
In the code, I assume we can internally store a running sum of per-batch cosine similarities, accumulate it in the `update` method, and just divide by the number of seen samples in the `compute` method.