pytorch/ignite

Add cosine similarity metric

kzkadc opened this issue · 4 comments

🚀 Feature

It would be nice to add cosine similarity as a Metric.
Cosine similarity between feature vectors is often used in representation learning.

@kzkadc thanks for the suggestion! Can you please detail the computational part a bit:

  • which inputs the metric takes: dtype and typical shapes
  • what the overall formula would be, e.g. the average of cosine similarities over all prediction/ground-truth pairs

If you would like to provide a draft PR on this feature, you are very welcome!

By the way, this metric can already be computed today using built-in features:

import torch
from ignite.engine import Engine
from ignite.metrics import Average

batch_size = 4
num_features = 10

def eval_step(engine, _):
    y_pred = torch.rand(batch_size, num_features)
    y_true = torch.rand(batch_size, num_features)
    return y_pred, y_true

evaluator = Engine(eval_step)

avg_cosine_similarity = Average(output_transform=lambda output: torch.cosine_similarity(output[0], output[1]))
avg_cosine_similarity.attach(evaluator, "avg_cosine_similarity")

fake_eval_data = range(10)
state = evaluator.run(fake_eval_data)
print(state.metrics)

Output:

{'avg_cosine_similarity': 0.7671696498990059}

@vfdev-5
Thank you for your suggestion! That's exactly what I intended to do.
But adding a cosine similarity metric would still be nice, since implementing it with Average and an output_transform is a bit technical.

Here are the details:

  • Inputs: two float tensors of shape [batch_size, num_features].
  • Overall formula: given two batches $[\mathbf{z}_1^1, \ldots, \mathbf{z}_N^1] \in \mathbb{R}^{N\times D}$ and $[\mathbf{z}_1^2, \ldots, \mathbf{z}_N^2] \in \mathbb{R}^{N\times D}$, the average of all per-sample cosine similarities, i.e., $(1/N)\sum_{i} \mathbf{z}_i^1\cdot \mathbf{z}_i^2 / (\| \mathbf{z}_i^1 \| \, \| \mathbf{z}_i^2 \|)$.
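
As a quick sanity check on the formula, here is a small pure-Python computation (the cosine_similarity helper here is just for illustration, not an ignite API):

```python
import math

def cosine_similarity(u, v):
    # dot(u, v) / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Two batches of N=2 samples with D=2 features each
batch_1 = [[1.0, 0.0], [1.0, 1.0]]
batch_2 = [[1.0, 0.0], [1.0, 0.0]]

per_sample = [cosine_similarity(z1, z2) for z1, z2 in zip(batch_1, batch_2)]
avg = sum(per_sample) / len(per_sample)
print(avg)  # (1 + 1/sqrt(2)) / 2 ≈ 0.8536
```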

With the cosine similarity metric, the above code would be:

import torch
from ignite.engine import Engine
from ignite.metrics import CosineSimilarity

batch_size = 4
num_features = 10

def eval_step(engine, _):
    y_pred = torch.rand(batch_size, num_features)
    y_true = torch.rand(batch_size, num_features)
    return y_pred, y_true

evaluator = Engine(eval_step)

CosineSimilarity().attach(evaluator, "avg_cosine_similarity")

fake_eval_data = range(10)
state = evaluator.run(fake_eval_data)
print(state.metrics)

Thank you.

Hey 👋, I've just created a thread for this issue on PyTorch-Ignite Discord where you can quickly talk to the community on the topic.

🤖 This comment was automatically posted by Discuss on Discord

Yes, it sounds good to add a CosineSimilarity class that implements the average cosine similarity.
If you would like to help with implementing it, it would be great!

In the code, I assume we can internally store a running sum of per-sample cosine similarities, accumulate it in the update method, and divide by the number of seen samples in the compute method.
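
That accumulation scheme could be sketched as follows. This is a pure-Python illustration of the reset/update/compute logic only, not the actual ignite Metric subclass; the class name is hypothetical, and a real implementation would subclass ignite.metrics.Metric and use torch tensor ops:

```python
class CosineSimilarityAccumulator:
    """Hypothetical sketch of the reset/update/compute accumulation scheme."""

    def reset(self):
        # Running sum of per-sample cosine similarities and sample count
        self._sum = 0.0
        self._num_examples = 0

    def update(self, output):
        # output is a pair of batches: (y_pred, y_true), each [batch_size, num_features]
        y_pred, y_true = output
        for z1, z2 in zip(y_pred, y_true):
            dot = sum(a * b for a, b in zip(z1, z2))
            norm1 = sum(a * a for a in z1) ** 0.5
            norm2 = sum(b * b for b in z2) ** 0.5
            self._sum += dot / (norm1 * norm2)
            self._num_examples += 1

    def compute(self):
        if self._num_examples == 0:
            raise ValueError("CosineSimilarity must have at least one example before compute")
        return self._sum / self._num_examples

metric = CosineSimilarityAccumulator()
metric.reset()
metric.update(([[1.0, 0.0]], [[0.0, 1.0]]))  # orthogonal pair: cosine 0
metric.update(([[2.0, 0.0]], [[2.0, 0.0]]))  # identical pair: cosine 1
print(metric.compute())  # (0 + 1) / 2 = 0.5
```

Accumulating the sum and count separately (rather than a running mean) keeps update cheap and makes the metric straightforward to all-reduce across processes in distributed settings.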