naver-ai/eccv-caption

Trying to compute 'coco_1k_recalls' and 'pmrp', but got errors

zl535320706 opened this issue · 7 comments

When I try to compute 'coco_1k_recalls' through "metric.compute_all_metrics()", I get the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.7/site-packages/eccv_caption/metrics.py", line 297, in compute_all_metrics
    retrieved_items, 'coco_1k', Ks)
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.7/site-packages/eccv_caption/metrics.py", line 238, in __update_recalls
    _scores = recall_fn(retrieved_items, 'all', K=K)
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.7/site-packages/eccv_caption/metrics.py", line 169, in coco_1k_recalls
    t2i_gt_1k_recalls.append(compute_coco1k_r_at_k(retrieved_items['t2i'], nfold_coco_t2i, nfold_iids, K, self.verbose))
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.7/site-packages/eccv_caption/_metrics.py", line 151, in compute_coco1k_r_at_k
    _recall = 1 if _target_items[0] in _gt_items else 0
IndexError: list index out of range

When I try to compute 'pmrp' through "metric.compute_all_metrics()", I get the following error. (I have set up eccv_caption.Metrics(extra_file_dir=xxx) according to the tutorials in your repository.)

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.7/site-packages/eccv_caption/metrics.py", line 304, in compute_all_metrics
    _scores = self.pmrp(retrieved_items, 'all')
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.7/site-packages/eccv_caption/metrics.py", line 205, in pmrp
    modality)
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.7/site-packages/eccv_caption/metrics.py", line 138, in __compute_metric
    **kwargs)
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.7/site-packages/eccv_caption/_metrics.py", line 87, in compute_rprecision
    _prec = rprecision(_target_items, all_matched, n_matched)
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.7/site-packages/eccv_caption/_metrics.py", line 39, in rprecision
    precision = 1 - len(non_precise) / len(target_R_items)
ZeroDivisionError: division by zero

The other metrics work well, such as 'eccv_r1', 'eccv_map_at_r', 'eccv_rprecision', 'coco_5k_recalls', and 'cxc_recalls'.

In addition, I suspect there may be a problem when calculating the 't2i' (r5, r10) scores of 'coco_5k_recalls', because they differ from what I calculated using existing open-source code, whereas the 'i2t' (r1, r5, r10) results of 'coco_5k_recalls' match. (There may also be problems with the 't2i' (r1, r5, r10) scores of 'cxc', but I haven't calculated cxc before, so I'm not entirely sure.)

I have processed the image and text features output by my model, as well as the iids and cids, according to the tutorials in your repository. The resulting i2t has length 5000 and the t2i has length 25000. Is there any problem here?
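For concreteness, my inputs have the following structure; the ids below are synthetic placeholders, not my real model outputs:

```python
# Synthetic placeholder inputs mirroring what I pass to
# metric.compute_all_metrics(): plain dicts {query_id: ranked list of ids}.
# COCO test split: 5000 images, 25000 captions (5 captions per image).
num_images, caps_per_image = 5000, 5
K = 50  # length of the retrieved list per query

# i2t: for each image id, a ranked list of retrieved caption ids
i2t = {iid: list(range(K)) for iid in range(num_images)}
# t2i: for each caption id, a ranked list of retrieved image ids
t2i = {cid: list(range(K)) for cid in range(num_images * caps_per_image)}

assert len(i2t) == 5000 and len(t2i) == 25000
```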

@zl535320706
Quick solution: do not filter your retrieved items to the top-k; use the full ranked lists instead.
Current example (the existing CLIP-based example code, which truncates to K = 50):

    K = 50
    for idx, iid in enumerate(all_iids):
        values, indices = sims[idx, :].topk(K)
        indices = indices.detach().cpu().numpy()
        i2t[iid] = [int(cid) for cid in all_cids[indices]]

    for idx, cid in enumerate(all_cids):
        values, indices = sims[:, idx].topk(K)
        indices = indices.detach().cpu().numpy()
        t2i[cid] = [int(iid) for iid in all_iids[indices]]

    scores = metric.compute_all_metrics(
        i2t, t2i,
        target_metrics=('eccv_r1', 'eccv_map_at_r', 'eccv_rprecision',
                        'coco_1k_recalls', 'coco_5k_recalls', 'cxc_recalls'),
        Ks=(1, 5, 10),
        verbose=False
    )
    print(scores)

Safe but slower / memory-heavy solution (full sort instead of top-K):

    for idx, iid in enumerate(all_iids):
        values, indices = sims[idx, :].sort(descending=True)
        indices = indices.detach().cpu().numpy()
        i2t[iid] = [int(cid) for cid in all_cids[indices]]

    for idx, cid in enumerate(all_cids):
        values, indices = sims[:, idx].sort(descending=True)
        indices = indices.detach().cpu().numpy()
        t2i[cid] = [int(iid) for iid in all_iids[indices]]

    scores = metric.compute_all_metrics(
        i2t, t2i,
        target_metrics=('eccv_r1', 'eccv_map_at_r', 'eccv_rprecision',
                        'coco_1k_recalls', 'coco_5k_recalls', 'cxc_recalls'),
        Ks=(1, 5, 10),
        verbose=False
    )
    print(scores)

See the full explanation below for the details of this solution:


Sorry for the late reply. I can presume the cause of the first one, but I am not 100% sure about the second one.

My implementation is focused on efficiency, i.e., avoiding computing the same similarity scores multiple times.
Hence, it takes a dictionary of {query_id: [list of retrieved_items]}, where the list contains only the top-K items rather than the full ranking (e.g., for image-to-text retrieval the full ranking would be 25,000 integers). The other scores do not suffer from this, but COCO 1K Recalls (5-fold) can hit "no items among the retrieved items" when computing the 5-fold score.

Let's assume a simple case:
my query id = A, retrieved_items = [1, 2, 3, 4, 5, 21, 22, 23, 24, 25] (top-K = 10, total number of items = 100).
Now, assume the 5-fold split is (1, 2, ..., 20), (21, 22, ..., 40), ..., (81, 82, ..., 100).
Computing recall@1 for the first and second folds will be fine.
However, the 3rd, 4th, and 5th folds will produce an empty _target_items (retrieved_items filtered by the n-th fold ids), which raises the IndexError above.
The easiest way to fix this error is to increase the top-K for the input of compute_all_metrics.
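The failure mode above can be sketched as follows (a simplified illustration, not the library's actual code):

```python
# Simplified illustration of 5-fold filtering with a truncated top-K list.
retrieved_items = [1, 2, 3, 4, 5, 21, 22, 23, 24, 25]  # top-10 of 100 items

# Five folds of 20 ids each: (1..20), (21..40), ..., (81..100)
folds = [set(range(start, start + 20)) for start in range(1, 101, 20)]

for n, fold in enumerate(folds, start=1):
    # Keep only the retrieved items that belong to the n-th fold
    _target_items = [item for item in retrieved_items if item in fold]
    if _target_items:
        print(f"fold {n}: best in-fold item = {_target_items[0]}")
    else:
        # _target_items[0] would raise IndexError here, as in the traceback
        print(f"fold {n}: empty _target_items")
```

With the full 100-item ranking instead of the top 10, every fold would contain at least one retrieved item and the IndexError disappears.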

The second one seems to be a similar issue, but I am not 100% sure as of now. I presume it will also be resolved by the previous solution (increasing top-K).
As the comment in my original example code notes, you may need a K larger than 50:

    # If you want to use the original PMRP, then K should be larger than 13380
    # (max PM t2i/i2t positives = 2676/13380)
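For intuition, here is a simplified sketch (not the library's exact implementation) of how R-Precision can divide by zero when the truncated retrieved list leaves no candidate items:

```python
def rprecision_sketch(target_R_items, gt_items):
    # Simplified R-Precision: 1 minus the fraction of the kept retrieved
    # items that are not ground-truth positives.
    non_precise = [x for x in target_R_items if x not in gt_items]
    return 1 - len(non_precise) / len(target_R_items)

print(rprecision_sketch([1, 2, 3], {1, 2, 9}))  # 2 of 3 retrieved are positives
# With K too small, the filtered list can be empty and the division fails:
# rprecision_sketch([], {1, 2})  -> ZeroDivisionError, as in the traceback
```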

I will take a look into it, but it could take some time. I will inform you later when I find the issue.

@SanghyukChun
Thank you very much for your patient answer. As you suggested, I relaxed the restriction on K, and sure enough, the first problem was resolved: 'coco_1k_recalls' now runs well. However, the second 'pmrp' problem still exists, with the same error reported. The remaining issue is that some scores are anomalous, as shown in the following table:

(screenshot: table of the anomalous scores)

Can you attach your i2t_retrieved_items and t2i_retrieved_items without filtering (via Google Drive or Dropbox)? It is impossible to guess what the problem is from the table alone.

Can I share the variable sims with you?

Ah, sure. If possible, please share your code for the eccv_caption evaluation as well.

OK. Due to network issues, uploading may take some time. I will try to have the relevant files and code ready by tomorrow.


@SanghyukChun
The relevant files and code have been shared via OneDrive.
Thanks again for your kind help.

@zl535320706
Oh, I didn't get any notification about your code and files.
I will take a look into it within a few weeks.