Python 3 support for the MS COCO caption evaluation tools [Fix for ClipScore]

Microsoft COCO Caption Evaluation

Evaluation code for MS COCO caption generation.

Description

This repository provides Python 3 support for the caption evaluation metrics used for the MS COCO dataset.

The code is derived from the original repository that supports Python 2.7: https://github.com/tylin/coco-caption.
Caption evaluation depends on the COCO API, which natively supports Python 3.

Requirements

  • Java 1.8.0
  • Python 3
  • For CLIPScore, both PyTorch and OpenAI's CLIP are required.
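
A quick way to see which of these prerequisites are present is a small standard-library check. The sketch below is only illustrative: the helper name check_prerequisites is hypothetical, the module names torch and clip are assumed to be the import names for PyTorch and OpenAI's CLIP, and the check only tests whether a java executable is on the PATH, not whether it is version 1.8.0.

```python
import importlib.util
import shutil


def check_prerequisites(modules=("pycocotools", "pycocoevalcap", "torch", "clip")):
    """Report which prerequisites appear to be available on this machine.

    Hypothetical helper: it only looks for a `java` executable on the PATH
    and for importable top-level modules; it does not verify versions.
    """
    report = {"java": shutil.which("java") is not None}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Anything reported as missing can then be installed before running the evaluation; torch and clip matter only if CLIPScore is needed.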

Installation

To install pycocoevalcap and the pycocotools dependency (https://github.com/cocodataset/cocoapi), run:

pip install git+https://github.com/jmhessel/pycocoevalcap

Usage

See the example script: example/coco_eval_example.py
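
For orientation, an evaluation run might look roughly like the sketch below. It assumes this fork keeps the COCOEvalCap(coco, coco_result) interface of the original coco-caption code, and that the results file is a JSON list of objects with image_id and caption fields, as accepted by the COCO API's loadRes. The helper names write_results and evaluate are hypothetical and not part of the package.

```python
import json


def write_results(captions, path):
    """Write a {image_id: caption} mapping as a COCO-style caption results file."""
    results = [{"image_id": image_id, "caption": caption}
               for image_id, caption in captions.items()]
    with open(path, "w") as f:
        json.dump(results, f)
    return results


def evaluate(annotation_file, results_file):
    """Score a results file against reference annotations (assumed interface)."""
    # Imported here so write_results works without the packages installed.
    from pycocotools.coco import COCO
    from pycocoevalcap.eval import COCOEvalCap

    coco = COCO(annotation_file)
    coco_result = coco.loadRes(results_file)
    coco_eval = COCOEvalCap(coco, coco_result)
    # Restrict evaluation to the images that actually have a generated caption.
    coco_eval.params["image_id"] = coco_result.getImgIds()
    coco_eval.evaluate()
    # Mapping from metric name to corpus-level score.
    return dict(coco_eval.eval)
```

Calling evaluate with the path to a COCO caption annotation file and the path to a results file should return a mapping from metric name to score. The example script remains the authoritative reference for this fork.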

Files

./

  • eval.py: Defines the COCOEvalCap class, which can be used to evaluate results on COCO.
  • tokenizer: Python wrapper for the Stanford CoreNLP PTBTokenizer
  • bleu: BLEU evaluation code
  • meteor: METEOR evaluation code
  • rouge: ROUGE-L evaluation code
  • cider: CIDEr evaluation code
  • spice: SPICE evaluation code
  • clipscore: CLIPScore evaluation code
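
The individual metric modules can also be called directly, without going through eval.py. The sketch below assumes the scorer interface inherited from the original coco-caption code: Bleu(n).compute_score(gts, res), where gts and res are dicts keyed by the same ids, gts maps each id to a list of reference captions, and res maps each id to a list holding exactly one candidate caption. The helper names to_scorer_format and score_bleu are hypothetical. Captions are passed as already tokenized, space-separated strings, because this path skips the tokenizer.

```python
def to_scorer_format(references, candidates):
    """Build the gts/res dicts expected by the scorers: id -> list of captions."""
    gts = {key: list(refs) for key, refs in references.items()}
    # Each candidate entry is a list holding exactly one caption string.
    res = {key: [candidates[key]] for key in references}
    return gts, res


def score_bleu(gts, res, n=4):
    """Return corpus-level BLEU-1 .. BLEU-n (assumed scorer interface)."""
    # Imported lazily so the formatting helper has no dependencies.
    from pycocoevalcap.bleu.bleu import Bleu

    score, _per_item_scores = Bleu(n).compute_score(gts, res)
    return score
```

For example, to_scorer_format({"img1": ["a cat sits on a mat"]}, {"img1": "a cat is on the mat"}) yields a pair of dicts that can be passed straight to score_bleu.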

Setup

  • SPICE requires the Stanford CoreNLP 3.6.0 code and models. They are downloaded automatically the first time the SPICE evaluation is run.
  • Note: SPICE will try to create a cache of parsed sentences in ./spice/cache/. This dramatically speeds up repeated evaluations. The cache directory can be moved by setting 'CACHE_DIR' in ./spice. In the same file, caching can be turned off by removing the '-cache' argument from 'spice_cmd'.

Developers

  • Xinlei Chen (CMU)
  • Hao Fang (University of Washington)
  • Tsung-Yi Lin (Cornell)
  • Ramakrishna Vedantam (Virginia Tech)

Acknowledgement

  • David Chiang (University of Notre Dame)
  • Michael Denkowski (CMU)
  • Alexander Rush (Harvard University)
  • Jungo Kasai (UW): for helping to squash a bug with the CLIPScore implementation