Sentence/Caption evaluation using automated metrics.
This code is released as supplementary material with S2VT [1].
This code can be used to
- evaluate sentences/captions for any dataset,
- compute BLEU, METEOR, ROUGE-L and CIDEr scores.
This uses the MSCOCO caption evaluation code [2].
- Get this code:
  git clone https://github.com/vsubhashini/caption-eval.git
- Get the COCO evaluation scripts:
  ./get_coco_scripts.sh
To ensure you have all the dependencies for the evaluation scripts (in particular, the PTBTokenizer and METEOR require Java), please refer to the COCO Caption Evaluation page.
Make sure you have the COCO scripts (fetched above):
./get_coco_scripts.sh
Create your ground-truth references in the desired format.
Here's a sample file with several reference sentences: data/references.txt
python create_json_references.py -i data/references.txt -o data/references.json
Evaluate the model predictions against the references.
A sample file with predictions from a model is in data/predicted_sentences.txt.
python run_evaluations.py -i data/predicted_sentences.txt -r data/references.json
This code builds on the following papers and tools:
- Sequence to Sequence - Video to Text [1]
- Microsoft COCO Captions: Data Collection and Evaluation Server [2]
- PTBTokenizer: the Stanford Tokenizer, included in Stanford CoreNLP 3.4.1.
- BLEU: BLEU: a Method for Automatic Evaluation of Machine Translation
- METEOR: project page with related publications. The COCO server uses version 1.5 of the code.
- ROUGE-L: ROUGE: A Package for Automatic Evaluation of Summaries
- CIDEr: [CIDEr: Consensus-based Image Description Evaluation](http://arxiv.org/pdf/1411.5726.pdf)
[1] Sequence to Sequence - Video to Text
S. Venugopalan, M. Rohrbach, J. Donahue, T. Darrell, R. Mooney, K. Saenko
The IEEE International Conference on Computer Vision (ICCV) 2015
[2] Microsoft COCO Captions: Data Collection and Evaluation Server
X. Chen, H. Fang, T.Y. Lin, R. Vedantam, S. Gupta, P. Dollar, C.L. Zitnick
arXiv preprint arXiv:1504.00325