This GitHub repository contains the official evaluation script for the VIST Challenge at NAACL 2018.
You can download the interface web page in the following folder:
human_eval_interface/VIST_Demo_webpage.zip
The 200 human stories sampled from the VIST test set are used to show how the human ratings were set up. Please see the www folder in the .zip file.
- In the VIST Challenge at NAACL 2018, each competing team submits one (and only one) story, Story_test, for each photo sequence.
- For each photo sequence, N human-generated stories, Story_human-1, Story_human-2, ..., Story_human-N, have been collected as the gold-standard stories.
The evaluation script works as follows:
- For each photo sequence, it calculates the maximum Meteor score over all (Story_test, Story_human-n) pairs.
- It then averages these maximum Meteor scores over all photo sequences.
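The two steps above can be sketched in a few lines of Python. This is only an illustration of the aggregation, not the script's actual code: `toy_overlap_score` is a hypothetical stand-in (a plain unigram F1) and is not Meteor, and the dictionary layout keyed by a sequence ID is an assumption made for the example.

```python
from statistics import mean
from typing import Callable, Dict, List


def toy_overlap_score(candidate: str, reference: str) -> float:
    """Stand-in scorer (NOT Meteor): unigram F1 over whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    remaining = list(ref)
    matches = 0
    for token in cand:
        if token in remaining:
            remaining.remove(token)  # each reference token matches at most once
            matches += 1
    if matches == 0:
        return 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_max_score(
    test_stories: Dict[str, str],
    human_stories: Dict[str, List[str]],
    score_fn: Callable[[str, str], float] = toy_overlap_score,
) -> float:
    """Average, over photo sequences, of the best score against any human story."""
    best_per_sequence = [
        max(score_fn(test_stories[seq_id], ref) for ref in refs)
        for seq_id, refs in human_stories.items()
    ]
    return mean(best_per_sequence)
```

Any function that takes a candidate and a reference story and returns a number can be passed as `score_fn`, so a real Meteor wrapper could replace the toy scorer without changing the aggregation.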
You can download the entire runnable_jar folder and run EvalVIST.jar as is from within that folder.
runnable_jar/EvalVIST.jar
java -jar EvalVIST.jar -testFile <test_file_path> -gsFile <gs_file_path>
You can also use the JVM parameter -Xmx to set the maximum Java heap size, e.g., java -Xmx2g -jar EvalVIST.jar -testFile <test_file_path> -gsFile <gs_file_path>.
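If you want to call the evaluation from a Python pipeline, the command line above can be assembled as follows. This is a minimal sketch: the helper names `build_eval_command` and `run_eval` are hypothetical, and the default `jar_dir` assumes you run from the repository root. Only the `-Xmx`, `-jar`, `-testFile`, and `-gsFile` arguments come from this README.

```python
import os
import subprocess
from typing import List, Optional


def build_eval_command(
    test_file: str, gs_file: str, max_heap: Optional[str] = None
) -> List[str]:
    """Build the java command; max_heap is e.g. "2g" and maps to -Xmx2g."""
    command = ["java"]
    if max_heap:
        command.append(f"-Xmx{max_heap}")
    command += ["-jar", "EvalVIST.jar", "-testFile", test_file, "-gsFile", gs_file]
    return command


def run_eval(
    test_file: str,
    gs_file: str,
    max_heap: Optional[str] = None,
    jar_dir: str = "runnable_jar",  # assumed location of EvalVIST.jar
) -> str:
    """Run the jar from inside its folder and return everything it printed."""
    # Absolute paths keep the input files resolvable after changing directory.
    command = build_eval_command(
        os.path.abspath(test_file), os.path.abspath(gs_file), max_heap
    )
    result = subprocess.run(command, cwd=jar_dir, capture_output=True, text=True)
    return result.stdout + result.stderr
```

Running with `cwd=jar_dir` mirrors the advice to run the jar from within its own folder, where its supporting files live.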
To run EvalVIST.jar, as shown in the runnable_jar folder, please put the following in the same folder as the EvalVIST.jar file:
- the data folder (with paraphrase-en.gz in it; this file can be found in the src/main/resources/meteor-1.5 folder)
- vist-challenge-template.json (can be found in the src/main/resources folder)
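A quick pre-flight check can catch a missing file before the JVM starts. The sketch below assumes the layout described above, with paraphrase-en.gz sitting directly inside the data folder; `missing_eval_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import List

# Paths relative to the folder that holds EvalVIST.jar (assumed layout).
REQUIRED_FILES = [
    "EvalVIST.jar",
    "data/paraphrase-en.gz",
    "vist-challenge-template.json",
]


def missing_eval_files(jar_dir: str) -> List[str]:
    """Return the required files that are not present under jar_dir."""
    root = Path(jar_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

An empty list means the folder looks ready; otherwise the returned names tell you what still needs to be copied.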
The template file (vist-challenge-template.json) is provided by the hosts of the VIST Challenge to specify which photo sequences are included in this challenge.
A few (~3%) photo sequences in the VIST test set are not included in the challenge because these photos have been removed from Flickr by their owners.
The following two parameters are both required:
Parameter | Description |
---|---|
testFile | The path of your submission file, which contains exactly one story for each photo sequence. Please see the following section for format details. |
gsFile | The path of the gold-standard file, which contains the stories that were written by human workers. Please go to the VIST website to download the test set (~17MB, test.story-in-sequence.json) of the Stories of Images-in-Sequence (SIS) data. For the VIST challenge, we also collected 3 extra new stories for each photo sequence in the test set. This extra test set is not public. |
Please upload your submissions as a JSON file in the following format:
{
"team_name": "example_team_name",
"evaluation_info": {
"additional_description": "comments or notes about this submission."
},
"output_stories": [
{
"album_id": "flickr_album_id",
"photo_sequence": [
"flickr_photo_id_1",
"flickr_photo_id_2",
"flickr_photo_id_3",
"flickr_photo_id_4",
"flickr_photo_id_5"
],
"story_text_normalized": "normalized text of your story"
},
{
"album_id": "flickr_album_id",
"photo_sequence": [
"flickr_photo_id_1",
"flickr_photo_id_2",
"flickr_photo_id_3",
"flickr_photo_id_4",
"flickr_photo_id_5"
],
"story_text_normalized": "normalized text of your story"
},
{
"album_id": "flickr_album_id",
"photo_sequence": [
"flickr_photo_id_1",
"flickr_photo_id_2",
"flickr_photo_id_3",
"flickr_photo_id_4",
"flickr_photo_id_5"
],
"story_text_normalized": "normalized text of your story"
}...
]
}
Your submitted JSON file also needs to satisfy the following requirements, or it will be rejected by the system:
- For each story, please concatenate all the sentences together, with a space in between, to form the story.
- Your JSON file should have exactly one story for each photo sequence.
- Your JSON file should contain stories for all the photo sequences listed in the template file (vist-challenge-template.json). You can simply take the template file and fill in the story_text_normalized field of each photo sequence.
- Any non-ASCII characters will be ignored for evaluation.
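The requirements above can be checked locally before uploading. The sketch below is not the script's own validation: it assumes the template file uses the same `output_stories` layout as a submission, and that a photo sequence is identified by its ordered list of photo IDs. All helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def join_sentences(sentences: List[str]) -> str:
    """Concatenate the sentences of one story with a single space in between."""
    return " ".join(s.strip() for s in sentences if s.strip())


def strip_non_ascii(text: str) -> str:
    """Preview a story without non-ASCII characters (assumed to be dropped)."""
    return text.encode("ascii", "ignore").decode("ascii")


def sequence_key(story: Dict) -> Tuple[str, ...]:
    """Identify a photo sequence by its ordered photo IDs (an assumption)."""
    return tuple(story["photo_sequence"])


def check_submission(submission: Dict, template: Dict) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    counts = Counter(sequence_key(s) for s in submission["output_stories"])
    required = {sequence_key(s) for s in template["output_stories"]}
    for key, count in counts.items():
        if count > 1:
            problems.append(f"{count} stories for photo sequence {list(key)}")
    for key in sorted(required - set(counts)):
        problems.append(f"missing story for photo sequence {list(key)}")
    return problems
```

Both files can be loaded with `json.load` and passed in as dictionaries. Passing these checks does not guarantee acceptance, since the official script may apply further rules.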
We also provide the following example submission files in the example_submission_json folder. You can use these files as the test file to try out the evaluation script.
File Name | Description |
---|---|
example-submission-file.test.json | Contains the first human-generated story of each photo sequence in the VIST test set. If using the VIST test set as gsFile, the resulting METEOR score output should be 1 (or 0.99999999...). |
example-submission-file-extra.empty.test.json | All stories are empty strings. The resulting METEOR score output should be 0. |
example-submission-file-extra.happy.test.json | Simple baseline. Every story is "everyone is happy ." repeated 5 times. |
example-submission-file-extra.missing.happy.test.json | This file is missing one story. The script should report errors. |
example-submission-file-extra.wrong.happy.test.json | This file is not valid JSON. The script should report errors. |
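A baseline like the "happy" file above could be produced by filling in the template. This is a sketch under the assumption that the template shares the submission's top-level layout (`team_name`, `output_stories`); `fill_template` is a hypothetical helper, and the provided example file may have been generated differently.

```python
import copy
from typing import Dict

# "everyone is happy ." repeated 5 times, joined with single spaces.
HAPPY_STORY = " ".join(["everyone is happy ."] * 5)


def fill_template(template: Dict, story_text: str, team_name: str) -> Dict:
    """Return a copy of the template with every story set to story_text."""
    submission = copy.deepcopy(template)  # leave the loaded template untouched
    submission["team_name"] = team_name
    for story in submission["output_stories"]:
        story["story_text_normalized"] = story_text
    return submission
```

The result can be written out with `json.dump` and passed to the script as the test file.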
[Passed] Test file is in valid JSON syntax.
[Passed] Each photo sequence has only one story.
[Passed] All required stories are submitted.
MeteorConfiguration...
setTask...
scorer created...
--------------------------
Avg. Max Meteor Score =
0.9999520462921719
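When the script is run in batch, the score can be pulled out of output like the sample above. The sketch assumes the printed format matches that sample, with the number following the "Avg. Max Meteor Score =" label (possibly on the next line) and successful checks prefixed with "[Passed]". How failures are printed is not shown here, so they are not parsed.

```python
import re
from typing import List, Optional

# \s* also matches a line break between the label and the number.
SCORE_PATTERN = re.compile(
    r"Avg\. Max Meteor Score\s*=\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
)


def parse_score(output: str) -> Optional[float]:
    """Return the reported score, or None if no score line is found."""
    match = SCORE_PATTERN.search(output)
    return float(match.group(1)) if match else None


def passed_checks(output: str) -> List[str]:
    """Return the messages of all lines that start with "[Passed]"."""
    prefix = "[Passed]"
    return [
        line.strip()[len(prefix):].strip()
        for line in output.splitlines()
        if line.strip().startswith(prefix)
    ]
```

A `None` result is a hint that the run stopped before scoring, for example because a format check failed.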
This Java code uses Meteor 1.5. You need to include the Meteor JAR file when compiling the source code.
This repository was created and is maintained by Ting-Hao (Kenneth) Huang (tinghaoh@cs.cmu.edu).