SemEval 2018 Task 4

Character Identification on Multiparty Dialogues

Character Identification is an entity linking task that identifies each mention as a certain character in multiparty dialogue. Let a mention be a nominal referring to a person (e.g., she, mom, Judy) and an entity be a character in a dialogue. The goal is to assign each mention to its entity, who may or may not participate in the dialogue. In the following example, the mention "mom" is not one of the speakers; nonetheless, it clearly refers to a specific person, Judy, who could appear in some other dialogue. Identifying such mentions as real characters requires cross-document entity resolution, which makes this task challenging.

[Figure: Character Identification Example]

Datasets

The first two seasons of the TV show Friends are annotated for this task. Each season consists of episodes, each episode comprises scenes, and each scene is segmented into sentences. The following describes the distributed datasets:

Note that the evaluation sets did not include the gold keys during the competition; we made them available afterwards. No dedicated development set was distributed for this task; feel free to carve out your own development set from the training data or to perform cross-validation on the training sets.

Format

All datasets follow the CoNLL 2012 Shared Task data format. Documents are delimited by comments of the following form:

#begin document (<Document ID>)[; part ###]
...
#end document
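
For illustration, here is a minimal sketch of how such a file can be split into documents using these delimiters. This is not part of the distribution, and the file name friends.train.conll is a placeholder for whichever dataset file you use:

def iter_documents(path):
    # Yield (header, token lines) for each "#begin document ... #end document"
    # block; the header line carries the document ID and optional part number.
    header, lines = None, []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.rstrip("\n")
            if line.startswith("#begin document"):
                header, lines = line, []
            elif line.startswith("#end document"):
                yield header, lines
            else:
                lines.append(line)

for header, lines in iter_documents("friends.train.conll"):
    print(header, len(lines))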

Each sentence is delimited by a new line ("\n"), and each column indicates the following:

  1. Document ID: /<name of the show>-<season ID><episode ID> (e.g., /friends-s01e01).
  2. Scene ID: the ID of the scene within the episode.
  3. Token ID: the ID of the token within the sentence.
  4. Word form: the tokenized word.
  5. Part-of-speech tag: the part-of-speech tag of the word (auto generated).
  6. Constituency tag: the Penn Treebank style constituency tag (auto generated).
  7. Lemma: the lemma of the word (auto generated).
  8. Frameset ID: not provided (always -).
  9. Word sense: not provided (always -).
  10. Speaker: the speaker of this sentence.
  11. Named entity tag: the named entity tag of the word (auto generated).
  12. Entity ID: the entity ID of the mention, which is consistent across all documents.

Here is a sample from the training dataset:

/friends-s01e01  0  0  He     PRP   (TOP(S(NP*)    he     -  -  Monica_Geller   *  (284)
/friends-s01e01  0  1  's     VBZ          (VP*    be     -  -  Monica_Geller   *  -
/friends-s01e01  0  2  just   RB        (ADVP*)    just   -  -  Monica_Geller   *  -
/friends-s01e01  0  3  some   DT        (NP(NP*    some   -  -  Monica_Geller   *  -
/friends-s01e01  0  4  guy    NN             *)    guy    -  -  Monica_Geller   *  (284)
/friends-s01e01  0  5  I      PRP  (SBAR(S(NP*)    I      -  -  Monica_Geller   *  (248)
/friends-s01e01  0  6  work   VBP          (VP*    work   -  -  Monica_Geller   *  -
/friends-s01e01  0  7  with   IN     (PP*))))))    with   -  -  Monica_Geller   *  -
/friends-s01e01  0  8  !      .             *))    !      -  -  Monica_Geller   *  -
/friends-s01e01  0  0  C'mon  VB   (TOP(S(S(VP*))  c'mon  -  -  Joey_Tribbiani  *  -
/friends-s01e01  0  1  ,      ,                 *  ,      -  -  Joey_Tribbiani  *  -
/friends-s01e01  0  2  you    PRP           (NP*)  you    -  -  Joey_Tribbiani  *  (248)
/friends-s01e01  0  3  're    VBP            (VP*  be     -  -  Joey_Tribbiani  *  -
/friends-s01e01  0  4  going  VBG            (VP*  go     -  -  Joey_Tribbiani  *  -
/friends-s01e01  0  5  out    RP           (PRT*)  out    -  -  Joey_Tribbiani  *  -
/friends-s01e01  0  6  with   IN             (PP*  with   -  -  Joey_Tribbiani  *  -
/friends-s01e01  0  7  the    DT             (NP*  the    -  -  Joey_Tribbiani  *  -
/friends-s01e01  0  8  guy    NN            *))))  guy    -  -  Joey_Tribbiani  *  (284)
/friends-s01e01  0  9  !      .               *))  !      -  -  Joey_Tribbiani  *  -

A mention may include more than one word:

/friends-s01e02  0  0  Ugly         JJ   (TOP(S(NP(ADJP*  ugly         -  -  Chandler_Bing  *  (380
/friends-s01e02  0  1  Naked        JJ                *)  naked        -  -  Chandler_Bing  *  -
/friends-s01e02  0  2  Guy          NNP               *)  Guy          -  -  Chandler_Bing  *  380)
/friends-s01e02  0  3  got          VBD             (VP*  get          -  -  Chandler_Bing  *  -
/friends-s01e02  0  4  a            DT              (NP*  a            -  -  Chandler_Bing  *  -
/friends-s01e02  0  5  Thighmaster  NN               *))  thighmaster  -  -  Chandler_Bing  *  -
/friends-s01e02  0  6  !            .                *))  !            -  -  Chandler_Bing  *  -

The mapping between the entity ID and the actual character can be found in friends_entity_map.txt.
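
To make the column layout concrete, here is a minimal parsing sketch under the assumptions above: columns are whitespace-separated, and the 12th column encodes mentions either as single-token brackets such as (284) or as spans opened by (380 and closed by 380). The three-line sample is abbreviated from the excerpt above, with _ standing in for the constituency column; none of this code comes from the task distribution:

from typing import List, Tuple

def extract_mentions(rows: List[List[str]]) -> List[Tuple[int, int, int]]:
    # rows: one token per row, whitespace-split into the 12 columns above.
    # Returns (entity ID, start token, end token) triples in token order.
    mentions, open_spans = [], []
    for i, cols in enumerate(rows):
        ref = cols[11]                          # 12th column: entity ID
        if ref == "-":                          # token is not part of a mention
            continue
        for part in ref.split("|"):             # several refs may share a token
            if part.startswith("(") and part.endswith(")"):
                mentions.append((int(part[1:-1]), i, i))
            elif part.startswith("("):          # a multi-token span opens here
                open_spans.append((int(part[1:]), i))
            else:                               # the most recent span closes here
                eid, start = open_spans.pop()
                mentions.append((eid, start, i))
    return mentions

sample = """\
/friends-s01e02 0 0 Ugly JJ _ ugly - - Chandler_Bing * (380
/friends-s01e02 0 1 Naked JJ _ naked - - Chandler_Bing * -
/friends-s01e02 0 2 Guy NNP _ Guy - - Chandler_Bing * 380)"""
print(extract_mentions([line.split() for line in sample.splitlines()]))
# [(380, 0, 2)]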

Evaluation

Your output must consist of the entity ID of each mention, one per line, in sequential order. There are 6 mentions in the two examples above, which generate the following output:

284
284
248
248
284
380

Given this output, the evaluation script will measure the following:

  1. The label accuracy considering only 7 entities: the 6 main characters (Chandler, Joey, Monica, Phoebe, Rachel, and Ross), with all other characters grouped as one entity.
  2. The macro average of the F1 scores of the 7 entities.
  3. The label accuracy considering all entities, where characters not appearing in the training data are grouped as one entity, others.
  4. The macro average of the F1 scores of all entities.
  5. The F1 scores of the 7 entities.
  6. The F1 scores of all entities.

The following command runs evaluate.py:

python evaluate.py ref_out sys_out
  • ref_out: the reference output including the gold keys (download ref.out).
  • sys_out: the path to a file containing your system output; this should include 2,429 lines of keys, where each line indicates the entity ID of the corresponding mention.
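
To make the metrics concrete, here is a minimal scoring sketch, not the official evaluate.py: it computes a plain label accuracy and a macro-averaged F1 over the gold labels, without the grouping into others or into the 7 main-entity classes that the official metrics apply. The file names ref.out and sys.out are placeholders mirroring the command above:

def score(ref_path, sys_path):
    # Read one entity ID per line from each file; lines must align one-to-one.
    ref = [line.strip() for line in open(ref_path)]
    hyp = [line.strip() for line in open(sys_path)]
    assert len(ref) == len(hyp), "one key per mention, in the same order"

    # Label accuracy: fraction of mentions whose predicted ID matches the gold ID.
    accuracy = sum(r == h for r, h in zip(ref, hyp)) / len(ref)

    # Macro-averaged F1 over the entity labels occurring in the gold keys.
    f1s = []
    for label in set(ref):
        tp = sum(r == h == label for r, h in zip(ref, hyp))
        pred = sum(h == label for h in hyp)   # times the system predicted label
        gold = sum(r == label for r in ref)   # times label occurs in the gold keys
        p = tp / pred if pred else 0.0
        r = tp / gold if gold else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return accuracy, sum(f1s) / len(f1s)

acc, macro_f1 = score("ref.out", "sys.out")
print(f"label accuracy: {acc:.2%}, macro-averaged F1: {macro_f1:.2%}")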

Results

This task was hosted at CodaLab from 08/21/2017 to 01/29/2018: https://competitions.codalab.org/competitions/17310.

All Entities + Others

This evaluation considers all characters appearing in the training, development, and evaluation sets as individual classes. Characters that appear in only one or two of these sets are grouped into one class called OTHERS.

User ID        Label Accuracy (%)   Average F1 (%)
AMORE UPF            74.72               41.05
Cheoneum             69.49               16.98
Kampfpudding         59.45               37.37
Zuma                 25.81               14.42

Main Entities + Others

This evaluation considers the 6 main characters as individual classes and all the other characters as one class called OTHERS.

User ID        Label Accuracy (%)   Average F1 (%)
Cheoneum             85.10               86.00
AMORE UPF            77.23               79.36
Kampfpudding         73.36               73.51
Zuma AR              46.07               43.15

System Outputs + Detailed Evaluation

The system outputs from all participants, as well as their detailed evaluation results, are listed below.

User ID        Output              Evaluation
AMORE UPF      AMORE_UPF.out       AMORE_UPF.eval
Cheoneum       Cheoneum.out        Cheoneum.eval
Kampfpudding   Kampfpudding.out    Kampfpudding.eval
Zuma AR        Zuma.out            Zuma.eval