Running TACRED on SPERT
pvcastro opened this issue · 4 comments
Hi there!
I'm trying to evaluate the TACRED dataset on SpERT, but I'm getting extremely low results. I wrote a script to convert the original TACRED JSON to the format you use for training. Here's an example:
TACRED original:
{
  "id": "e7798fb926b9403cfcd2",
  "docid": "APW_ENG_20101103.0539",
  "relation": "per:title",
  "token": ["At", "the", "same", "time", ",", "Chief", "Financial", "Officer", "Douglas", "Flint", "will", "become", "chairman", ",", "succeeding", "Stephen", "Green", "who", "is", "leaving", "to", "take", "a", "government", "job", "."],
  "subj_start": 8,
  "subj_end": 9,
  "obj_start": 12,
  "obj_end": 12,
  "subj_type": "PERSON",
  "obj_type": "TITLE"
}
TACRED converted:
{
  "tokens": ["At", "the", "same", "time", ",", "Chief", "Financial", "Officer", "Douglas", "Flint", "will", "become", "chairman", ",", "succeeding", "Stephen", "Green", "who", "is", "leaving", "to", "take", "a", "government", "job", "."],
  "entities": [
    { "type": "PERSON", "start": 8, "end": 10 },
    { "type": "TITLE", "start": 12, "end": 13 }
  ],
  "relations": [
    { "type": "per:title", "head": 0, "tail": 1 }
  ],
  "orig_id": "e7798fb926b9403cfcd2"
}
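For reference, the conversion logic boils down to roughly the sketch below (simplified; TACRED's subj_end/obj_end indices are inclusive, so 1 is added to obtain the exclusive end index used in the converted format above):

import json

def convert_example(ex):
    # Map a single TACRED example to the converted format shown above.
    # TACRED's subj_end/obj_end are inclusive; the converted "end" index is exclusive.
    entities = [
        {"type": ex["subj_type"], "start": ex["subj_start"], "end": ex["subj_end"] + 1},
        {"type": ex["obj_type"], "start": ex["obj_start"], "end": ex["obj_end"] + 1},
    ]
    relations = [{"type": ex["relation"], "head": 0, "tail": 1}]
    return {
        "tokens": ex["token"],
        "entities": entities,
        "relations": relations,
        "orig_id": ex["id"],
    }

def convert_file(src_path, dst_path):
    # Convert a full TACRED split (a JSON list of examples) in one pass.
    with open(src_path) as f:
        examples = json.load(f)
    with open(dst_path, "w") as f:
        json.dump([convert_example(ex) for ex in examples], f)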
I'm getting the results below with the same config as the conll04 sample, changing only the dataset. Any idea why the results are so bad? Should I adapt the code somehow?
Thanks!
--- Entities (named entity recognition (NER)) ---
An entity is considered correct if the entity type and span is predicted correctly
type precision recall f1-score support
Loc 18.80 13.44 15.68 677
Tit 17.92 11.93 14.33 1701
url 71.03 79.17 74.88 96
Dat 24.85 28.95 26.74 3064
Crim 22.87 22.05 22.45 195
SoP 22.65 28.54 25.26 431
Cntr 23.39 27.13 25.12 1434
Cit 19.15 21.45 20.24 951
Per 44.15 57.61 49.99 20644
Misc 16.45 14.67 15.51 600
Relig 15.54 19.11 17.14 157
Ideo 11.54 6.12 8.00 49
Dur 25.63 22.56 24.00 359
Org 49.10 50.64 49.86 12272
Num 23.73 29.22 26.19 1742
Nation 19.68 12.53 15.31 495
CoD 20.16 26.33 22.83 395
micro 40.08 46.40 43.01 45262
macro 26.27 27.73 26.68 45262
--- Relations ---
Without named entity classification (NEC)
A relation is considered correct if the relation type and the spans of the two related entities are predicted correctly (entity type is not considered)
/home/pedro/anaconda3/envs/fast-bert/lib/python3.7/site-packages/sklearn/metrics/classification.py:1437: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
type precision recall f1-score support
Cause of Death 6.86 17.86 9.92 168
Country of Headquarters 10.05 11.30 10.64 177
Website 56.25 62.79 59.34 86
Date of Death 13.13 6.31 8.52 206
Charges 5.00 4.76 4.88 105
Countries of Residence 6.96 3.54 4.69 226
Country of Birth 7.50 15.00 10.00 20
Origin 5.80 1.90 2.87 210
Member of 0.00 0.00 0.00 31
Top Members / Employees 14.23 32.21 19.74 534
Person Alternate Names 15.38 5.26 7.84 38
State or Province of Birth 0.00 0.00 0.00 26
State or Provinces of Residence 7.03 12.50 9.00 72
Country of Death 0.00 0.00 0.00 46
State or Province of Headquarters 11.86 30.00 17.00 70
Other Family 1.87 2.50 2.14 80
Person Parents 4.55 5.36 4.92 56
Founded by 25.00 9.21 13.46 76
Age 13.14 18.93 15.51 243
Religion 4.11 5.66 4.76 53
Children 6.12 9.09 7.32 99
Title 7.63 7.40 7.51 919
Dissolved 0.00 0.00 0.00 8
Organization Parents 2.70 7.29 3.94 96
Political/Religious affiliation 4.55 20.00 7.41 10
No Relation 9.53 19.72 12.85 17195
Subsidiaries 4.90 4.42 4.65 113
Employee Of 2.07 1.87 1.96 375
State or Province of Death 19.05 9.76 12.90 41
Siblings 6.02 16.67 8.85 30
Shareholders 6.06 3.64 4.55 55
Cities of Residence 7.94 5.59 6.56 179
City of Death 4.11 2.54 3.14 118
City of Headquarters 4.64 6.42 5.38 109
Schools Attended 12.82 10.00 11.24 50
Date of Birth 12.24 19.35 15.00 31
Founded 11.63 13.16 12.35 38
Members 0.00 0.00 0.00 85
Spouse 7.76 10.69 8.99 159
City of Birth 7.69 9.09 8.33 33
Organization Alternate Names 22.30 18.93 20.48 338
Number of Employees/Members 13.21 25.93 17.50 27
micro 9.53 17.80 12.42 22631
macro 9.09 11.11 9.19 22631
With named entity classification (NEC)
A relation is considered correct if the relation type and the two related entities are predicted correctly (in span and entity type)
type precision recall f1-score support
Cause of Death 6.86 17.86 9.92 168
Country of Headquarters 10.05 11.30 10.64 177
Website 56.25 62.79 59.34 86
Date of Death 13.13 6.31 8.52 206
Charges 5.00 4.76 4.88 105
Countries of Residence 5.22 2.65 3.52 226
Country of Birth 7.50 15.00 10.00 20
Origin 5.80 1.90 2.87 210
Member of 0.00 0.00 0.00 31
Top Members / Employees 13.98 31.65 19.39 534
Person Alternate Names 15.38 5.26 7.84 38
State or Province of Birth 0.00 0.00 0.00 26
State or Provinces of Residence 7.03 12.50 9.00 72
Country of Death 0.00 0.00 0.00 46
State or Province of Headquarters 11.86 30.00 17.00 70
Other Family 1.87 2.50 2.14 80
Person Parents 4.55 5.36 4.92 56
Founded by 25.00 9.21 13.46 76
Age 13.14 18.93 15.51 243
Religion 4.11 5.66 4.76 53
Children 6.12 9.09 7.32 99
Title 7.63 7.40 7.51 919
Dissolved 0.00 0.00 0.00 8
Organization Parents 2.70 7.29 3.94 96
Political/Religious affiliation 4.55 20.00 7.41 10
No Relation 9.01 18.64 12.15 17195
Subsidiaries 3.92 3.54 3.72 113
Employee Of 2.07 1.87 1.96 375
State or Province of Death 19.05 9.76 12.90 41
Siblings 6.02 16.67 8.85 30
Shareholders 6.06 3.64 4.55 55
Cities of Residence 7.14 5.03 5.90 179
City of Death 4.11 2.54 3.14 118
City of Headquarters 4.64 6.42 5.38 109
Schools Attended 12.82 10.00 11.24 50
Date of Birth 12.24 19.35 15.00 31
Founded 11.63 13.16 12.35 38
Members 0.00 0.00 0.00 85
Spouse 7.76 10.69 8.99 159
City of Birth 5.13 6.06 5.56 33
Organization Alternate Names 22.30 18.93 20.48 338
Number of Employees/Members 13.21 25.93 17.50 27
micro 9.07 16.95 11.82 22631
macro 8.93 10.94 9.04 22631
Hi,
at first glance, TACRED is not suited for joint entity and relation extraction since it is not exhaustively annotated (each sentence is annotated with only a single relation between two entities). In your example, there is another entity, "Stephen Green", which is not annotated. While you could (mostly) infer the other entities from the "stanford_ner" column in the TACRED JSON file (see the sketch below), relations between these entities would still be missing. This leads to SpERT drawing unlabeled positives as negative samples during training, and to false positives during evaluation (i.e. correctly predicted entities/relations that are simply not annotated count as errors).
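As an illustration, a rough sketch of how additional entity spans could be derived from the stanford_ner tags (this assumes that consecutive identical non-"O" tags form a single span, which merges adjacent entities of the same type and is therefore only an approximation):

def ner_tags_to_spans(ner_tags):
    # Group consecutive identical non-"O" tags into spans with exclusive end indices,
    # e.g. ["O", "PERSON", "PERSON", "O"] -> [{"type": "PERSON", "start": 1, "end": 3}].
    entities = []
    start, current = 0, "O"
    for i, tag in enumerate(ner_tags + ["O"]):  # sentinel flushes the last span
        if tag != current:
            if current != "O":
                entities.append({"type": current, "start": start, "end": i})
            start, current = i, tag
    return entities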
Besides that, you should remove relations tagged with "no_relation", since SpERT uses all entity pairs not labeled with a relation as negative samples (a "None" relation class is added internally during training).
Also, judging by the "no_relation" support in the table above, many sentences in the TACRED dataset are not labeled with any relation at all, which may lead to a huge imbalance between positive and negative samples. Training only on sentences that contain a relation might give better results (see the sketch below), but even then, TACRED does not seem well suited for this kind of task.
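A minimal sketch of this filtering, applied to the converted format above (dropping "no_relation" and, optionally, any sentence left without a relation):

def filter_dataset(docs, drop_empty=True):
    # Remove "no_relation" entries; optionally drop documents without any remaining relation.
    filtered = []
    for doc in docs:
        doc = dict(doc)
        doc["relations"] = [r for r in doc["relations"] if r["type"] != "no_relation"]
        if drop_empty and not doc["relations"]:
            continue
        filtered.append(doc)
    return filtered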
I see, thanks for clarifying @markus-eberts!
I'll try running SpERT on relations that span different sentences. In general, the sentences involved are short and close enough together to stay within BERT's max_seq_len of 512. Do you think SpERT is suitable for this scenario?
Thanks!
This is really hard to tell and depends on the dataset. SpERT was designed for intra-sentence relation extraction. For relations spanning multiple sentences, coreference resolution may also be required. I think it is worth a try, but the model may need some modifications.
Great, thanks!