Lost alignments for repeating concepts
danielhers opened this issue · 3 comments
When a concept occurs multiple times in the AMR, it is kept as the same Concept object in all triples and in the alignments dictionary. This is a problem, because different occurrences may correspond to different tokens, and this distinction is lost.
For example, the first AMR in the biomedical training set:
# ::id a_pmid_2094_2929.7 ::amr-annotator SDL-AMR-09 ::preferred
# ::tok 1 @- RT @-@ PCR and western blot analyses confirmed the strong up @-@ regulation of serpinE2 expression and secretion by IECs expressing oncogenic MEK , Ras or BRAF .
# ::alignments 0-1.1 2-1.2.1.1.1.1 4-1.2.1.1.1.1 5-1.2 6-1.2.2 7-1.2.2 8-1.2.1 9-1 11-1.3.1.2 12-1.3.1 13-1.3.1 14-1.3.1 15-1.3.1.1.r 16-1.3.1.1.1.1.1 17-1.3.1.1 18-1.3 19-1.3.2 20-1.3.2.1.r 21-1.3.2.1.1.1 22-1.3.2.1.2 23-1.3.2.1.2.1.4 23-1.3.2.1.2.1.4.1.2.1 24-1.3.2.1.2.1.1.1.1 26-1.3.2.1.2.1.2.1.1 27-1.3.2.1.2.1 28-1.3.2.1.2.1.3.1.1
(c / confirm-01~e.9 :li 1~e.0
    :ARG0 (a / and~e.5
        :op1 (a2 / analyze-01~e.8
            :instrument (t / thing
                :name (n4 / name :op1 "RT-PCR"~e.2,4)))
        :op2 (i / immunoblot-01~e.6,7))
    :ARG1 (a4 / and~e.18
        :op1 (u / upregulate-01~e.12,13,14
            :ARG1~e.15 (e3 / express-03~e.17
                :ARG2 (p / protein
                    :name (n6 / name :op1 "serpinE2"~e.16)))
            :ARG1-of (s / strong-02~e.11))
        :op2 (s2 / secrete-01~e.19
            :ARG0~e.20 (c2 / cell
                :name (n7 / name :op1 "IEC"~e.21)
                :ARG3-of (e4 / express-03~e.22
                    :ARG2 (o2 / or~e.27
                        :op1 (e / enzyme
                            :name (n2 / name :op1 "MEK"~e.24))
                        :op2 (e2 / enzyme
                            :name (n3 / name :op1 "Ras"~e.26))
                        :op3 (e5 / enzyme
                            :name (n8 / name :op1 "BRAF"~e.28))
                        :ARG0-of (c3 / cause-01~e.23
                            :ARG1 (d / disease :wiki "Cancer"
                                :name (n / name :op1 "cancer"~e.23))))))
            :ARG1 p)))
There are two occurrences of the concept "and", one corresponding to token 5 and one to token 18. However, there is just one Concept object, and the alignments dictionary has just the "e.18" one.
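To make the failure mode concrete, here is a minimal sketch. It does not use the library's actual classes: the `Concept` class below is a hypothetical stand-in that compares and hashes by name only, which is one way a single dictionary entry could end up shared between the two "and" nodes.

```python
class Concept:
    """Hypothetical stand-in: equality and hash depend only on the name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Concept) and self.name == other.name

    def __hash__(self):
        return hash(self.name)

    def __repr__(self):
        return f"Concept({self.name!r})"


alignments = {}

# (variable, concept name, alignment) taken from the first AMR above.
for var, name, align in [("a", "and", "e.5"), ("a4", "and", "e.18")]:
    # Both occurrences map to the same key, so the second write
    # overwrites the first and "e.5" is lost.
    alignments[Concept(name)] = align

print(alignments)  # {Concept('and'): 'e.18'}
```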
The same is true for repeating constants. Example:
# ::id a_pmid_2094_2929.62 ::amr-annotator SDL-AMR-09 ::preferred
# ::tok As shown in Figure <xref ref-type="fig" rid="F2"> 2A </xref> , secreted serpinE2 levels were markedly reduced (> 60 %) in cells @-@ expressing shSerpinE2 ; in contrast , shScrambled had no effect on the secretion of serpinE2 ( data not shown ) .
# ::alignments 1-1.1.5 2-1.1.5.1.r 3-1.1.5.1 5-1.1.5.1.1 8-1.1.1.2 9-1.1.1.1.1.1 10-1.1.1 12-1.1.3 12-1.1.3.r 13-1.1 15-1.1.2.1.1 17-1.1.4.r 18-1.1.4 20-1.1.4.1 24-1.2 26-1.2.1.2.1.1 28-1.2.1.1 28-1.2.1.1.r 29-1.2.1 30-1.2.1.3.r 32-1.2.1.3 33-1.2.1.3.1.r 34-1.2.1.3.1 36-1.2.2.1 37-1.2.2.1.1.1 37-1.2.2.1.1.1.r 38-1.2.2.1.1
(a / and
    :op1 (r / reduce-01~e.13
        :ARG1 (l / level~e.10
            :quant-of (p2 / protein
                :name (n / name :op1 "serpinE2"~e.9))
            :ARG1-of (s / secrete-01~e.8))
        :ARG2 (m2 / more-than
            :op1 (p / percentage-entity :value 60~e.15))
        :manner~e.12 (m / marked~e.12)
        :location~e.17 (c / cell~e.18
            :ARG3-of (e2 / express-03~e.20
                :ARG2 (n4 / nucleic-acid
                    :name (n2 / name :op1 "shRNA")
                    :ARG0-of (e / encode-01
                        :ARG1 p2))))
        :ARG1-of (s3 / show-01~e.1
            :ARG0~e.2 (f / figure~e.3 :mod "2A"~e.5)))
    :op2 (c2 / contrast-01~e.24
        :ARG2 (a2 / affect-01~e.29 :polarity~e.28 -~e.28
            :ARG0 (n5 / nucleic-acid
                :name (n3 / name :op1 "shScrambled"~e.26))
            :ARG1~e.30 (s2 / secrete-01~e.32
                :ARG1~e.33 p2~e.34))
        :ARG1-of (d / describe-01
            :ARG0 (d2 / data~e.36
                :ARG1-of (s4 / show-01~e.38 :polarity~e.37 -~e.37)))))
The "-" constant occurs twice (aligned to tokens 28 and 37), but only the last alignment is kept.
See my comment on the pull request: why not store the alignment on the variable?
Yes, I guess that would be a better idea (or saving it on the :instance-of triple as I said there). I'll try it out.
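A rough sketch of what keying alignments on the triple might look like, assuming triples are plain `(source, role, target)` tuples. This layout is illustrative only and is not the library's data model. Variables are unique per node, so the two "and" concepts no longer collide. Handling constants by keying on the full triple is an extra assumption here, since constants have no variable of their own.

```python
alignments = {}

# Concepts: key on the :instance-of triple, whose source is the variable.
for var, concept, align in [("a", "and", "e.5"), ("a4", "and", "e.18")]:
    alignments[(var, ":instance-of", concept)] = align

# Constants: key on the whole triple, so the two "-" polarity values
# from the second AMR stay distinct.
for src, role, const, align in [
    ("a2", ":polarity", "-", "e.28"),
    ("s4", ":polarity", "-", "e.37"),
]:
    alignments[(src, role, const)] = align

for triple, align in alignments.items():
    print(triple, align)
```

With this keying, all four alignments survive: one entry per occurrence rather than one per distinct concept or constant.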