Prepositions unannotated for supersense
lgessler opened this issue · 3 comments
lgessler commented
Token 6:
```
# sent_id = french-f57dd6ab-5263-4c8a-e360-8ec683e6a37a-02
# text = Once you have the hang of it it s pretty fast ( and does n't eat your clutch ) .
1 Once once SCONJ IN _ 3 mark _ _ _ _ _ _ _ _ _ _ _
2 you you PRON PRP _ 3 nsubj _ _ _ _ _ _ _ _ _ _ _
3 have have VERB VBP _ 11 advcl _ _ _ _ _ _ _ _ _ _ _
4 the the DET DT _ 5 det _ _ _ _ _ _ _ _ _ _ _
5 hang hang NOUN NN _ 3 obj _ _ _ _ _ _ _ _ _ _ _
6 of of ADP IN _ 7 case _ _ _ _ _ _ _ _ _ _ _
7 it it PRON PRP _ 5 nmod _ _ _ _ _ _ _ _ _ _ _
8 it it PRON PRP _ 11 nsubj _ _ _ _ _ _ _ _ _ _ _
9 s be AUX VBZ _ 11 cop _ _ _ _ _ _ _ _ _ _ _
10 pretty pretty ADV RB _ 11 advmod _ _ _ _ _ _ _ _ _ _ _
11 fast fast ADJ JJ _ 0 root _ _ _ _ _ _ _ _ _ _ _
12 ( ( PUNCT -LRB- _ 16 punct _ _ _ _ _ _ _ _ _ _ _
13 and and CCONJ CC _ 16 cc _ _ _ _ _ _ _ _ _ _ _
14 does do AUX VBZ _ 16 aux _ _ _ _ _ _ _ _ _ _ _
15 n't not PART RB _ 16 advmod _ _ _ _ _ _ _ _ _ _ _
16 eat eat VERB VB _ 11 conj _ _ _ _ _ _ _ _ _ _ _
17 your you PRON PRP$ _ 18 nmod:poss _ _ _ _ _ Possessor Possessor _ _ _ _
18 clutch clutch NOUN NN _ 16 obj _ _ _ _ _ _ _ _ _ _ _
19 ) ) PUNCT -RRB- _ 11 punct _ _ _ _ _ _ _ _ _ _ _
20 . . PUNCT . _ 11 punct _ _ _ _ _ _ _ _ _ _ _
```
I assumed that all preps were supposed to be annotated, but perhaps not?
nschneid commented
This should be annotated. `p.Gestalt` is the best I can think of.
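A minimal sketch of applying that fix to token 6, assuming the file uses the 19-column STREUSLE-style `.conllulex` layout with SS in column 14 and SS2 in column 15 (consistent with the `Characteristic Characteristic` and `Possessor Possessor` entries shown in this thread). `set_supersense` is a hypothetical helper, not part of any existing tooling. The other annotated tokens in this file store bare labels (`Characteristic`, not `p.Characteristic`), so the `p.` prefix may need to be dropped to match.

```python
from typing import Optional

# 0-based positions of SS (column 14) and SS2 (column 15), assuming the
# 19-column STREUSLE-style .conllulex layout seen in this file.
SS_COL, SS2_COL = 13, 14


def set_supersense(token_line: str, ss: str, ss2: Optional[str] = None) -> str:
    """Return token_line with SS and SS2 filled in (hypothetical helper).

    If ss2 is omitted it defaults to ss, mirroring tokens in this file such
    as 'with ... Characteristic Characteristic' where the two columns match.
    """
    cols = token_line.rstrip("\n").split("\t")
    if len(cols) != 19:
        raise ValueError(f"expected 19 columns, got {len(cols)}")
    cols[SS_COL] = ss
    cols[SS2_COL] = ss if ss2 is None else ss2
    return "\t".join(cols)


# Token 6 ("of" in "the hang of it") from the sentence above, tab-separated.
token6 = "\t".join(["6", "of", "of", "ADP", "IN", "_", "7", "case"] + ["_"] * 11)
print(set_supersense(token6, "p.Gestalt"))
```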
lgessler commented
There appear to be many such cases that are not annotated for supersense. Here are the tokens that have UPOS tag `ADP`, XPOS tag `IN`, and empty SS and SMWE columns:
L331 (french-f57dd6ab-5263-4c8a-e360-8ec683e6a37a-02): untagged adposition
L787 (english-675d0855-2d55-90c9-0576-598359cd012a-02): untagged adposition
L1261 (german-fdfb0286-16dc-f265-210a-e8a1892739f6-16): untagged adposition
L1333 (english-764ab510-9acb-a588-8397-d4127118bb71-02): untagged adposition
L1350 (english-764ab510-9acb-a588-8397-d4127118bb71-03): untagged adposition
L1453 (french-f3ff1b4d-9a8b-0920-a38d-1eadcca4b02b-02): untagged adposition
L1842 (spanish-4ca44e40-943e-5a22-48d4-a701f3d13a7b-02): untagged adposition
L2479 (spanish-022ee17b-a59d-43d6-65c9-90b2acb26b87-13): untagged adposition
L2878 (spanish-022ee17b-a59d-43d6-65c9-90b2acb26b87-23): untagged adposition
L3543 (french-5ec1c3d1-eefb-98a9-46ac-0cee6c1a7ddc-02): untagged adposition
L3814 (german-73a695f2-7da5-6266-bff7-03ce8613b181-10): untagged adposition
L3815 (german-73a695f2-7da5-6266-bff7-03ce8613b181-10): untagged adposition
L3852 (german-73a695f2-7da5-6266-bff7-03ce8613b181-11): untagged adposition
L3976 (german-73a695f2-7da5-6266-bff7-03ce8613b181-16): untagged adposition
L4229 (english-417df4f8-c1a7-de00-9ec1-2fa2e6df12ce-04): untagged adposition
L5918 (french-47473c91-d121-950d-7ed2-24ed3362591e-03): untagged adposition
L6781 (french-23ec7597-9615-a32e-71b5-ff55819e0387-03): untagged adposition
L6819 (german-cfa8fd9b-a8c2-9379-c816-a94b0e42b253-01): untagged adposition
L6977 (french-6db076bd-5980-5a17-2cde-ec51972ab68b-01): untagged adposition
L7455 (french-5b177a38-a96d-fb2c-780b-25d9b02d3504-01): untagged adposition
L7485 (french-5b177a38-a96d-fb2c-780b-25d9b02d3504-02): untagged adposition
L7628 (english-f16f1487-f7f2-1e5c-ce07-55e354565b12-03): untagged adposition
L7679 (english-e3e70682-c209-4cac-629f-6fbed82c07cd-03): untagged adposition
L8023 (french-d994539a-3dc0-7bbb-ed6e-472512fca4ed-02): untagged adposition
L10174 (english-38553ac8-7d71-8591-3fee-2fc72e2dffdf-04): untagged adposition
L10267 (spanish-e86a5da6-5f5e-ef2e-7072-ef7b6625afeb-01): untagged adposition
L10419 (spanish-e86a5da6-5f5e-ef2e-7072-ef7b6625afeb-07): untagged adposition
L10429 (spanish-e86a5da6-5f5e-ef2e-7072-ef7b6625afeb-07): untagged adposition
L11250 (german-2809cc89-3b4b-ee51-6509-36624bf8b43a-03): untagged adposition
L11388 (english-5df2e073-53f0-4099-813d-d49a8d99743c-04): untagged adposition
L11593 (french-9be3cecb-8c49-7c68-a8c2-4d4244ef7feb-06): untagged adposition
L11598 (french-9be3cecb-8c49-7c68-a8c2-4d4244ef7feb-06): untagged adposition
L11836 (german-7d8dd474-c146-a389-381b-37241398aa12-02): untagged adposition
L12253 (english-6522ff46-74b7-061c-09ce-15a8afd4fc7b-02): untagged adposition
L12390 (spanish-c55fd024-99dd-8bfa-b3d7-77e95bc2b64f-01): untagged adposition
L12492 (spanish-c55fd024-99dd-8bfa-b3d7-77e95bc2b64f-05): untagged adposition
L12533 (spanish-c55fd024-99dd-8bfa-b3d7-77e95bc2b64f-06): untagged adposition
L13447 (german-5d73b377-922e-ed65-0921-fa874e0abf27-02): untagged adposition
L13459 (german-5d73b377-922e-ed65-0921-fa874e0abf27-02): untagged adposition
L13495 (german-5d73b377-922e-ed65-0921-fa874e0abf27-03): untagged adposition
L13784 (german-c85e348a-476c-52e4-b1b4-ff82e4a8836f-03): untagged adposition
L14784 (english-c5826444-79fa-f501-2287-401ba7bbaa76-06): untagged adposition
L15442 (french-eaff520b-49db-5c12-d0a0-1524cc4145bf-01): untagged adposition
L15895 (german-c2b1c26b-e814-7dc9-af47-9f2936631b3e-07): untagged adposition
L15907 (german-c2b1c26b-e814-7dc9-af47-9f2936631b3e-07): untagged adposition
L17234 (german-38ebbccb-ec0a-a1f2-d064-944c27c71e63-06): untagged adposition
L17237 (german-38ebbccb-ec0a-a1f2-d064-944c27c71e63-06): untagged adposition
L17404 (french-759aaeee-4311-62a4-f20a-b3059b33d947-01): untagged adposition
L17476 (french-c96fa758-02b0-87f8-06fa-adb10a248cff-01): untagged adposition
L17662 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-07): untagged adposition
L17838 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-14): untagged adposition
L17849 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-14): untagged adposition
L17864 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-15): untagged adposition
L18076 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-29): untagged adposition
L18671 (english-908e0372-bcbe-9a42-f89d-3fda1e454241-03): untagged adposition
L18727 (english-908e0372-bcbe-9a42-f89d-3fda1e454241-04): untagged adposition
L19286 (german-a5c98426-9f68-793e-5634-a8b2610ec605-07): untagged adposition
L19569 (french-9177206d-fbce-c1bd-c86d-dbce4559eb49-04): untagged adposition
L20165 (french-2dd5ad98-475c-61b1-af3e-f55c43ecf2b9-01): untagged adposition
L20244 (french-b22cc347-4ca2-7b41-f5e4-c4bb30942540-02): untagged adposition
L21005 (english-e32866d3-0d6a-78b0-7eda-9ab9bec60ffe-01): untagged adposition
L21236 (french-a2a7ae1f-3ac7-652c-cdf8-440407295e42-05): untagged adposition
L21238 (french-a2a7ae1f-3ac7-652c-cdf8-440407295e42-05): untagged adposition
L21554 (german-86e7b795-7610-2737-745b-9fc4193c0361-07): untagged adposition
L21555 (german-86e7b795-7610-2737-745b-9fc4193c0361-07): untagged adposition
L21823 (german-bd143fa9-b714-210c-665d-7435c1066932-01): untagged adposition
L22175 (french-642a357c-7329-02f4-51fb-fcc798b8da9f-06): untagged adposition
L22464 (german-72288559-a5c6-c1c0-de55-b877cc95aff9-01): untagged adposition
L23493 (english-d138d150-8557-716a-a750-2a812227d96d-03): untagged adposition
L24010 (german-f3a7a81d-1965-5597-d014-2891a5112fd2-01): untagged adposition
L24019 (german-f3a7a81d-1965-5597-d014-2891a5112fd2-01): untagged adposition
L24380 (german-25482047-59d8-2aaa-da4b-25cd20b9e029-02): untagged adposition
L24407 (german-25482047-59d8-2aaa-da4b-25cd20b9e029-04): untagged adposition
L24646 (german-25482047-59d8-2aaa-da4b-25cd20b9e029-09): untagged adposition
L24765 (spanish-2d6b76db-51ed-2f15-99f8-eee797b9580f-04): untagged adposition
L24791 (french-449c4ca2-3685-156b-89c8-0c4de9367ed9-02): untagged adposition
L24941 (spanish-cd0dfe6b-d673-4d3d-b898-f2f2d412c848-01): untagged adposition
L25017 (spanish-cd0dfe6b-d673-4d3d-b898-f2f2d412c848-04): untagged adposition
L25131 (english-55c36c3d-5cbb-c080-3547-5c5ef76dce6e-01): untagged adposition
L25146 (english-55c36c3d-5cbb-c080-3547-5c5ef76dce6e-02): untagged adposition
L25222 (english-55c36c3d-5cbb-c080-3547-5c5ef76dce6e-06): untagged adposition
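The query above can be sketched as a short script. This is a reconstruction, not the script that produced the list. It assumes the same 19-column layout (UPOS in column 4, XPOS in column 5, SMWE in column 11, SS in column 14) and that the `L331`-style numbers are 1-based line numbers in the file; `find_untagged_adpositions` and `report` are hypothetical names. Requiring an empty SMWE column presumably avoids flagging tokens inside strong MWEs, whose non-initial tokens carry no SS of their own.

```python
from typing import Iterable, Iterator, Optional, Tuple

# 0-based positions of the columns used by the query, assuming the
# 19-column STREUSLE-style .conllulex layout seen in this file.
UPOS, XPOS, SMWE, SS = 3, 4, 10, 13


def find_untagged_adpositions(
    lines: Iterable[str],
) -> Iterator[Tuple[int, Optional[str]]]:
    """Yield (1-based line number, sent_id) for every ADP/IN token whose
    SS and SMWE columns are both empty ("_")."""
    sent_id: Optional[str] = None
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):]
            continue
        if not line or line.startswith("#"):
            continue  # blank sentence separator or other comment line
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges ("3-4") and empty nodes ("5.1").
        if len(cols) != 19 or not cols[0].isdigit():
            continue
        if (cols[UPOS] == "ADP" and cols[XPOS] == "IN"
                and cols[SS] == "_" and cols[SMWE] == "_"):
            yield lineno, sent_id


def report(path: str) -> None:
    """Print matches in the same format as the list in this comment."""
    with open(path, encoding="utf-8") as f:
        for lineno, sent_id in find_untagged_adpositions(f):
            print(f"L{lineno} ({sent_id}): untagged adposition")
```

Calling `report("FILE.conllulex")` (a placeholder path) prints one `L… (sent_id): untagged adposition` line per match. Keeping the generator separate from the printing makes it easy to post-filter the hits, for example to set aside suspected tagging errors before annotating.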
Here are some samples. Some appear to be false positives like this one ("Down" is incorrectly tagged `ADP`/`IN`):
```
15 a a DET DT _ 16 det _ _ _ _ _ _ _ _ _ _ _
16 kid kid NOUN NN _ 14 obj _ _ _ _ _ _ _ _ _ _ _
17 with with ADP IN _ 18 case _ _ _ _ _ Characteristic Characteristic _ _ _ _
> 18 Down down ADP IN _ 14 obl _ _ _ _ _ _ _ _ _ _ _
19 s 's PART POS _ 18 case _ _ _ _ _ `$ `$ _ _ _ _
20 . . PUNCT . _ 4 punct _ _ _ _ _ _ _ _ _ _ _
```
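One possible way to separate false positives like this from genuine misses is a dependency check: an ADP attached to its head with a nominal relation (here `obl`) is more likely a mis-tagged noun or proper noun than an unannotated adposition. This is a suggested heuristic, not something established in this thread; `looks_mistagged` is a hypothetical name and the relation set is illustrative.

```python
# Hypothetical heuristic: an ADP token attached with a nominal relation is
# suspicious and may be a POS-tagging error rather than a missing supersense.
# The relation set below is illustrative, not exhaustive.
NOMINAL_RELS = {"nsubj", "obj", "iobj", "obl", "nmod"}
UPOS, DEPREL = 3, 7  # 0-based columns in a .conllulex token line


def looks_mistagged(token_line: str) -> bool:
    """True if an ADP token is attached with a nominal relation."""
    cols = token_line.rstrip("\n").split("\t")
    if len(cols) <= DEPREL or cols[UPOS] != "ADP":
        return False
    base_rel = cols[DEPREL].split(":")[0]  # e.g. "obl:tmod" -> "obl"
    return base_rel in NOMINAL_RELS
```

Applied to the samples in this comment, it would flag `Down` (attached as `obl`) but not `of`, `for`, or `on`, which all attach as `case`.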
Others seem to genuinely lack an annotation:
```
10 would would AUX MD _ 13 aux _ _ _ _ _ _ _ _ _ _ _
11 be be AUX VB _ 13 cop _ _ _ _ _ _ _ _ _ _ _
12 a a DET DT _ 13 det _ _ _ _ _ _ _ _ _ _ _
13 world world NOUN NN _ 0 root _ _ _ _ _ _ _ _ _ _ _
14 devoid devoid ADJ JJ _ 13 amod _ _ _ _ _ _ _ _ _ _ _
> 15 of of ADP IN _ 18 case _ _ _ _ _ _ _ _ _ _ _
16 nation nation NOUN NN _ 18 compound _ _ _ _ _ _ _ _ _ _ _
```

```
22 and and CCONJ CC _ 25 cc _ _ _ _ _ _ _ _ _ _ _
23 did do AUX VBD _ 25 aux _ _ _ _ _ _ _ _ _ _ _
24 not not PART RB _ 25 advmod _ _ _ _ _ _ _ _ _ _ _
25 account account VERB VB _ 17 conj _ _ _ _ _ _ _ _ _ _ _
> 26 for for ADP IN _ 27 case _ _ _ _ _ _ _ _ _ _ _
27 Vriska Vriska PROPN NNP _ 25 obl _ _ _ _ _ _ _ _ _ _ _
28 . . PUNCT . _ 11 punct _ _ _ _ _ _ _ _ _ _ _
```

```
1 I I PRON PRP _ 2 nsubj _ _ _ _ _ _ _ _ _ _ _
2 base base VERB VBP _ 0 root _ _ _ _ _ _ _ _ _ _ _
3 this this PRON DT _ 2 obj _ _ _ _ _ _ _ _ _ _ _
4 simply simply ADV RB _ 2 advmod _ _ _ _ _ _ _ _ _ _ _
> 5 on on ADP IN _ 7 case _ _ _ _ _ _ _ _ _ _ _
6 the the DET DT _ 7 det _ _ _ _ _ _ _ _ _ _ _
7 fact fact NOUN NN _ 2 obl _ _ _ _ _ _ _ _ _ _ _
8 that that SCONJ IN _ 14 mark _ _ _ _ _ _ _ _ _ _ _
```
lgessler commented
cc @mkranzlein