nert-nlp/pastrie

Prepositions unannotated for supersense

lgessler opened this issue · 3 comments

Token 6:

# sent_id = french-f57dd6ab-5263-4c8a-e360-8ec683e6a37a-02
# text = Once you have the hang of it it s pretty fast ( and does n't eat your clutch ) .
1	Once	once	SCONJ	IN	_	3	mark	_	_	_	_	_	_	_	_	_	_	_
2	you	you	PRON	PRP	_	3	nsubj	_	_	_	_	_	_	_	_	_	_	_
3	have	have	VERB	VBP	_	11	advcl	_	_	_	_	_	_	_	_	_	_	_
4	the	the	DET	DT	_	5	det	_	_	_	_	_	_	_	_	_	_	_
5	hang	hang	NOUN	NN	_	3	obj	_	_	_	_	_	_	_	_	_	_	_
6	of	of	ADP	IN	_	7	case	_	_	_	_	_	_	_	_	_	_	_
7	it	it	PRON	PRP	_	5	nmod	_	_	_	_	_	_	_	_	_	_	_
8	it	it	PRON	PRP	_	11	nsubj	_	_	_	_	_	_	_	_	_	_	_
9	s	be	AUX	VBZ	_	11	cop	_	_	_	_	_	_	_	_	_	_	_
10	pretty	pretty	ADV	RB	_	11	advmod	_	_	_	_	_	_	_	_	_	_	_
11	fast	fast	ADJ	JJ	_	0	root	_	_	_	_	_	_	_	_	_	_	_
12	(	(	PUNCT	-LRB-	_	16	punct	_	_	_	_	_	_	_	_	_	_	_
13	and	and	CCONJ	CC	_	16	cc	_	_	_	_	_	_	_	_	_	_	_
14	does	do	AUX	VBZ	_	16	aux	_	_	_	_	_	_	_	_	_	_	_
15	n't	not	PART	RB	_	16	advmod	_	_	_	_	_	_	_	_	_	_	_
16	eat	eat	VERB	VB	_	11	conj	_	_	_	_	_	_	_	_	_	_	_
17	your	you	PRON	PRP$	_	18	nmod:poss	_	_	_	_	_	Possessor	Possessor	_	_	_	_
18	clutch	clutch	NOUN	NN	_	16	obj	_	_	_	_	_	_	_	_	_	_	_
19	)	)	PUNCT	-RRB-	_	11	punct	_	_	_	_	_	_	_	_	_	_	_
20	.	.	PUNCT	.	_	11	punct	_	_	_	_	_	_	_	_	_	_	_

I assumed that all preps were supposed to be annotated, but perhaps not?

This should be annotated. p.Gestalt is the best I can think of.

There appear to be a lot of such cases that are not annotated for supersense. Here are the tokens that have UPOS tag ADP, XPOS tag 'IN', and empty SS and SMWE columns:

L331 (french-f57dd6ab-5263-4c8a-e360-8ec683e6a37a-02): untagged adposition
L787 (english-675d0855-2d55-90c9-0576-598359cd012a-02): untagged adposition
L1261 (german-fdfb0286-16dc-f265-210a-e8a1892739f6-16): untagged adposition
L1333 (english-764ab510-9acb-a588-8397-d4127118bb71-02): untagged adposition
L1350 (english-764ab510-9acb-a588-8397-d4127118bb71-03): untagged adposition
L1453 (french-f3ff1b4d-9a8b-0920-a38d-1eadcca4b02b-02): untagged adposition
L1842 (spanish-4ca44e40-943e-5a22-48d4-a701f3d13a7b-02): untagged adposition
L2479 (spanish-022ee17b-a59d-43d6-65c9-90b2acb26b87-13): untagged adposition
L2878 (spanish-022ee17b-a59d-43d6-65c9-90b2acb26b87-23): untagged adposition
L3543 (french-5ec1c3d1-eefb-98a9-46ac-0cee6c1a7ddc-02): untagged adposition
L3814 (german-73a695f2-7da5-6266-bff7-03ce8613b181-10): untagged adposition
L3815 (german-73a695f2-7da5-6266-bff7-03ce8613b181-10): untagged adposition
L3852 (german-73a695f2-7da5-6266-bff7-03ce8613b181-11): untagged adposition
L3976 (german-73a695f2-7da5-6266-bff7-03ce8613b181-16): untagged adposition
L4229 (english-417df4f8-c1a7-de00-9ec1-2fa2e6df12ce-04): untagged adposition
L5918 (french-47473c91-d121-950d-7ed2-24ed3362591e-03): untagged adposition
L6781 (french-23ec7597-9615-a32e-71b5-ff55819e0387-03): untagged adposition
L6819 (german-cfa8fd9b-a8c2-9379-c816-a94b0e42b253-01): untagged adposition
L6977 (french-6db076bd-5980-5a17-2cde-ec51972ab68b-01): untagged adposition
L7455 (french-5b177a38-a96d-fb2c-780b-25d9b02d3504-01): untagged adposition
L7485 (french-5b177a38-a96d-fb2c-780b-25d9b02d3504-02): untagged adposition
L7628 (english-f16f1487-f7f2-1e5c-ce07-55e354565b12-03): untagged adposition
L7679 (english-e3e70682-c209-4cac-629f-6fbed82c07cd-03): untagged adposition
L8023 (french-d994539a-3dc0-7bbb-ed6e-472512fca4ed-02): untagged adposition
L10174 (english-38553ac8-7d71-8591-3fee-2fc72e2dffdf-04): untagged adposition
L10267 (spanish-e86a5da6-5f5e-ef2e-7072-ef7b6625afeb-01): untagged adposition
L10419 (spanish-e86a5da6-5f5e-ef2e-7072-ef7b6625afeb-07): untagged adposition
L10429 (spanish-e86a5da6-5f5e-ef2e-7072-ef7b6625afeb-07): untagged adposition
L11250 (german-2809cc89-3b4b-ee51-6509-36624bf8b43a-03): untagged adposition
L11388 (english-5df2e073-53f0-4099-813d-d49a8d99743c-04): untagged adposition
L11593 (french-9be3cecb-8c49-7c68-a8c2-4d4244ef7feb-06): untagged adposition
L11598 (french-9be3cecb-8c49-7c68-a8c2-4d4244ef7feb-06): untagged adposition
L11836 (german-7d8dd474-c146-a389-381b-37241398aa12-02): untagged adposition
L12253 (english-6522ff46-74b7-061c-09ce-15a8afd4fc7b-02): untagged adposition
L12390 (spanish-c55fd024-99dd-8bfa-b3d7-77e95bc2b64f-01): untagged adposition
L12492 (spanish-c55fd024-99dd-8bfa-b3d7-77e95bc2b64f-05): untagged adposition
L12533 (spanish-c55fd024-99dd-8bfa-b3d7-77e95bc2b64f-06): untagged adposition
L13447 (german-5d73b377-922e-ed65-0921-fa874e0abf27-02): untagged adposition
L13459 (german-5d73b377-922e-ed65-0921-fa874e0abf27-02): untagged adposition
L13495 (german-5d73b377-922e-ed65-0921-fa874e0abf27-03): untagged adposition
L13784 (german-c85e348a-476c-52e4-b1b4-ff82e4a8836f-03): untagged adposition
L14784 (english-c5826444-79fa-f501-2287-401ba7bbaa76-06): untagged adposition
L15442 (french-eaff520b-49db-5c12-d0a0-1524cc4145bf-01): untagged adposition
L15895 (german-c2b1c26b-e814-7dc9-af47-9f2936631b3e-07): untagged adposition
L15907 (german-c2b1c26b-e814-7dc9-af47-9f2936631b3e-07): untagged adposition
L17234 (german-38ebbccb-ec0a-a1f2-d064-944c27c71e63-06): untagged adposition
L17237 (german-38ebbccb-ec0a-a1f2-d064-944c27c71e63-06): untagged adposition
L17404 (french-759aaeee-4311-62a4-f20a-b3059b33d947-01): untagged adposition
L17476 (french-c96fa758-02b0-87f8-06fa-adb10a248cff-01): untagged adposition
L17662 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-07): untagged adposition
L17838 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-14): untagged adposition
L17849 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-14): untagged adposition
L17864 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-15): untagged adposition
L18076 (german-e0fc66c4-0bd0-f7fd-73e0-9d7886571ddb-29): untagged adposition
L18671 (english-908e0372-bcbe-9a42-f89d-3fda1e454241-03): untagged adposition
L18727 (english-908e0372-bcbe-9a42-f89d-3fda1e454241-04): untagged adposition
L19286 (german-a5c98426-9f68-793e-5634-a8b2610ec605-07): untagged adposition
L19569 (french-9177206d-fbce-c1bd-c86d-dbce4559eb49-04): untagged adposition
L20165 (french-2dd5ad98-475c-61b1-af3e-f55c43ecf2b9-01): untagged adposition
L20244 (french-b22cc347-4ca2-7b41-f5e4-c4bb30942540-02): untagged adposition
L21005 (english-e32866d3-0d6a-78b0-7eda-9ab9bec60ffe-01): untagged adposition
L21236 (french-a2a7ae1f-3ac7-652c-cdf8-440407295e42-05): untagged adposition
L21238 (french-a2a7ae1f-3ac7-652c-cdf8-440407295e42-05): untagged adposition
L21554 (german-86e7b795-7610-2737-745b-9fc4193c0361-07): untagged adposition
L21555 (german-86e7b795-7610-2737-745b-9fc4193c0361-07): untagged adposition
L21823 (german-bd143fa9-b714-210c-665d-7435c1066932-01): untagged adposition
L22175 (french-642a357c-7329-02f4-51fb-fcc798b8da9f-06): untagged adposition
L22464 (german-72288559-a5c6-c1c0-de55-b877cc95aff9-01): untagged adposition
L23493 (english-d138d150-8557-716a-a750-2a812227d96d-03): untagged adposition
L24010 (german-f3a7a81d-1965-5597-d014-2891a5112fd2-01): untagged adposition
L24019 (german-f3a7a81d-1965-5597-d014-2891a5112fd2-01): untagged adposition
L24380 (german-25482047-59d8-2aaa-da4b-25cd20b9e029-02): untagged adposition
L24407 (german-25482047-59d8-2aaa-da4b-25cd20b9e029-04): untagged adposition
L24646 (german-25482047-59d8-2aaa-da4b-25cd20b9e029-09): untagged adposition
L24765 (spanish-2d6b76db-51ed-2f15-99f8-eee797b9580f-04): untagged adposition
L24791 (french-449c4ca2-3685-156b-89c8-0c4de9367ed9-02): untagged adposition
L24941 (spanish-cd0dfe6b-d673-4d3d-b898-f2f2d412c848-01): untagged adposition
L25017 (spanish-cd0dfe6b-d673-4d3d-b898-f2f2d412c848-04): untagged adposition
L25131 (english-55c36c3d-5cbb-c080-3547-5c5ef76dce6e-01): untagged adposition
L25146 (english-55c36c3d-5cbb-c080-3547-5c5ef76dce6e-02): untagged adposition
L25222 (english-55c36c3d-5cbb-c080-3547-5c5ef76dce6e-06): untagged adposition
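
For anyone who wants to reproduce a list like the one above, here is a minimal sketch of the filter. It assumes the file is tab-separated and follows the 19-column CoNLL-U-Lex layout, with SMWE as the 11th column and SS as the 14th (this matches where `Possessor` sits in the sample sentence, but check it against the actual file). The function names are made up for illustration and are not part of the repo.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Assumed 0-based column positions (CoNLL-U-Lex layout); adjust if the file differs.
UPOS, XPOS, SMWE, SS = 3, 4, 10, 13
SENT_ID_PREFIX = "# sent_id = "


def find_untagged_adpositions(
    lines: Iterable[str],
) -> Iterator[Tuple[int, Optional[str]]]:
    """Yield (1-based line number, sent_id) for ADP/IN tokens with empty SS and SMWE."""
    sent_id = None
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith(SENT_ID_PREFIX):
            sent_id = line[len(SENT_ID_PREFIX):]
            continue
        if not line or line.startswith("#"):
            continue  # blank separator or other comment line
        cols = line.split("\t")
        if len(cols) <= SS:
            continue  # not a full token row
        if (
            cols[UPOS] == "ADP"
            and cols[XPOS] == "IN"
            and cols[SMWE] == "_"
            and cols[SS] == "_"
        ):
            yield line_no, sent_id


def report(lines: Iterable[str]) -> list:
    """Format hits in the same style as the list in this thread."""
    return [
        f"L{line_no} ({sent_id}): untagged adposition"
        for line_no, sent_id in find_untagged_adpositions(lines)
    ]
```

Typical use would be `with open(path, encoding="utf-8") as f: print("\n".join(report(f)))`, where `path` points at the corpus file. Note that the rows pasted in this thread are space-aligned for display, so they would need to be read from the original tab-separated file rather than copied from here.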

Some samples follow. Some appear to be false positives, like this one ("Down" is incorrectly tagged ADP/IN):

  15  a   a   DET DT  _   16  det _   _   _   _   _   _   _   _   _   _   _
  16  kid kid NOUN    NN  _   14  obj _   _   _   _   _   _   _   _   _   _   _
  17  with    with    ADP IN  _   18  case    _   _   _   _   _   Characteristic  Characteristic  _   _   _   _
> 18  Down    down    ADP IN  _   14  obl _   _   _   _   _   _   _   _   _   _   _
  19  s   's  PART    POS _   18  case    _   _   _   _   _   `$  `$  _   _   _   _
  20  .   .   PUNCT   .   _   4   punct   _   _   _   _   _   _   _   _   _   _   _
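
In the sample above, "Down" carries the dependency relation `obl` rather than `case`, which hints at one rough way to triage the list. The sketch below flags ADP/IN tokens whose deprel falls outside a small whitelist. The whitelist (`case`, `mark`) and the function name are assumptions for illustration, not something established in this thread, and the heuristic may well miss or over-flag cases.

```python
# Assumed 0-based positions of UPOS, XPOS and DEPREL in a tab-separated token row.
UPOS, XPOS, DEPREL = 3, 4, 7

# Hypothetical whitelist: relations an ordinary adposition is expected to bear.
ADPOSITION_RELATIONS = {"case", "mark"}


def is_suspect_tag(cols):
    """Return True for an ADP/IN token whose deprel is outside the whitelist."""
    is_adp = cols[UPOS] == "ADP" and cols[XPOS] == "IN"
    return is_adp and cols[DEPREL] not in ADPOSITION_RELATIONS


# Shortened rows (first eight columns only) taken from samples in this thread.
down = "18\tDown\tdown\tADP\tIN\t_\t14\tobl".split("\t")
of = "15\tof\tof\tADP\tIN\t_\t18\tcase".split("\t")
print(is_suspect_tag(down), is_suspect_tag(of))  # prints: True False
```

Tokens flagged this way would still need a manual look; the check only narrows down which entries are more likely tagging errors than missing supersenses.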

Others seem to genuinely lack an annotation:

  10  would   would   AUX MD  _   13  aux _   _   _   _   _   _   _   _   _   _   _
  11  be  be  AUX VB  _   13  cop _   _   _   _   _   _   _   _   _   _   _
  12  a   a   DET DT  _   13  det _   _   _   _   _   _   _   _   _   _   _
  13  world   world   NOUN    NN  _   0   root    _   _   _   _   _   _   _   _   _   _   _
  14  devoid  devoid  ADJ JJ  _   13  amod    _   _   _   _   _   _   _   _   _   _   _
> 15  of  of  ADP IN  _   18  case    _   _   _   _   _   _   _   _   _   _   _
  16  nation  nation  NOUN    NN  _   18  compound    _   _   _   _   _   _   _   _   _   _   _

  22  and and CCONJ   CC  _   25  cc  _   _   _   _   _   _   _   _   _   _   _
  23  did do  AUX VBD _   25  aux _   _   _   _   _   _   _   _   _   _   _
  24  not not PART    RB  _   25  advmod  _   _   _   _   _   _   _   _   _   _   _
  25  account account VERB    VB  _   17  conj    _   _   _   _   _   _   _   _   _   _   _
> 26  for for ADP IN  _   27  case    _   _   _   _   _   _   _   _   _   _   _
  27  Vriska  Vriska  PROPN   NNP _   25  obl _   _   _   _   _   _   _   _   _   _   _
  28  .   .   PUNCT   .   _   11  punct   _   _   _   _   _   _   _   _   _   _   _

  1   I   I   PRON    PRP _   2   nsubj   _   _   _   _   _   _   _   _   _   _   _
  2   base    base    VERB    VBP _   0   root    _   _   _   _   _   _   _   _   _   _   _
  3   this    this    PRON    DT  _   2   obj _   _   _   _   _   _   _   _   _   _   _
  4   simply  simply  ADV RB  _   2   advmod  _   _   _   _   _   _   _   _   _   _   _
> 5   on  on  ADP IN  _   7   case    _   _   _   _   _   _   _   _   _   _   _
  6   the the DET DT  _   7   det _   _   _   _   _   _   _   _   _   _   _
  7   fact    fact    NOUN    NN  _   2   obl _   _   _   _   _   _   _   _   _   _   _
  8   that    that    SCONJ   IN  _   14  mark    _   _   _   _   _   _   _   _   _   _   _