Mapping between Apertium's morphosyntactic tags and the LexInfo Ontology.
Source data of Apertium: https://github.com/apertium/apertium-trunk
Steps:
- The list of tags extracted from Apertium data was taken as the basis for the mapping. This extraction was performed at the Applied Computational Linguistics Group (ACoLi) at Goethe University Frankfurt; see https://github.com/acoli-repo/acoli-dicts/blob/master/stable/apertium
- Individual by individual (that is, tag by tag), a match in LexInfo was searched for manually, relying on the `rdf:label`(s). See below for details on this.
  Update July 2020: further tags were added manually to this initial list of mappings, on the basis of Apertium's List of Symbols: https://wiki.apertium.org/wiki/List_of_symbols
- This CSV/TSV does not explicitly provide equivalence relations between Apertium individuals and LexInfo individuals. Instead, it provides triple replacement instructions to apply to an intermediate shallow RDF conversion (see Donandt and Chiarcos, 2019). Sometimes an Apertium individual will indeed be replaced by its LexInfo counterpart (if available), but this does not hold for all rows.
Apertium individuals occur as the object of `lexinfo:morphosyntacticProperty` in the intermediate RDF. The CSV provides "predicate - object" pairs for each of those Apertium tags acting as object. That is, if a row of the CSV reads `apertium:acc, lexinfo:case, lexinfo:accusativeCase`, this row is intended to guide this update:
## Intermediate RDF
?subject lexinfo:morphosyntacticProperty apertium:acc .
## Expected result after update
?subject lexinfo:case lexinfo:accusativeCase .
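The update above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the conversion pipeline itself: triples are modelled as plain tuples of prefixed names rather than a real RDF graph, and the helper name `apply_row` and the subject `:entry1` are hypothetical.

```python
# Assumption: triples are plain (subject, predicate, object) tuples of
# prefixed names; a real conversion would operate on an RDF graph.
MORPH_PROP = "lexinfo:morphosyntacticProperty"


def apply_row(triples, row):
    """Apply one mapping row (apertium_tag, predicate, object) to a list of triples."""
    tag, new_pred, new_obj = row
    result = []
    for s, p, o in triples:
        if p == MORPH_PROP and o == tag:
            # The Apertium tag is the object: swap in the row's predicate and object.
            result.append((s, new_pred, new_obj))
        else:
            # Every other triple is left untouched.
            result.append((s, p, o))
    return result


intermediate = [
    (":entry1", MORPH_PROP, "apertium:acc"),
    (":entry1", MORPH_PROP, "apertium:n"),
]
row = ("apertium:acc", "lexinfo:case", "lexinfo:accusativeCase")
updated = apply_row(intermediate, row)
```

Only the triple whose object is `apertium:acc` is rewritten; the `apertium:n` triple stays as it was until a row for that tag is applied.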
Sometimes a single tag in Apertium is mapped to several individuals in LexInfo (the tag "bundles" a set of features). This is indicated in the CSV simply by adding several rows with the same Apertium tag, each with a different mapping to LexInfo. For example, here we have a tag for a third person singular possessive suffix:
apertium:PxSg3, lexinfo:person, lexinfo:thirdPerson
apertium:PxSg3, lexinfo:number, lexinfo:singular
apertium:PxSg3, lexinfo:referentType, lexinfo:possessive
apertium:PxSg3, lexinfo:termElement, lexinfo:suffix
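A bundled tag like this one expands a single intermediate triple into several. The sketch below shows one way to do that, assuming the mapping is comma-separated with exactly three columns and no header row (the actual file may be tab-separated or laid out differently). The function names `load_mapping` and `expand` and the subject `:entry1` are hypothetical.

```python
import csv
import io
from collections import defaultdict

MORPH_PROP = "lexinfo:morphosyntacticProperty"

# Sample rows copied from the example above; a real run would open the mapping file.
SAMPLE = """\
apertium:PxSg3, lexinfo:person, lexinfo:thirdPerson
apertium:PxSg3, lexinfo:number, lexinfo:singular
apertium:PxSg3, lexinfo:referentType, lexinfo:possessive
apertium:PxSg3, lexinfo:termElement, lexinfo:suffix
"""


def load_mapping(handle):
    """Group the (predicate, object) pairs of the mapping by Apertium tag."""
    mapping = defaultdict(list)
    for row in csv.reader(handle, skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        tag, pred, obj = (cell.strip() for cell in row)
        mapping[tag].append((pred, obj))
    return mapping


def expand(triple, mapping):
    """Replace one intermediate triple by one triple per mapping row of its tag."""
    s, p, o = triple
    if p == MORPH_PROP and o in mapping:
        return [(s, pred, obj) for pred, obj in mapping[o]]
    return [triple]  # no row for this object: keep the triple as it is


mapping = load_mapping(io.StringIO(SAMPLE))
expanded = expand((":entry1", MORPH_PROP, "apertium:PxSg3"), mapping)
```

With the four sample rows, the single `apertium:PxSg3` triple becomes four triples, one per feature in the bundle.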
When no 1:1 mapping is available, there are three scenarios:
- No documentation for the tag: the same predicate (`lexinfo:morphosyntacticProperty`) and the same individual (the Apertium individual) are left unchanged. This is the reason why some rows in the CSV have the same value in the first and third column (`apertium:ADB, lexinfo:morphosyntacticProperty, apertium:ADB`).
  INPUT NEEDED to map those tags!
- There is a mismatch in granularity with the potential LexInfo individual: for example, `apertium:Comp` is a very specific comparative, whereas LexInfo has `lexinfo:comparative`, with broader semantics. To preserve the specific comparative information without losing the link to LexInfo, we keep the `apertium:Comp` value in a triple and add the LexInfo individual `lexinfo:comparative` as well.
- Mapping the tag to LexInfo would require turning to a module of OntoLex-lemon to introduce the link properly. In this list of tags, this only happens with syntactic frames.
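The first two scenarios can be told apart mechanically from the rows themselves, which helps to list the tags that still need input. The sketch below is a rough illustration only: the function `classify`, its labels, and in particular the predicate `lexinfo:degree` in the `apertium:Comp` rows are assumptions, since the text above does not spell out how the granularity case is laid out in the file. The third scenario (syntactic frames) is not covered.

```python
MORPH_PROP = "lexinfo:morphosyntacticProperty"


def classify(rows):
    """Label each Apertium tag by how its mapping rows treat it."""
    by_tag = {}
    for tag, pred, obj in rows:
        by_tag.setdefault(tag, []).append((pred, obj))

    labels = {}
    for tag, pairs in by_tag.items():
        identity = (MORPH_PROP, tag)
        kept = identity in pairs                       # tag maps to itself
        linked = any(pair != identity for pair in pairs)  # tag has another mapping
        if kept and linked:
            labels[tag] = "kept and linked"   # granularity mismatch
        elif kept:
            labels[tag] = "unmapped"          # no documentation, input needed
        else:
            labels[tag] = "mapped"
    return labels


rows = [
    ("apertium:ADB", MORPH_PROP, "apertium:ADB"),
    # Assumed layout for the granularity case; the predicate below is a guess.
    ("apertium:Comp", MORPH_PROP, "apertium:Comp"),
    ("apertium:Comp", "lexinfo:degree", "lexinfo:comparative"),
    ("apertium:acc", "lexinfo:case", "lexinfo:accusativeCase"),
]
labels = classify(rows)
needs_input = sorted(tag for tag, label in labels.items() if label == "unmapped")
```

Here `needs_input` collects the tags whose only row maps them to themselves, which are the ones flagged above as needing input.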
For each POS tag, the following table provides:
- The LexInfo individual(s) that are instantiated throughout the data to encode that POS
- The list of Apertium tags (varying in granularity) for that POS in the data
- The suggested POS abbreviation to use in `ontolex:LexicalEntry` URIs, based on the Universal Dependencies tagset.
POS | LexInfo individual | Apertium tag | Abbreviation for URI (UD-based) |
---|---|---|---|
adjective | adjective, presentParticipleAdjective | A, Adj, ADJ, Der_las, pprs, short, sint | adj |
adposition | adposition, postposition, preposition | Adp, ADP, Po, post, pr, Pr, prep, Rabl, Racc, Rdat, Rgen, Rins, Rloc, Rnom | adp |
adverb | adverb | adv, Adv, gna | adv |
proper noun | properNoun | ant, hyd, np, org, Org, pat, Prop, top | propn |
punctuation | punctuation, comma | apos, comma, dash, guio, lquot, punct, quot, quote, rquot | punct |
determiner | article, determiner, demonstrativeDeterminer | art, det, DET, detNT, detnt, dst, prx | det |
conjunction | conjunction, coordinatingConjunction | cnj, CC | conj |
noun | noun | cog, Der_eapmi, Der_muš, Der_vuohta, G3, n, N, nn, Plc, subs | noun |
verb | copula, verb | cop, Neg, sep, v, V, vbavea, vbdo, vbhaver, vblex, vbloc, vbser, VGen | verb |
auxiliary verb | verb, modal | vaux, mod, vbmod | aux |
particle | particle | emph, mod_ass, mod_ind, Pcle, qst, Qst, vpart | particle |
interjection | interjection | ij, interj | intj |
symbol | openParenthesis, questionMark, closeParenthesis | lpar, lquest, rpar | symb |
numeral | numeral | num, Num | num |
pronoun | personalPronoun, pronoun, reflexivePersonalPronoun, relativePronoun, reciprocalPronoun | pers, Pers, prn, pron, Pron, ref, rel, res | pron |
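As an illustration of how the last column might be used, the sketch below looks up the abbreviation for an Apertium POS tag and builds an entry URI from it. Only a few rows of the table are copied in. The URI pattern, the base URI `http://example.org/lexicon/`, and the function name `entry_uri` are hypothetical and do not describe the pattern actually used in the data.

```python
from urllib.parse import quote

# A few rows of the table above: UD-based abbreviation -> Apertium tags.
POS_TAGS = {
    "adj": ["A", "Adj", "ADJ", "Der_las", "pprs", "short", "sint"],
    "adv": ["adv", "Adv", "gna"],
    "noun": ["cog", "G3", "n", "N", "nn", "Plc", "subs"],
    "intj": ["ij", "interj"],
    "num": ["num", "Num"],
}

# Invert the table so that each (case-sensitive) tag points to its abbreviation.
TAG_TO_ABBR = {tag: abbr for abbr, tags in POS_TAGS.items() for tag in tags}


def entry_uri(base, lemma, tag, lang):
    """Build a lexical entry URI; the pattern lemma-abbr-lang is an assumption."""
    abbr = TAG_TO_ABBR[tag]  # raises KeyError for tags not in the table excerpt
    return f"{base}{quote(lemma)}-{abbr}-{lang}"


uri = entry_uri("http://example.org/lexicon/", "casa", "n", "es")
# uri == "http://example.org/lexicon/casa-noun-es"
```

Tags are matched case-sensitively because the table lists variants such as `adv` and `Adv` separately.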
Contributors:
Julia Bosque-Gil (University of Zaragoza): Mapping Apertium-Lexinfo
Christian Chiarcos, Maxim Ionov (Goethe Universität Frankfurt): Extraction of tags from Apertium data https://github.com/acoli-repo/acoli-dicts/blob/master/stable/apertium/apertium.ttl
Original set of tags used in the Apertium family of dictionaries: https://github.com/apertium/apertium-trunk
References:
Gracia, J., Villegas, M., Gomez-Perez, A., & Bel, N. (2018). The apertium bilingual dictionaries on the web of data. Semantic Web, 9(2), 231-240.
Donandt, K., & Chiarcos, C. (2019). Translation inference through multi-lingual word embedding similarity. In Proc. of TIAD-2019 Shared Task Translation Inference Across Dictionaries, at 2nd Language Data and Knowledge (LDK) conference. CEUR-WS.