


How should we deal with contractions?! @leoalenc reported two approaches in the literature. Maybe he can add pointers here.

preposition + article

deixamos os livros nas [= em as] prateleiras ('we left the books on the shelves')
deixamos os livros em casa ('we left the books at home')


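clitic pronoun nos (same form as nos = em + os)

compra-nos um livro ('buy us a book' / '(he/she) buys us a book')
nos compraram um livro ('they bought us a book')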
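The clitic examples show why a plain lookup table cannot settle tokenization by itself: nos may be em + os or the pronoun. Below is a minimal Python sketch, assuming a hypothetical, partial contraction table `CONTRACTIONS` and clitic list `CLITICS` (neither comes from an existing resource). A clitic attached with a hyphen (compra-nos) can be recognized from the spelling alone. A free-standing nos (as in nos compraram) gets two readings, and later context (a tagger or the grammar) has to choose between them.

```python
# Hypothetical, partial table: em + definite article only.
CONTRACTIONS = {"no": ("em", "o"), "na": ("em", "a"), "nos": ("em", "os"), "nas": ("em", "as")}
# Hypothetical, partial list of clitic pronouns.
CLITICS = {"me", "te", "se", "nos", "vos", "lhe", "lhes"}


def readings(word):
    """Return every candidate token sequence for one whitespace-delimited word."""
    word = word.lower()
    host, hyphen, suffix = word.rpartition("-")
    if hyphen and suffix in CLITICS:
        # Enclisis (verb-hyphen-clitic): the suffix is a pronoun, never a contraction.
        return [[host, suffix]]
    if word in CONTRACTIONS:
        expansion = list(CONTRACTIONS[word])
        if word in CLITICS:
            # Homograph such as "nos": contraction (em + os) or proclitic pronoun.
            return [expansion, [word]]
        return [expansion]
    return [[word]]


print(readings("nas"))         # [['em', 'as']]
print(readings("compra-nos"))  # [['compra', 'nos']]
print(readings("nos"))         # [['em', 'os'], ['nos']]
```

Returning a list of readings rather than a single token sequence keeps the tokenizer honest: it does not guess, and it leaves disambiguation to a component that can see the verb.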

BRANCO, António; SILVA, João. Evaluating Solutions for the Rapid Development of State-of-the-Art POS Taggers for Portuguese. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). Lisbon, Portugal: European Language Resources Association (ELRA), May 2004.

The following authors encode prepositional articles in the lexicon, i.e., these forms are not split in tokenization:

ALENCAR, Leonel Figueiredo de; SCHWARZE, Christoph. French de and en as expressions of the genitive case: a unified analysis within LFG and computational implementation in XLE. D.E.L.T.A., v. 37, n. 1, p. 1-49, 2021.

FRANK, A. Eine LFG-Grammatik des Französischen. In: BERMAN, J.; FRANK, A. Deutsche und französische Syntax im Formalismus der LFG. Tübingen: Niemeyer, 1996. p. 97-244.

SCHWARZE, C.; ALENCAR, L. F. de. Lexikalisch-funktionale Grammatik: eine Einführung am Beispiel des Französischen mit computerlinguistischer Implementierung. Tübingen: Stauffenburg, 2016.
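For contrast, here is a rough Python sketch of the unsplit approach in general terms; it is not the XLE encoding used in the works above. The hypothetical entry for nas is a single token that contributes both the preposition and the article's gender and number, and those features must agree with the noun. The lexicon contents, the function `build_pp`, and the attribute names (PRED, OBJ, DEF, GEND, NUM, only loosely modeled on LFG f-structures) are all assumptions for illustration.

```python
# Hypothetical lexicon: the contracted form stays ONE token, and its entry
# contributes the preposition plus the definite article's agreement features.
CONTRACTED = {
    "na": {"prep": "em", "gend": "fem", "num": "sg"},
    "nas": {"prep": "em", "gend": "fem", "num": "pl"},
    "no": {"prep": "em", "gend": "masc", "num": "sg"},
    "nos": {"prep": "em", "gend": "masc", "num": "pl"},
}
# Hypothetical noun entries with their own agreement features.
NOUNS = {
    "prateleiras": {"gend": "fem", "num": "pl"},
    "livro": {"gend": "masc", "num": "sg"},
}


def build_pp(contraction, noun):
    """Combine an unsplit contraction and a noun into a nested PP structure.

    Returns None when the article's features clash with the noun's.
    """
    entry = CONTRACTED[contraction]
    features = NOUNS[noun]
    if (entry["gend"], entry["num"]) != (features["gend"], features["num"]):
        return None  # agreement failure, e.g. *nas livro
    return {
        "PRED": entry["prep"],
        # The article surfaces only as features on the object, not as a token.
        "OBJ": {"PRED": noun, "DEF": "+", "GEND": features["gend"], "NUM": features["num"]},
    }


print(build_pp("nas", "prateleiras"))
print(build_pp("nas", "livro"))  # None
```

The point of the sketch is where the work goes: the tokenizer stays trivial, but every syntactic rule that expects a separate determiner has to be adapted to find the article's features inside the preposition's entry.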

The prepositions em and de also contract with demonstrative pronouns and demonstrative adverbs: deste (de + este), neste (em + este), daqui (de + aqui), etc. In colloquial speech, the preposition para also contracts with the article: pros (para os), etc.

@arademaker, if we focus on parsing, I would split these forms in our grammar. If we don't split them, the syntax will need profound changes made by hand. That would be an intellectual challenge, and it might be interesting to take it on. But does it pay off? @danflick?
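If we split, the tokenizer needs a table that also covers the demonstrative and colloquial forms listed above. A minimal Python sketch, assuming whitespace-separated input and a hypothetical, incomplete table `CONTRACTIONS`; `split_contractions` is likewise a made-up name. It expands homographs unconditionally, so it would wrongly split the clitic in nos compraram um livro.

```python
# Hypothetical, incomplete table: each contracted form maps to its components.
CONTRACTIONS = {
    # em / de + definite article
    "no": ("em", "o"), "na": ("em", "a"), "nos": ("em", "os"), "nas": ("em", "as"),
    "do": ("de", "o"), "da": ("de", "a"), "dos": ("de", "os"), "das": ("de", "as"),
    # em / de + demonstrative pronoun
    "neste": ("em", "este"), "nesta": ("em", "esta"),
    "deste": ("de", "este"), "desta": ("de", "esta"),
    # de + demonstrative adverb
    "daqui": ("de", "aqui"), "dali": ("de", "ali"),
    # colloquial para + article
    "pro": ("para", "o"), "pros": ("para", "os"), "pras": ("para", "as"),
}


def split_contractions(sentence):
    """Whitespace-tokenize a sentence and expand every known contraction.

    Homographs are expanded unconditionally, so the clitic pronoun "nos"
    is (wrongly) split as well.
    """
    tokens = []
    for word in sentence.split():
        tokens.extend(CONTRACTIONS.get(word.lower(), (word,)))
    return tokens


print(split_contractions("deixamos os livros nas prateleiras"))
# ['deixamos', 'os', 'livros', 'em', 'as', 'prateleiras']
print(split_contractions("levamos os livros pros alunos"))
# ['levamos', 'os', 'livros', 'para', 'os', 'alunos']
```

With splitting done up front, the grammar only ever sees ordinary prepositions, articles, demonstratives, and adverbs, which is what makes this option cheaper on the syntax side.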