http://mrp.nlpl.eu/
/net/work/projects/mrptask
Download training data from LDC:
- https://catalog.ldc.upenn.edu/organization/downloads
- Log in as Dan. He is the contact point for Charles University, and only he can download the data.
- An update of the UCCA training data can be downloaded directly from the shared task website.
- UDPipe-analyzed training data is available as Companion Data.
- Ask organizers for the companion data we want to use:
- Universal Dependencies English treebanks (EWT, GUM, ParTUT, PUD, LinES) release 2.4
- Penn Treebank 3 constituent annotation
- PropBank/NomBank
- CzEngVallex
- Prague Czech-English Dependency Treebank 2.0 (at least the English part, including tectogrammatical trees)
- Raw texts and word embeddings from the CoNLL 2017 shared task
- fastText pre-trained word vectors
- BERT pre-trained models
- Collect the initial data on the ÚFAL network in /net/work/projects/mrptask (Dan).
- Look for suitable implementations of graph parsers (Kira).
- Look for possibilities of "UD enhancement" (Dan).
Friday April 5 afternoon
- Stanford CoreNLP includes a converter from Penn Treebank constituents to UD v1
- Context-sensitive mapping of PTB POS tags to Universal POS tags
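
To make the "context-sensitive" part concrete, here is a minimal Python sketch of why a plain tag-to-tag table is not enough: PTB `TO` and `IN` each correspond to more than one Universal POS tag. The rules, the `SUBORDINATORS` word list, and the function name `ptb_to_upos` are all hypothetical illustrations, not the rule set CoreNLP actually uses.

```python
from typing import List, Tuple

# Context-free defaults for a handful of PTB tags (deliberately incomplete).
DEFAULT_UPOS = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN", "NNPS": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "DT": "DET", "PRP": "PRON",
    ".": "PUNCT", ",": "PUNCT",
}

# Hypothetical word list: PTB tags these IN, UD treats them as SCONJ.
SUBORDINATORS = {"that", "because", "if", "although", "whether", "while"}


def ptb_to_upos(tagged: List[Tuple[str, str]]) -> List[str]:
    """Map (word, PTB tag) pairs to Universal POS tags using toy context rules."""
    result = []
    for i, (word, tag) in enumerate(tagged):
        next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
        if tag == "TO":
            # Infinitival "to" precedes a base-form verb; otherwise a preposition.
            upos = "PART" if next_tag == "VB" else "ADP"
        elif tag == "IN":
            # Subordinating conjunctions vs. ordinary prepositions.
            upos = "SCONJ" if word.lower() in SUBORDINATORS else "ADP"
        else:
            # Unknown tags fall back to X.
            upos = DEFAULT_UPOS.get(tag, "X")
        result.append(upos)
    return result


sentence = [("She", "PRP"), ("wants", "VBZ"), ("to", "TO"),
            ("go", "VB"), ("to", "TO"), ("Prague", "NNP")]
print(ptb_to_upos(sentence))
```

The two occurrences of "to" in the example receive different tags (`PART` and `ADP`), which a context-free lookup could not produce. A real converter would rely on the constituent tree rather than on the neighbouring tag alone.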