UniversalDependencies/tools

Does UD provide conversion tools?


I would like to compare UD models with other models on a UD test set. The problem, of course, is that the other models use a different label set. Does UD provide scripts to convert, for instance, the dependency labels of OntoNotes and Penn to UD? Thanks in advance. (I am aware that conversion scripts will add noise, but I am fine with that.)

There are conversion scripts; the Hamburg Dependency Treebank, for example, was converted to UD with an extensive rule set (see https://www.aclweb.org/anthology/W19-8006/). Other treebanks were converted similarly. There is no general X -> UD conversion tool, though. At least in our case, the rules were written to cover the phenomena found in that treebank, so quality on out-of-domain data will be worse.
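To illustrate the simplest kind of rule such a conversion contains, here is a minimal Python sketch that only renames dependency labels in the DEPREL column of a CoNLL-U file. It is not the rule set used for any treebank mentioned here. The table `LABEL_MAP` and the function `relabel_conllu` are hypothetical names, and the table lists only a handful of UD v1 labels that were renamed in UD v2. Real conversions also restructure trees (for example, reattaching prepositions), which a label table cannot express.

```python
# Hypothetical relabeling table: a few UD v1 relation names and their
# UD v2 replacements. This is an illustration, not a complete mapping.
LABEL_MAP = {
    "dobj": "obj",
    "nsubjpass": "nsubj:pass",
    "csubjpass": "csubj:pass",
    "auxpass": "aux:pass",
    "mwe": "fixed",
    "name": "flat",
}

DEPREL = 7  # zero-based index of the DEPREL column in CoNLL-U


def relabel_conllu(text, label_map=LABEL_MAP):
    """Return CoNLL-U text with DEPREL values rewritten via label_map."""
    out = []
    for line in text.split("\n"):
        cols = line.split("\t")
        # Comments, blank lines, and rows without 10 columns pass through.
        if line.startswith("#") or len(cols) != 10:
            out.append(line)
            continue
        # Labels missing from the table are kept unchanged.
        cols[DEPREL] = label_map.get(cols[DEPREL], cols[DEPREL])
        out.append("\t".join(cols))
    return "\n".join(out)


sample = (
    "# text = She saw him\n"
    "1\tShe\tshe\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\tsaw\tsee\tVERB\tVBD\t_\t0\troot\t_\t_\n"
    "3\thim\the\tPRON\tPRP\t_\t2\tdobj\t_\t_\n"
)
converted = relabel_conllu(sample)
```

In `converted`, the relation on the third token changes from `dobj` to `obj`, while `nsubj` and `root` are left alone. Labels that have no one-to-one counterpart need tree-level rules, which is where treebank-specific rule sets come in.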

We have a conversion from Stanford Dependencies to Universal Dependencies for English, which optionally takes advantage of additional annotations if available, including entity types (for flat), error annotations (for reparandum, Typo=Yes) and coreference (for dislocated), among other things. The conversion is described and evaluated here:

https://www.aclweb.org/anthology/W18-4918/

We also have a conversion from Penn constituency treebanks to UDv2 as part of CoreNLP (see the CoreNLP source and Javadoc).
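For anyone who wants to script the CoreNLP converter, a rough Python sketch follows. It assumes the converter class is `edu.stanford.nlp.trees.ud.UniversalDependenciesConverter`, that it accepts a `-treeFile` argument, and that it writes CoNLL-U to standard output; that is how I recall the CoreNLP documentation describing it, so please check against the Javadoc of your installed version. The helper names `build_converter_command` and `convert`, and the default classpath, are hypothetical.

```python
import subprocess


def build_converter_command(tree_file, classpath="*"):
    """Build the java command line (assumed class name and -treeFile flag)."""
    return [
        "java", "-mx1g",
        "-cp", classpath,  # assumption: CoreNLP jars are in the current directory
        "edu.stanford.nlp.trees.ud.UniversalDependenciesConverter",
        "-treeFile", tree_file,
    ]


def convert(tree_file, out_file, classpath="*"):
    """Run the converter and write its standard output to out_file."""
    with open(out_file, "w", encoding="utf-8") as out:
        subprocess.run(
            build_converter_command(tree_file, classpath),
            stdout=out,
            check=True,  # raise if java exits with a non-zero status
        )


command = build_converter_command("treebank.mrg")
```

Calling `convert("treebank.mrg", "treebank.conllu")` would then run the conversion, provided Java and the CoreNLP jars are available; `treebank.mrg` is a placeholder file name.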