delph-in/docs

test suites


This topic was already partially discussed in
https://delphinqa.ling.washington.edu/t/matrix-mrs-test-suite/484

The wiki has several pages related to the MrsTestSuite ...TestSuite: a discussion page, and pages that actually hold data (translations). I believe they could be moved to a better place. The English version of the MRS test suite is also in the ERG repository: http://svn.delph-in.net/erg/trunk/tsdb/gold/. In fact, the ERG repository contains both the MRS and the CSLI test suites, and @oepen explained the difference to me:

the MRS test suite is something that ann and dan cooked up over the course of five or so weeks while dan was visiting cambridge, 2001 or 2002, i would say. except for some reuse of Abrams and Browne, I doubt there is any overlap in actual sentences with what was originally called the HP test suite. the latter was created to explore variation in syntactic structures and lives on in the DELPH-IN universe under the name CSLI test suite (since around 1994). the MRS test suite, on the other hand, exemplifies basic semantic constructions. so, in my view it is misleading to say it was derived from the HP data, but dan was of course centrally involved in both efforts.

(I hope @oepen is fine with my quoting him above.)

This is also related to LR-POR/PorGram#5, where we have started to use these test suites in developing the (Brazilian) Portuguese grammar.

  1. I wonder whether we could organize this data better, perhaps by creating a separate repository to hold all the versions/translations instead of keeping them as wiki pages.
  2. The discussion page suggested using https://github.com/xigt/xigt to move from plain text files to a richer format that allows more annotation.
  3. What about other grammars? Do they also incorporate this data into their repositories? If so, as profiles?

Does the note about the numbers at the end of https://github.com/delph-in/docs/wiki/MatrixMrsTestSuite apply only to the table on that same page? The page is confusing: it contains a table with EN and JA translations, but the JA sentences are also in https://github.com/delph-in/docs/wiki/MatrixMrsTestSuiteJa, which seems to be the only page that uses the suggested numbering scheme for sentences.

The experiments with Tatoeba (https://github.com/delph-in/docs/wiki/MatrixMrsTestSuiteTatoeba) were also not very conclusive. What can we do about that?

Do we want to consolidate these pages?

I'm not sure I got your point. I was thinking about how to move this data out of the wiki into a more structured format, perhaps into its own repository. But the ERG also keeps the monolingual version inside its own repository...

An update on the Tatoeba website: the sentences are still there, and many more translations are now available:

https://tatoeba.org/en/sentences_lists/show/166576/und/und