delph-in/erg

solving scopes and the p variable type

arademaker opened this issue · 6 comments

I am using Utool to resolve the scope of quantifiers in MRSs. I got an error when Utool encountered an MRS with a p variable, which is described here:

p (the half-way mark in the alphabet between h and x) is a generalization over labels and instances

Surely this is an issue for https://github.com/coli-saar/utool, but I just want to understand whether this MRS makes sense. The u variables are unspecific or maybe unbound, but what about the p3 below? It appears as the ARG1 of _be_v_id but not as the ARG0 of any other predication. Is this MRS valid?

Is candidate photoactive

[ LTOP: h0
INDEX: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ]
RELS: < [ _be_v_id<0:2> LBL: h1 ARG0: e2 ARG1: p3 ARG2: x4 [ x PERS: 3 NUM: sg ] ]
 [ udef_q<3:25> LBL: h5 ARG0: x4 RSTR: h6 BODY: h7 ]
 [ compound<3:25> LBL: h8 ARG0: e9 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x4 ARG2: x10 [ x IND: + ] ]
 [ udef_q<3:12> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ]
 [ _candidate_n_1<3:12> LBL: h14 ARG0: x10 ]
 [ _photoactive/NN_u_unknown<13:24> LBL: h8 ARG0: x4 ] >
HCONS: < h0 qeq h1 h6 qeq h8 h12 qeq h14 >
ICONS: < > ]

This MRS passes all the tests from https://pydelphin.readthedocs.io/en/latest/api/delphin.mrs.html, including plausibly_scopes, so this seems to be a bug in Utool? @goodmami? @danflick?

Hi, when were p-variables introduced to MRS? I assume it was after 2009, when we made the most recent changes to the MRS codec for Utool?

If p generalizes over h and x, I think supporting them in Utool would require serious changes to the way we read MRSs, which I'm not sure are feasible at this point in time. Can you avoid using them in your grammar?

Hi @alexanderkoller, thank you so much for the quick answer. For my practical case now, I can filter out the MRSs with p variables before calling Utool. But I am waiting for @danflick to better understand those variables and whether the MRS above is acceptable. That may also suggest some improvements to the PyDelphin plausibly_scopes function.
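The filtering step could look roughly like the sketch below. It is only an illustration: it assumes the MRSs are available as SimpleMRS strings like the one above, and it detects variables with a regular expression rather than a real MRS parser. The names `variable_types`, `has_p_variable` and `utool_safe` are made up for this example and are not part of Utool or PyDelphin.

```python
import re

# A role name (LBL, ARG1, RSTR, ...) followed by a variable such as h1, e2, x4 or p3.
# Property values like "TENSE: pres" do not match because no digits follow the letter.
# Assumption: the input is SimpleMRS text formatted like the example in this issue.
ROLE_VALUE = re.compile(r"\b[A-Z][A-Z0-9-]*:\s*([a-z])(\d+)\b")


def variable_types(simplemrs):
    """Return the set of variable type letters used as role values."""
    return {match.group(1) for match in ROLE_VALUE.finditer(simplemrs)}


def has_p_variable(simplemrs):
    """True if any role value in the MRS string is a p variable."""
    return "p" in variable_types(simplemrs)


def utool_safe(mrs_strings):
    """Keep only the MRS strings that contain no p variable."""
    return [s for s in mrs_strings if not has_p_variable(s)]
```

Calling `utool_safe` on the list of SimpleMRS strings before handing them to Utool would drop the MRS above, since its _be_v_id predication has ARG1: p3.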

If p-variables are fine and we can understand them, then for the long term I would like to better understand your comment:

serious changes to the way we read MRSs, which I'm not sure are feasible at this point in time

You probably mean you don't have the resources or even the motivation to change the Utool code, right? Do you have any pointers to more detailed documentation about the Utool algorithm and implementation? Maybe a thesis? I may have to read again the papers mentioned in https://www.coli.uni-saarland.de/projects/chorus/utool/page.php?id=technical, but they are very high-level!

p variables allow underspecification between nominal and clausal arguments, e.g. for "know": you can know a thing and you can know that something is true, so the ARG2 of _know_v_1 can be underspecified as p. I don't think p variables should appear unless the argument is dropped.

If you need a fully resolved scope tree, you would probably have to treat the two cases separately.

You probably mean you don't have the resources or even the motivation to change the Utool code, right? Do you have any pointers to more detailed documentation about the Utool algorithm and implementation? Maybe a thesis? I may have to read again the papers mentioned in https://www.coli.uni-saarland.de/projects/chorus/utool/page.php?id=technical, but they are very high-level!

Hi, yeah, I'm probably the only person who still remembers how the MRS input codec works, and I'm totally swamped with other things these days.

The basic reference about the MRS to dominance graph conversion is this paper: https://aclanthology.org/P04-1032/

The code for reading MRS with Utool is here: https://github.com/coli-saar/utool/blob/master/src/main/java/de/saar/chorus/domgraph/codec/mrs/MrsCodec.java

You will find that h and x arguments are handled very differently, and one would have to sit down and think quite carefully about how to handle p.

I don't think p variables should appear unless the argument is dropped.

Indeed, that seems to be the case in the dataset I am working with. But how can I make sure? I can check my data, but I have no guarantee that the ERG will always behave this way.
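Checking the data could be done with something like the sketch below. It is an empirical check only and gives no guarantee about future ERG behavior. It assumes SimpleMRS strings as input and treats a p variable as belonging to a dropped argument when no predication has it as its ARG0. The function name `bound_p_variables` is made up for this example.

```python
import re

# Capture (role, variable) pairs such as ("ARG1", "p3") from SimpleMRS text.
# Assumption: variables are one lowercase letter followed by digits.
ROLE = re.compile(r"\b([A-Z][A-Z0-9-]*):\s*([a-z]\d+)\b")


def bound_p_variables(simplemrs):
    """Return the p variables that are the ARG0 of some predication.

    An empty set means every p variable in the MRS only fills an
    argument slot, as expected for a dropped argument.
    """
    pairs = ROLE.findall(simplemrs)
    p_vars = {var for _, var in pairs if var.startswith("p")}
    arg0s = {var for role, var in pairs if role == "ARG0"}
    return p_vars & arg0s
```

Running this over every MRS in the dataset and collecting the non-empty results would show whether any p variable is ever introduced by a predication.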