What is the difference among generation target, target approximate, target full expression and target gt?
Hi there,
While working on WebQSP generation, I got confused about the differences among generation target, target approximate, target full expression and target gt.
What's the difference among the four? Which one should I pick for generation data?
Thanks.
Yiheng
In short, generation_target is the ground truth for the generator.
I think these are some tricks I used to handle the multiple ground-truth annotations in WebQSP. In some examples there is more than one ground-truth s-expression, so ideally we want to generate the simplest one.
You can view target_gt as the simplest s-expression chosen by rules from the list of ground-truth s-expressions. But sometimes the ranker may favor a different ground-truth s-expression than the one I chose by rule. The idea is that we don't want to generate another ground-truth s-expression if the top-ranked s-expression is already a ground truth. So the logic of lines 44-60 is:
if the top-ranked s-expression has 1.0 accuracy, use the top-ranked s-expression as the generation_target; else, use the target_gt picked by rules as the generation_target.
target_full_expr just stores target_gt.
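The selection rule described above can be sketched roughly as follows. This is only an illustrative sketch, not the repository's actual code: the `Candidate` class, its `accuracy` field, and the helper names `pick_generation_target` and `build_example` are hypothetical and were chosen to mirror the field names mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """A ranked candidate s-expression (hypothetical structure for illustration)."""
    s_expr: str
    accuracy: float  # assumed per-candidate score against the gold annotation


def pick_generation_target(ranked: List[Candidate], target_gt: str) -> str:
    """Use the top-ranked candidate if it is already fully correct, else target_gt."""
    if ranked and ranked[0].accuracy == 1.0:
        # The ranker's top choice is already a ground truth, so keep it as the
        # target instead of asking the generator to produce a different one.
        return ranked[0].s_expr
    # Otherwise fall back to the rule-picked (simplest) ground truth.
    return target_gt


def build_example(ranked: List[Candidate], target_gt: str) -> Dict[str, str]:
    """Assemble the two generation-side fields discussed in this thread."""
    return {
        "generation_target": pick_generation_target(ranked, target_gt),
        "target_full_expr": target_gt,  # simply stores target_gt
    }
```

With this sketch, a list whose first candidate has accuracy 1.0 yields that candidate as generation_target, while any other list (including an empty one) yields target_gt.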
You can ignore target_approx_expr in the generation part. It is used in the ranker training part; you can view it as a simplified ground-truth s-expression for better training the ranker (because some of the ground-truth s-expressions aren't covered by the enumerated s-expressions, so we sometimes want the ranker to favor an s-expression that is close to the target s-expression but within the enumerated space).
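To make the idea of "close to the target but within the enumerated space" concrete, here is a minimal sketch. The thread does not say how target_approx_expr is actually derived, so everything here is an assumption: token-level Jaccard overlap is used purely for illustration, and `token_jaccard` and `pick_approx_target` are hypothetical helper names.

```python
from typing import List, Set


def _tokens(s_expr: str) -> Set[str]:
    """Split an s-expression into a set of tokens, ignoring parentheses."""
    return set(s_expr.replace("(", " ").replace(")", " ").split())


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two s-expressions."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def pick_approx_target(target: str, enumerated: List[str]) -> str:
    """Return the target if it was enumerated, else the most similar candidate.

    The similarity measure is an illustrative assumption, not the
    repository's actual rule.
    """
    if not enumerated:
        raise ValueError("no enumerated candidates to choose from")
    if target in enumerated:
        return target
    return max(enumerated, key=lambda cand: token_jaccard(cand, target))
```

The point of the sketch is only that the ranker's training label always lies inside the candidate set it can actually score, even when the full ground truth does not.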
So I guess we could just use generation_target.
But may I ask whether you have tested using multiple targets, instead of choosing only one, as the labels during generator training?
I ask because there may be some equivalent s-expressions in the enumeration, and there may also be multiple parses for each question in WebQSP.
Yeah, just use generation_target.
Nope, I haven't tried that. It doesn't sound very intuitive to me if the same set of top-ranked candidates has multiple generation targets, so I didn't try that.
Got it.
Thanks for your reply.