
MCD2 and MCD3 specific data processing?

parasj opened this issue · 8 comments

Hi authors, @SivilTaram

I see there is some specialized logic for processing the MCD2 and MCD3 splits of the CFQ dataset. We are confused about why this special path is present. Why did you add it, and what would the behavior be if MCD2 and MCD3 were preprocessed with the MCD1 code path?

if query.startswith("Did M") or query.startswith("Was M") or query.startswith("Were M") or query.startswith("Was a"):
    if type in ['mcd2', 'mcd3']:
        nl_pattern = query.split()[0] + " " + query.split()[1]
        terms.append((nl_pattern, [f'?x0#is#{query.split()[1]}'], (0, 1)))
    nl_pattern = query.split()[0] + " M"
    terms.append((nl_pattern, ['?x0#is#M'], (0, 1)))

if candidate_term.count("M") == 1:
    if candidate_term.startswith("?x0 is M") and split in ['mcd2', 'mcd3']:
        candidate_triplets[candidate_skeleton] += [candidate_term]
    candidate_triplets[candidate_skeleton] += [''.join(candidate_term.replace("M", entity[0][0])) for entity in entities]


Hi @parasj ,

Thanks for your attention!

Let's start from a simple example.

Natural language question: Did M0 read M1 ?

The original logical form is:


We transform it into an equivalent logical form in which each path always starts with "?x0":

?x0 is_M M0
?x0 READ M1

As you can see, we introduce a new predicate is_M to this dataset.

For MCD1:

Here we explicitly wrote four lexicon entries ("Did M", "Was M", "Were M", and "Was a") for this predicate.
The first three can be found by data/, but the last one (i.e., "Was a") was missed, so we added it manually here.

For MCD2/3:

We found that performance improves significantly if we introduce more detailed predicates is_M0, is_M1, ..., is_M6, rather than a single coarse predicate is_M.

For example, in this setup, the logical form will be:

?x0 is_M0
?x0 READ M1

This setup helps reduce the search space.

It brought a good performance gain on MCD2/3, while having little effect on MCD1.
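To make the difference between the two setups concrete, here is a minimal sketch modeled on the snippet quoted in the question. The function name `build_is_m_terms` and its signature are hypothetical, and it assumes that the coarse "M" entry is added for every split while the fine-grained entry is added only for MCD2/3; the repository's actual control flow may differ.

```python
def build_is_m_terms(query, split):
    """Return lexicon entries (nl_pattern, predicates, span) for the is_M predicate.

    Hypothetical helper: it mirrors the quoted snippet under the assumption
    that the coarse entry is always added and the fine-grained entry is
    added only for the mcd2 and mcd3 splits.
    """
    prefixes = ("Did M", "Was M", "Were M", "Was a")
    terms = []
    if not query.startswith(prefixes):
        return terms

    tokens = query.split()
    if split in ("mcd2", "mcd3"):
        # Fine-grained entry, e.g. "Did M0" -> ?x0#is#M0
        terms.append((f"{tokens[0]} {tokens[1]}", [f"?x0#is#{tokens[1]}"], (0, 1)))

    # Coarse entry, e.g. "Did M" -> ?x0#is#M
    terms.append((f"{tokens[0]} M", ["?x0#is#M"], (0, 1)))
    return terms


if __name__ == "__main__":
    for split in ("mcd1", "mcd2"):
        print(split, build_is_m_terms("Did M0 read M1 ?", split))
```

Under these assumptions, MCD1 gets only the coarse entry for "Did M0 read M1 ?", while MCD2/3 additionally get an entry tied to the specific placeholder M0.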

Feel free to ping me if you have any more questions.

Thank you @linzeqipku!

Another note -- we are struggling to replicate the results on the splits.

For example, in the paper, the sketch prediction module achieves 73% accuracy on the MCD3 split, but we struggle to consistently get above 50% when we retrain the model with the default hyperparameters and number of training epochs in the code. The model appears to be very sensitive to the number of training epochs, and there is also wide variance between initializations.

What hyperparameters did you use for each split, and how did you select them? We are currently using the default hyperparameters that the code sets for 'mcd1'.

This work is very exciting, and we (@parasj and @GaiYu0) are interested in extending it. Would really appreciate your tips on how we can fix the training procedure.

We used the default parameters.
It seems that the output of sketch_prediction/ is not the number we want.
I've checked my logs: the output of sketch_prediction/ is ~0.3, while the overall accuracy is ~0.65.

I'll double-check with Yinuo (the first author) and reply to you later.

Hi parasj, I'm Yinuo, thanks for your attention!
There's a bug in the code for calculating sketch accuracy in sketch_prediction/, like this:
?x0 P M . ?x0 a M
?x0 a M ### ?x0 P M

The results reported in our paper were calculated by another script, which is not included in this repo.
We will fix the bug as soon as possible. Thanks a lot.
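The thread does not spell out the exact nature of the bug. One reading of the example is that the two lines are the same sketch written with a different clause order and a different separator (" . " versus "###"), and that a plain string comparison counts them as a mismatch. Under that assumption, a comparison that is insensitive to order and separator could look like the sketch below. The function names are hypothetical and this is not the authors' evaluation script.

```python
import re


def normalize_sketch(sketch):
    """Split a sketch into clauses on ' . ' or '###' and sort them.

    Assumption: clause order and the choice of separator carry no meaning.
    """
    parts = re.split(r"\s+\.\s+|\s*###\s*", sketch.strip())
    clauses = [" ".join(part.split()) for part in parts if part.strip()]
    return tuple(sorted(clauses))


def sketch_accuracy(predictions, references):
    """Fraction of predictions whose normalized sketch equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize_sketch(pred) == normalize_sketch(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    print(sketch_accuracy(["?x0 P M . ?x0 a M"], ["?x0 a M ### ?x0 P M"]))
```

Sorting the clauses rather than collecting them into a set keeps repeated clauses distinct, so a sketch with a duplicated clause does not compare equal to one without it.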

@gyn0806 @linzeqipku Thank you for your reply!

Another question -- I am trying to replicate the HPD test accuracy results on MCD1, MCD2 and MCD3 and see that results depend on the epoch used for testing. Which epoch's checkpoint did you use for test set evaluation for each split?

Hi parasj,
We select the model for inference with the code below, using the checkpoint that performs best on the dev set.
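As an illustration of dev-set-based selection in general (not the repository's code), here is a minimal sketch. The function name `select_best_checkpoint` and the epoch-to-accuracy dictionary are hypothetical, and breaking ties in favor of the earliest epoch is an assumption.

```python
def select_best_checkpoint(dev_accuracy_by_epoch):
    """Return the epoch whose checkpoint scored highest on the dev set.

    dev_accuracy_by_epoch: dict mapping epoch number -> dev accuracy.
    Ties are broken in favor of the earliest epoch (an assumption).
    """
    if not dev_accuracy_by_epoch:
        raise ValueError("no dev results to select from")
    # max() keeps the first maximum it encounters, so iterating over the
    # epochs in ascending order makes ties resolve to the earliest epoch.
    return max(sorted(dev_accuracy_by_epoch), key=dev_accuracy_by_epoch.__getitem__)


if __name__ == "__main__":
    dev_results = {1: 0.41, 2: 0.68, 3: 0.73, 4: 0.70}
    print(select_best_checkpoint(dev_results))
```

Selecting on the dev set rather than the test set keeps the reported test accuracy independent of the epoch choice.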

@parasj Has Yinuo solved your problem?

@SivilTaram Closing this issue -- we got the information we needed! Thank you :)