cisnlp/simalign

Question on en-hi test set


Hi, congratulations on your paper!

I am working on word alignment between en and hi. I found that this link provides two en-hi test sets: en-hi.wa and en-hi.wa.nonnullalign. Which test set is used in the paper?

My test results on en-hi (using subword embeddings):

en-hi.wa.nonnullalign:

XLM-R Argmax prec=85.62 rec=46.91 f1=60.61 AER=39.39
XLM-R IterMax prec=75.36 rec=51.88 f1=61.45 AER=38.55

en-hi.wa:

XLM-R Argmax prec=85.62 rec=36.32 f1=51.00 AER=49.00

The results reported in the paper are:

XLM-R Argmax f1=60 AER=40

So, do you use en-hi.wa.nonnullalign as the test set?

Hi,
Thank you.

Yes, we use "en-hi.wa.nonnullalign", since we don't generate null alignments in the output (we just skip them).
If you look at Table 5 in the supplementary material, XLM-R IterMax is also reported there.
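
For readers comparing the two files, here is a minimal sketch of why precision stays the same while recall drops when the gold set still contains null alignments. It assumes every gold link is treated as a sure link (in which case AER equals 1 - F1, which matches the numbers above), and it represents a null alignment as a hypothetical `(source_index, None)` pair. The function name `alignment_scores` and the toy data are illustrative only and are not taken from the SimAlign code or the test files.

```python
def alignment_scores(predicted, gold):
    """Return (precision, recall, F1, AER), treating all gold links as sure links."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    prec = hits / len(predicted) if predicted else 0.0
    rec = hits / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    # With no separate "possible" set, AER = 1 - 2*hits / (|A| + |G|) = 1 - F1.
    aer = 1.0 - f1
    return prec, rec, f1, aer


# Hypothetical toy data: (source_index, target_index); None marks a null alignment.
gold_with_null = {(0, 0), (1, 2), (2, 1), (3, None), (4, None)}
gold_nonnull = {(s, t) for s, t in gold_with_null if t is not None}

# An aligner that never emits null links can never match the (s, None) entries.
predicted = {(0, 0), (1, 2), (3, 3)}

scores_nonnull = alignment_scores(predicted, gold_nonnull)
scores_with_null = alignment_scores(predicted, gold_with_null)

print("nonnull   prec=%.2f rec=%.2f f1=%.2f AER=%.2f" % scores_nonnull)
print("with null prec=%.2f rec=%.2f f1=%.2f AER=%.2f" % scores_with_null)
```

In this toy case the number of correct links is identical for both gold sets, so precision is unchanged, while the larger gold set lowers recall and therefore F1.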

Thank you so much!

BTW, I cannot find the word alignment test sets other than en-hi and en-de. Could you share them or send me a copy?

Links will be in the camera-ready version. I also added them to the README.