kentonl/e2e-coref

convert conll dataset format

Opened this issue · 2 comments

Hi,
I would really appreciate your help with this question. I would like to convert the CoNLL dataset to an NLI dataset format: take one sentence, replace the pronoun with each of two candidate antecedents, and label the version with the correct antecedent as entailment and the version with the incorrect one as contradiction. I have three questions:

  • Which information in the CoNLL dataset does your code use? Do you also use the cluster information and speaker IDs? I am confused by all of this extra information and not sure whether it is part of your method.
  • Could you tell me how I can convert the CoNLL dataset to the NLI format? Is there any code for this?
  • If one trains on the CoNLL dataset in NLI form with a BERT model, do you think the performance could suffer? I am wondering which extra information your code uses and whether it has an impact.
    Thanks.

I can answer the first question: the code uses clusters, speakers, and genres as features, but the speakers and genres are not necessary.

The second question may be answered by https://zhuanlan.zhihu.com/p/121786025
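
For the conversion described in the question, here is a minimal sketch in Python. It is not code from this repository. It assumes each document is a dict with a `sentences` field (a list of token lists) and a `clusters` field (a list of clusters, each a list of `[start, end]` token spans with inclusive ends, indexed over the flattened document). The function name `coref_to_nli`, the pronoun list, and the choice of the nearest preceding mention as the candidate are all illustrative assumptions.

```python
# Hypothetical, deliberately small pronoun list; extend as needed.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def coref_to_nli(doc):
    """Turn one coreference document into NLI-style premise/hypothesis pairs."""
    # Flatten the sentences; mention spans index into this token list.
    tokens = [tok for sent in doc["sentences"] for tok in sent]
    # For every token, remember the [start, end) range of its sentence.
    bounds, start = [], 0
    for sent in doc["sentences"]:
        bounds.extend([(start, start + len(sent))] * len(sent))
        start += len(sent)

    def is_pronoun(span):
        return span[0] == span[1] and tokens[span[0]].lower() in PRONOUNS

    def text(span):
        return " ".join(tokens[span[0]:span[1] + 1])

    examples = []
    for ci, cluster in enumerate(doc["clusters"]):
        for span in cluster:
            if not is_pronoun(span):
                continue
            pos = span[0]
            # Correct candidates: earlier non-pronoun mentions in the same cluster.
            gold = [m for m in cluster if not is_pronoun(m) and m[1] < pos]
            # Incorrect candidates: earlier non-pronoun mentions in other clusters.
            wrong = [m for cj, other in enumerate(doc["clusters"]) if cj != ci
                     for m in other if not is_pronoun(m) and m[1] < pos]
            if not gold or not wrong:
                continue
            s, e = bounds[pos]
            # Premise: the document text up to the end of the pronoun's sentence.
            premise = " ".join(tokens[:e])
            # max() picks the latest-starting, i.e. nearest preceding, mention.
            for cand, label in ((max(gold), "entailment"),
                                (max(wrong), "contradiction")):
                hyp = tokens[s:pos] + [text(cand)] + tokens[pos + 1:e]
                examples.append({"premise": premise,
                                 "hypothesis": " ".join(hyp),
                                 "label": label})
    return examples

# Toy document, made up for illustration (not taken from the dataset).
doc = {
    "sentences": [["Alice", "met", "Bob", "."], ["She", "smiled", "."]],
    "clusters": [[[0, 0], [4, 4]], [[2, 2]]],
}
pairs = coref_to_nli(doc)
for p in pairs:
    print(p["label"], "|", p["hypothesis"])
```

The sketch ignores speakers and genres entirely, in line with the comment above that they are not necessary. Plain substitution can also produce ungrammatical hypotheses (for example when "her" is used as a possessive), so the output would likely need filtering before training.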