citiususc/Linguakit

Co-Reference

gamallo opened this issue · 8 comments

A new module for co-reference resolution will be integrated by Marcos Garcia.

Any update on this? 💪

The prototype for co-reference identification was implemented several months ago by Marcos Garcia. He committed to integrating the module into Linguakit, but he doesn't have a GitHub account yet.

Module uploaded!

I don't understand how the module works, probably because I don't understand exactly what it does. I find it confusing that there is a "coref" parameter to use the module, but there also seems to be a "-coref" parameter.

I'd do this myself, but I don't know if I'm missing something:

  1. Correct the README.md: where it says "COREF (parameter -coref)" it should say "COREF (parameter coref)".
  2. Parameter "-coref" (file linguakit, line 156) should not be accepted as a valid parameter. Neither should "coref" on that same line, because module identifiers are already handled earlier.

Could you also add a usage example to the Examples part of the README.md?

On a side note, maybe the module lives in the tagger subdirectory for some affinity reason, but I find that confusing too.

Thanks! I've just corrected the README and linguakit files. Also, I modified the en.txt test file in order to show how COREF works.

If you run coref on the test (./linguakit en coref test/en.txt) you will see that NPs contain an extra column with a numerical ID. Ideally, this ID should be the same in the NPs referring to the same discourse entity (Paul = Paul_Wilson (but not Mary_Wilson); Sandra = Sandra_Curtis, etc.).
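To make the idea of the ID column concrete, here is a minimal Python sketch that groups mentions by that ID. It is only an illustration: the line layout (whitespace-separated columns, mention first, numeric ID last), the "NP" and "VBD" tags, and the helper name `group_mentions` are assumptions for the example, not the documented Linguakit output format.

```python
from collections import defaultdict

def group_mentions(lines):
    """Group mention strings by the numeric ID found in the last column.

    Assumed layout (hypothetical): whitespace-separated columns, the mention
    in the first column and the coreference ID in the last one.
    """
    clusters = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines and lines whose last column is not a numeric ID.
        if len(fields) < 2 or not fields[-1].isdigit():
            continue
        clusters[int(fields[-1])].append(fields[0])
    return dict(clusters)

# Hypothetical lines written for this example, NOT copied from real output.
sample = [
    "Paul_Wilson NP 1",
    "Mary_Wilson NP 2",
    "Paul NP 1",
    "said VBD",
    "Sandra_Curtis NP 3",
    "Sandra NP 3",
]

clusters = group_mentions(sample)
for cluster_id, mentions in sorted(clusters.items()):
    print(cluster_id, mentions)
```

With IDs assigned as in the sample, Paul and Paul_Wilson end up in one group while Mary_Wilson stays in her own, which is the behaviour described above.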

The -crnec option (experimental) uses the information provided by this kind of clustering to (try to) correct wrong NEC labels.
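One simple way to picture this kind of correction is a majority vote inside each cluster: if most mentions of an entity are labelled PERSON, a stray LOCATION label in the same cluster is probably wrong. The Python sketch below shows that idea only. It is an assumption about the general approach, not the actual -crnec implementation, and the label names, the tuple layout, and the function name `correct_labels` are hypothetical.

```python
from collections import Counter, defaultdict

def correct_labels(mentions):
    """Relabel each mention with the most frequent label in its cluster.

    mentions: list of (text, label, cluster_id) tuples (hypothetical layout).
    Returns a new list of tuples in the same order.
    """
    labels_by_cluster = defaultdict(list)
    for _, label, cluster_id in mentions:
        labels_by_cluster[cluster_id].append(label)

    # Pick the most common label per cluster.
    majority = {
        cluster_id: Counter(labels).most_common(1)[0][0]
        for cluster_id, labels in labels_by_cluster.items()
    }
    return [(text, majority[cluster_id], cluster_id)
            for text, _, cluster_id in mentions]

# Hypothetical input: the third mention carries a wrong label.
mentions = [
    ("Paul_Wilson", "PERSON", 1),
    ("Paul", "PERSON", 1),
    ("Paul", "LOCATION", 1),
    ("Sandra_Curtis", "PERSON", 2),
]

corrected = correct_labels(mentions)
for text, label, cluster_id in corrected:
    print(text, label, cluster_id)
```

A vote like this can only be as good as the clustering itself, which may be one reason the option is marked experimental.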

Yes, it was just a NEC option in the first commit. It was later promoted to a real parameter.

Actually, it could be seen just as a NEC extension, or as a completely new NLP module.

Thanks for the quick response, the explanation and examples. 😀
I'm closing this again, hope that's ok.