stanfordnlp/CoreNLP

Relation Extractor custom entities

aoldoni opened this issue · 8 comments

Hi All,

Thanks for the great software. I would like to ask you the following please.

When training specific relations to be extracted from custom entity types using the Relation Extractor, I noticed that the set of possible entities is currently hard-coded in some places, e.g.:

By modifying these two places, one can successfully reuse the Relation Extractor with custom entities when needed, but this requires recompilation and some initial troubleshooting to work out what has to change.

Would you be interested in a pull request that refactors these hard-coded methods into something configurable from the properties file? For example, the properties file could accept an "entitiesPath" option pointing to a tab-separated file whose columns are the normalised and non-normalised values of these entities.

If this option is not provided, the default hard-coded entities would be used, maintaining the current behaviour.

This would make Relation Extractor workflows with custom entities possible without recompiling the code.
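To make the proposal concrete, here is a minimal sketch of how such a lookup could work. It is not the existing CoreNLP code: the class name `EntityTypeLoader`, the column order (normalised value first, raw value second), and the default entries are all assumptions for illustration. Only the "entitiesPath" option name comes from the suggestion above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: maps raw entity tags to normalised tags.
final class EntityTypeLoader {

  // Illustrative defaults only; the real hard-coded values live in CoreNLP.
  static Map<String, String> defaultEntities() {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("Peop", "PEOPLE");
    m.put("Loc", "LOCATION");
    m.put("Org", "ORGANIZATION");
    m.put("Other", "OTHER");
    return m;
  }

  // Assumed line format: NORMALISED<TAB>raw. Blank lines are skipped.
  static Map<String, String> parse(List<String> lines) {
    Map<String, String> m = new LinkedHashMap<>();
    for (String line : lines) {
      if (line.trim().isEmpty()) {
        continue;
      }
      String[] cols = line.split("\t");
      if (cols.length != 2) {
        throw new IllegalArgumentException(
            "Expected 2 tab-separated columns: " + line);
      }
      m.put(cols[1].trim(), cols[0].trim());
    }
    return m;
  }

  // Falls back to the defaults when "entitiesPath" is absent or empty.
  static Map<String, String> load(Properties props) {
    String path = props.getProperty("entitiesPath");
    if (path == null || path.trim().isEmpty()) {
      return defaultEntities();
    }
    try {
      return parse(Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With something like this, a properties file that omits "entitiesPath" would keep today's behaviour, while one that sets it would switch to the custom mapping without a rebuild.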

Please advise.

Again, thanks!

J38 commented

There has been a lot of interest in making custom relation extraction training available, but I think the path forward is to make it easier to train models for new relations that work with the KBPAnnotator. I'm going to try to make sure there is clear documentation and any code changes necessary for that for Stanford CoreNLP 3.8.0.

OK, cool, thanks for the prompt response.

So the understanding is that the input format will be migrated from the Roth CONLL04 format to the KBP format for the training, and at that point this will become flexible.

For now I have a small local customisation that handles this, and I will keep using that approach.

J38 commented

Yes, that would be the plan. I'm going to start working on this and hopefully it won't take too long. By the way, if you happen to have any sample training data I could look at, that would help: I am looking for an example so I can make sure my modifications are working properly.

Hi @J38 - sorry it took a while for me to respond to you.

  1. Would you have an email address I could send the data to? Unfortunately I cannot post it publicly online.

  2. Moreover, I would like to point out that I modified the code slightly for my own use, adding the ability to parametrise NER tags for RE training in this commit.
    It adds a new parameter that can be used in the MachineReadingProperties properties file with a comma-separated list of values for the NER tag entity normalization needed by the RE machine reading classes. This is an example of the end result.
    This approach surely differs from your intentions as explained above, but I would be happy to open a pull request if you believe such parametrisation is useful to have in the meantime, in case someone else needs this customisation before KBPAnnotator support for Relation Extractor training is in place.

Please let me know your thoughts! 👍

Hello,

Thanks for the great Stanford tools!

I badly need to be able to train RE with custom entities for my project. However, I am not a professional (Java) programmer (I can compile from source if proper instructions are available) and do not fully understand how to "change the code" as aoldoni suggested. Is training custom relations with custom entities possible in 3.8? If not, how could I use the approach aoldoni suggested? I have a training corpus available in the original Roth format. Many thanks for any reply! I am attaching a small sample training file.

rel_train.txt

Hi @rpalenik ,

> If not, how could I use the approach aoldoni suggested?

Regarding this question specifically, please note:

  1. You could use this fork: https://github.com/aoldoni/stanford-corenlp - it contains the change that I did.
  2. Re-compile it, instructions here https://github.com/stanfordnlp/CoreNLP#build-instructions
  3. Then use the new "possibleEntities" attribute in the properties file that is now available in this fork, as per this example https://github.com/aoldoni/tetre/blob/develop/config/relation.properties#L54
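As a rough illustration of step 3, the sketch below shows one way a comma-separated "possibleEntities" value could be read from a properties object. It is not the code from the fork: the class name `PossibleEntities` and the fallback argument are hypothetical, and the exact value format the fork expects should be checked against the linked example file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical reader for a comma-separated entity list in a properties file.
final class PossibleEntities {

  // Returns the trimmed, non-empty entries of "possibleEntities",
  // or the supplied defaults when the property is missing or blank.
  static List<String> fromProperties(Properties props, List<String> defaults) {
    String raw = props.getProperty("possibleEntities");
    if (raw == null || raw.trim().isEmpty()) {
      return defaults;
    }
    List<String> out = new ArrayList<>();
    for (String part : raw.split(",")) {
      String tag = part.trim();
      if (!tag.isEmpty()) {
        out.add(tag);
      }
    }
    return out;
  }
}
```

Falling back to a default list when the property is absent keeps existing configurations working unchanged.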

Hi @aoldoni ,

Many thanks. I need some more help, though. I understand I need to:

  1. Clone the current source from https://github.com/stanfordnlp/CoreNLP
  2. Replace the respective files with the ones from your repository
  3. Compile

However, I got numerous compilation errors. Have I done it wrong? Can you please help with the right approach?

thnx.
R.

Here is the output from the compiler:
ant_error.txt