statsmaths/cleanNLP

allow use of CoreNLP regexner annotator

alistaire47 opened this issue · 1 comments

As I discovered while writing this SO answer, CoreNLP's regexner annotator is installed as part of CoreNLP, but is unavailable through cleanNLP because it isn't included in the anno_level options to cnlp_init_corenlp.

Is it possible to either add a level for it, or allow passing a string or vector of annotators directly? The Java errors raised when the dependencies are insufficient are surprisingly useful, so even if people screw it up, it's pretty easy to rectify.

Another option to enable this and more functionality is to accept a parameter file à la coreNLP::initCoreNLP. Better, accept a string or list of parameters. Such an approach would conflict with the anno_level parameter, but would allow use of CoreNLP to its full potential.

Thanks for your work!

Currently, you can manually apply your own custom properties by first reading in the current settings:

fin <- file.path(system.file("extdata", package="cleanNLP"), "properties.rds")
prop <- readRDS(fin)

Change whatever you want, resave prop to the file fin, and then call the internal function that starts the CoreNLP engine:

cleanNLP:::init_corenlp_backend()
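The read, modify, and resave step could be sketched roughly as follows. This is only an illustration under assumptions: `add_annotators` is a hypothetical helper (not part of cleanNLP), and it assumes `prop` has an `annotators` entry holding a comma-separated string, as in a CoreNLP properties file. The real structure of properties.rds may differ, so inspect `str(prop)` first. A temporary file stands in for `fin` so the sketch runs without cleanNLP installed.

```r
# Hypothetical helper, NOT part of cleanNLP: add annotators to a properties object.
# ASSUMPTION: `prop` is a named list (or named character vector) whose
# "annotators" entry is a comma-separated string.
add_annotators <- function(prop, new_annotators) {
  if (is.null(names(prop)) || !("annotators" %in% names(prop))) {
    stop("no 'annotators' entry found; inspect str(prop) first")
  }
  # Split the existing comma-separated list and strip surrounding whitespace.
  current <- trimws(strsplit(prop[["annotators"]], ",", fixed = TRUE)[[1]])
  # Append the new annotators, skipping any that are already present.
  prop[["annotators"]] <- paste(unique(c(current, new_annotators)), collapse = ", ")
  prop
}

# Stand-in for properties.rds (made-up contents, for illustration only).
fin_demo <- tempfile(fileext = ".rds")
saveRDS(list(annotators = "tokenize, ssplit, pos, lemma, ner"), fin_demo)

# Read, modify, and resave, mirroring the steps described above.
prop <- readRDS(fin_demo)
prop <- add_annotators(prop, "regexner")
saveRDS(prop, fin_demo)

result <- readRDS(fin_demo)[["annotators"]]
print(result)
```

With cleanNLP installed, the same steps would be applied to `fin` instead of `fin_demo`, followed by the internal init call shown above. Keeping a backup copy of the original properties.rds before overwriting it seems prudent, since the file lives inside the installed package.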

I'm hesitant to open this up via an official function, though, as I don't want to then have to
support all possible options users may select and make sure that my Java code works correctly
with all of them.