`cnlp_annotate()` generates runtime error with coreNLP backend
joshpersi opened this issue · 0 comments
Hello,
I'm very excited to use the cleanNLP package with the coreNLP backend. I'm a first-time user of both and am running into an error when calling `cnlp_annotate()` with the coreNLP backend. Other backends do not produce this error.
Here is the script I am using to test cleanNLP. Is there anything obvious I'm doing wrong?
# Load required packages
library(reticulate)
library(cleanNLP)
# Install Miniconda, if required
# install_miniconda(force = TRUE)
# Check that the Miniconda path is set appropriately. My path is:
# C:/Users/persij/AppData/Local/r-miniconda
miniconda_path()
# Install the stanfordnlp package, which is required for cnlp_download_corenlp(),
# and the cleannlp package, which is required for cnlp_init_corenlp(). Set
# pip = TRUE since these packages aren't on Conda.
conda_install(packages = c("stanfordnlp", "cleannlp"), pip = TRUE)
# Download the coreNLP model files
cnlp_download_corenlp(lang = "en")
# Produces output like the following:
# Using the default treebank "en_ewt" for language "en".
# Would you like to download the models for: en_ewt now? (Y/n)
#
# Default download directory: C:\Users\persij\stanfordnlp_resources
# Hit enter to continue or type an alternate directory.
#
# Downloading models for: en_ewt
# Download location: C:\Users\persij\stanfordnlp_resources\en_ewt_models.zip
# 100%|██████████| 235M/235M [01:58<00:00, 1.98MB/s]
#
# Download complete. Models saved to: C:\Users\persij\stanfordnlp_resources\en_ewt_models.zip
# Extracting models file for: en_ewt
# Cleaning up...Done.
# Initialize the coreNLP backend. Produces no output:
cnlp_init_corenlp()
# Fails here, generating the following output:
annotation <- cnlp_annotate(input = c(
"Here is the first text. It is short.",
"Here's the second. It is short too!",
"The third text is the shortest."
))
# Error in py_call_impl(callable, call_args$unnamed, call_args$named) :
# RuntimeError: masked_fill_ only supports boolean masks, but got mask with dtype unsigned char
# Run `reticulate::py_last_error()` for details.
Here is the output from `reticulate::py_last_error()`:
> reticulate::py_last_error()
── Python Exception Message ───────────────────────────────────────────────────────────────────────────────────
Traceback (most recent call last):
File "C:\Users\persij\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\cleannlp\corenlp.py", line 50, in parseDocument
doc = self.nlp(text)
File "C:\Users\persij\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\stanfordnlp\pipeline\core.py", line 176, in __call__
self.process(doc)
File "C:\Users\persij\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\stanfordnlp\pipeline\core.py", line 170, in process
self.processors[processor_name].process(doc)
File "C:\Users\persij\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\stanfordnlp\pipeline\depparse_processor.py", line 30, in process
preds += self.trainer.predict(b)
File "C:\Users\persij\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\stanfordnlp\models\depparse\trainer.py", line 72, in predict
_, preds = self.model(word, word_mask, wordchars, wordchars_mask, upos, xpos, ufeats, pretrained, lemma, head, deprel, word_orig_idx, sentlens, wordlens)
File "C:\Users\persij\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\persij\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\persij\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\stanfordnlp\models\depparse\model.py", line 157, in forward
unlabeled_scores.masked_fill_(diag, -float('inf'))
RuntimeError: masked_fill_ only supports boolean masks, but got mask with dtype unsigned char
── R Traceback ────────────────────────────────────────────────────────────────────────────────────────────────
▆
1. └─cleanNLP::cnlp_annotate(...)
2. └─cleanNLP:::annotate_with_corenlp(input, verbose)
3. └─volatiles$corenlp$obj$parseDocument(x, doc_id)
4. └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)
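The last Python frame fails in `unlabeled_scores.masked_fill_(diag, -float('inf'))`, and the error message indicates that `diag` is a uint8 ("unsigned char") tensor rather than a boolean one. The following is a minimal sketch that tries the same kind of call outside cleanNLP. It assumes PyTorch is importable in the active Python environment. The identity-matrix mask is a hypothetical stand-in for `diag`, and `mask_fill_outcomes` is an illustrative helper name, not part of stanfordnlp or cleannlp.

```python
try:
    import torch
except ImportError:  # PyTorch may not be installed where this runs
    torch = None


def mask_fill_outcomes(n=3):
    """Try masked_fill_ with a uint8 mask and with a bool mask.

    Returns a dict mapping the mask dtype name to "ok" or to the error
    message, or None when PyTorch is unavailable.
    """
    if torch is None:
        return None
    outcomes = {}
    for name, dtype in (("uint8", torch.uint8), ("bool", torch.bool)):
        scores = torch.zeros(n, n)
        # Hypothetical stand-in for `diag`: an identity matrix cast to the
        # mask dtype under test.
        mask = torch.eye(n).to(dtype)
        try:
            scores.masked_fill_(mask, -float("inf"))
            outcomes[name] = "ok"
        except RuntimeError as err:
            outcomes[name] = str(err)
    return outcomes


if __name__ == "__main__":
    print(mask_fill_outcomes())
```

If the uint8 case reports the same `RuntimeError` while the bool case succeeds, that would suggest a mismatch between the installed PyTorch release and what stanfordnlp expects, rather than a mistake in the R script. As far as I can tell, older PyTorch releases accepted uint8 masks with a deprecation warning, so the result may differ by version.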
And here is the session info:
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] English_United States.1252
time zone: America/Vancouver
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] cleanNLP_3.0.7 reticulate_1.35.0
loaded via a namespace (and not attached):
[1] compiler_4.3.2 Matrix_1.6-5 cli_3.6.2 tools_4.3.2 yaml_2.3.8 Rcpp_1.0.12 stringi_1.8.3
[8] grid_4.3.2 jsonlite_1.8.8 rlang_1.1.3 renv_1.0.5 png_0.1-8 lattice_0.22-5
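The session info above lists only R packages, so it does not show which Python package versions are installed in the reticulate environment. A minimal sketch for collecting them, using only the Python standard library (Python 3.8 or later), is below. The package names come from the script and traceback above; `installed_version` and `report` are illustrative helper names. It assumes the code is run inside the same environment that reticulate uses, for example from `reticulate::repl_python()`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages=("torch", "stanfordnlp", "cleannlp")):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in report().items():
        print(f"{name}: {ver or 'not installed'}")
```

Including this output alongside the R session info would make it easier to tell whether the error depends on the installed PyTorch version.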