ropensci/rcrossref

cr_cn fails with some valid DOIs

bobmuscarella opened this issue · 4 comments

rcrossref returns errors for some valid DOIs (a sample is below). All of them are valid and resolve on doi.org. Any ideas what is going on or how to fix it?

Please note that I am using the most recent dev version of rcrossref, and I have added my email to my .Renviron file as instructed on the rcrossref GitHub page.

Thanks for any help!

Session Info
> library(rcrossref)
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rcrossref_1.1.0.99

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7        plyr_1.8.6        compiler_4.0.3    pillar_1.6.1     
 [5] later_1.2.0       remotes_2.4.0     tools_4.0.3       digest_0.6.27    
 [9] jsonlite_1.7.2    lifecycle_1.0.0   tibble_3.1.2      pkgconfig_2.0.3  
[13] rlang_0.4.11      shiny_1.6.0       DBI_1.1.1         crul_1.1.0       
[17] curl_4.3.1        fastmap_1.1.0     xml2_1.3.2        stringr_1.4.0    
[21] dplyr_1.0.6       generics_0.1.0    vctrs_0.3.8       htmlwidgets_1.5.3
[25] DT_0.18           tidyselect_1.1.1  glue_1.4.2        httpcode_0.3.0   
[29] R6_2.5.1          fansi_0.5.0       purrr_0.3.4       magrittr_2.0.1   
[33] promises_1.2.0.1  ellipsis_0.3.2    htmltools_0.5.1.1 assertthat_0.2.1 
[37] mime_0.10         xtable_1.8-4      httpuv_1.6.1      utf8_1.2.1       
[41] stringi_1.6.2     miniUI_0.1.1.1    crayon_1.4.1     
> cr_cn("10.1111/ddi.13378", "text")
Error in nchar(hh) : invalid multibyte string, element 1
> cr_cn("10.1111/btp.12905", "text")
Error in nchar(hh) : invalid multibyte string, element 1
> cr_cn('10.1038/s41597-020-00788-5')
Error in nchar(hh) : invalid multibyte string, element 1
> cr_cn("10.1111/geb.13346")
Warning message:
v1/works/10.1111/geb.13346/transform w/ (500) - 

Thank you for raising this issue, @bobmuscarella. It seems the Crossref API does not encode its responses as UTF-8. I will alert Crossref about it. The issue relates to #221.
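The error message itself comes from base R: `nchar()` refuses to count characters in a string that is marked as UTF-8 but contains bytes that are not valid UTF-8. The minimal sketch below reproduces that failure mode and one way to repair such a string. It is an illustration only, not crul's code, and it assumes (purely for the example) that the offending header byte is Latin-1 `0xE9` ("é").

```r
# Build a string holding a raw Latin-1 byte (0xE9, "é"), as a server might send
# in a header that is not UTF-8 encoded. The byte choice is an assumption.
bad <- rawToChar(as.raw(c(0x50, 0xe9)))  # "P" followed by a lone 0xE9 byte
Encoding(bad) <- "UTF-8"                 # declare it UTF-8, which it is not

# nchar() stops on the invalid UTF-8 string; capture the message instead
res <- tryCatch(nchar(bad), error = function(e) conditionMessage(e))
print(res)

# Re-encode the bytes from Latin-1 to UTF-8; the string becomes valid
fixed <- iconv(bad, from = "latin1", to = "UTF-8")
print(nchar(fixed))      # 2 characters: "P" and "é"
print(validUTF8(fixed))
```

Re-encoding only helps when the real source encoding is known; guessing Latin-1 for arbitrary server data can silently produce the wrong characters.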

I asked the Crossref team about the header encoding: https://gitlab.com/crossref/issues/-/issues/1574

@njahn82 - any update on this error?

Hi @doomlab, Crossref has not fixed the issue yet, but crul, rcrossref's underlying HTTP client, has changed how it handles header encodings (see here). Good news: if you update to the most recent crul version on CRAN (1.2.0), at least the first three examples work. cr_cn("10.1111/geb.13346") still returns an internal server error (HTTP 500).
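Before re-running the examples, you can check whether the installed crul predates 1.2.0. This is a minimal sketch; `needs_crul_upgrade()` is a hypothetical helper, not part of crul or rcrossref, and it only prints a suggestion rather than installing anything.

```r
# Hypothetical helper: TRUE if the given crul version is older than 1.2.0,
# the CRAN release that changed how header encodings are handled.
needs_crul_upgrade <- function(installed) {
  utils::compareVersion(as.character(installed), "1.2.0") < 0
}

# Only inspect crul if it is installed
if (requireNamespace("crul", quietly = TRUE) &&
    needs_crul_upgrade(utils::packageVersion("crul"))) {
  message("crul < 1.2.0 detected; consider install.packages(\"crul\")")
}
```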

library(rcrossref)
cr_cn("10.1111/ddi.13378", "text")
#> [1] "Pouteau, R., Biurrun, I., Brunel, C., Chytrý, M., Dawson, W., Essl, F., Fristoe, T., Haveman, R., Hobohm, C., Jansen, F., Kreft, H., Lenoir, J., Lenzner, B., Meyer, C., Moeslund, J. E., Pergl, J., Pyšek, P., Svenning, J., Thuiller, W., … van Kleunen, M. (2021). Potential alien ranges of European plants will shrink in the future, but less so for already naturalized than for not yet naturalized species. Diversity and Distributions, 27(11), 2063–2076. Portico. https://doi.org/10.1111/ddi.13378"

cr_cn("10.1111/btp.12905", "text")
#> [1] "Rech, A. R., Ollerton, J., Dalsgaard, B., Ré Jorge, L., Sandel, B., Svenning, J., Baronio, G. J., & Sazima, M. (2021). Population‐level plant pollination mode is influenced by Quaternary climate and pollinators. Biotropica, 53(2), 632–642. Portico. https://doi.org/10.1111/btp.12905"

cr_cn('10.1038/s41597-020-00788-5')
#> [1] "@article{Lundgren_2021,\n\tdoi = {10.1038/s41597-020-00788-5},\n\turl = {https://doi.org/10.1038%2Fs41597-020-00788-5},\n\tyear = 2021,\n\tmonth = {jan},\n\tpublisher = {Springer Science and Business Media {LLC}},\n\tvolume = {8},\n\tnumber = {1},\n\tauthor = {Erick J. Lundgren and Simon D. Schowanek and John Rowan and Owen Middleton and Rasmus {\\O}. Pedersen and Arian D. Wallach and Daniel Ramp and Matt Davis and Christopher J. Sandom and Jens-Christian Svenning},\n\ttitle = {Functional traits of the world's late Quaternary large-bodied avian and mammalian herbivores},\n\tjournal = {Scientific Data}\n}"

cr_cn('10.1111/geb.13346')
#> Warning: v1/works/10.1111/geb.13346/transform w/ (500) -

Created on 2022-02-20 by the reprex package (v2.0.0)
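Because cr_cn("10.1111/geb.13346") still fails on the server side, a batch of lookups can be wrapped so that one bad DOI does not abort the rest. This is a sketch under assumptions: `fetch_all()` is a hypothetical helper, and it treats both errors and warnings as failures because, as the reprex shows, the 500 response surfaces as a warning. Pass a lookup function such as `function(doi) rcrossref::cr_cn(doi, "text")` as `fetch`.

```r
# Hypothetical helper: apply `fetch` to each DOI and record failures
# instead of stopping at the first one.
fetch_all <- function(dois, fetch) {
  n <- length(dois)
  out <- data.frame(
    doi = dois, ok = rep(FALSE, n),
    result = rep(NA_character_, n), message = rep(NA_character_, n),
    stringsAsFactors = FALSE
  )
  for (i in seq_len(n)) {
    res <- tryCatch(
      fetch(dois[[i]]),
      warning = function(w) w,  # e.g. the "(500)" warning seen above
      error = function(e) e     # e.g. "invalid multibyte string"
    )
    if (inherits(res, "condition")) {
      out$message[i] <- conditionMessage(res)
    } else if (is.character(res) && length(res) == 1L && !is.na(res)) {
      out$ok[i] <- TRUE
      out$result[i] <- res
    } else {
      out$message[i] <- "no citation returned"
    }
  }
  out
}
```

For example, `fetch_all(c("10.1111/ddi.13378", "10.1111/geb.13346"), function(doi) rcrossref::cr_cn(doi, "text"))` would return one row per DOI, with `ok` set to `FALSE` and the condition message kept for any lookup that failed.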

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.2 (2021-11-01)
#>  os       macOS Big Sur 11.4          
#>  system   aarch64, darwin20           
#>  ui       X11                         
#>  language en                          
#>  collate  de_DE.UTF-8                 
#>  ctype    de_DE.UTF-8                 
#>  tz       Europe/Copenhagen           
#>  date     2022-02-20                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date       lib source                             
#>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.1.0)                     
#>  backports     1.2.1    2020-12-09 [1] CRAN (R 4.1.0)                     
#>  cli           3.1.0    2021-10-27 [1] CRAN (R 4.1.1)                     
#>  crayon        1.4.2    2021-10-29 [1] CRAN (R 4.1.1)                     
#>  crul          1.2.0    2021-11-22 [1] CRAN (R 4.1.1)                     
#>  curl          4.3.2    2021-06-23 [1] CRAN (R 4.1.0)                     
#>  DBI           1.1.1    2021-01-15 [1] CRAN (R 4.1.0)                     
#>  digest        0.6.28   2021-09-23 [1] CRAN (R 4.1.1)                     
#>  dplyr         1.0.7    2021-06-18 [1] CRAN (R 4.1.0)                     
#>  DT            0.19     2021-09-02 [1] CRAN (R 4.1.1)                     
#>  ellipsis      0.3.2    2021-04-29 [1] CRAN (R 4.1.0)                     
#>  evaluate      0.14     2019-05-28 [1] CRAN (R 4.1.0)                     
#>  fansi         0.5.0    2021-05-25 [1] CRAN (R 4.1.0)                     
#>  fastmap       1.1.0    2021-01-25 [1] CRAN (R 4.1.0)                     
#>  fs            1.5.0    2020-07-31 [1] CRAN (R 4.1.0)                     
#>  generics      0.1.1    2021-10-25 [1] CRAN (R 4.1.1)                     
#>  glue          1.4.2    2020-08-27 [1] CRAN (R 4.1.0)                     
#>  highr         0.9      2021-04-16 [1] CRAN (R 4.1.0)                     
#>  htmltools     0.5.2    2021-08-25 [1] CRAN (R 4.1.1)                     
#>  htmlwidgets   1.5.4    2021-09-08 [1] CRAN (R 4.1.1)                     
#>  httpcode      0.3.0    2020-04-10 [1] CRAN (R 4.1.0)                     
#>  httpuv        1.6.3    2021-09-09 [1] CRAN (R 4.1.1)                     
#>  jsonlite      1.7.2    2020-12-09 [1] CRAN (R 4.1.0)                     
#>  knitr         1.37     2021-12-16 [1] CRAN (R 4.1.1)                     
#>  later         1.3.0    2021-08-18 [1] CRAN (R 4.1.1)                     
#>  lifecycle     1.0.1    2021-09-24 [1] CRAN (R 4.1.1)                     
#>  magrittr      2.0.1    2020-11-17 [1] CRAN (R 4.1.0)                     
#>  mime          0.12     2021-09-28 [1] CRAN (R 4.1.1)                     
#>  miniUI        0.1.1.1  2018-05-18 [1] CRAN (R 4.1.0)                     
#>  pillar        1.6.4    2021-10-18 [1] CRAN (R 4.1.0)                     
#>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.1.0)                     
#>  plyr          1.8.6    2020-03-03 [1] CRAN (R 4.1.0)                     
#>  promises      1.2.0.1  2021-02-11 [1] CRAN (R 4.1.0)                     
#>  purrr         0.3.4    2020-04-17 [1] CRAN (R 4.1.0)                     
#>  R6            2.5.1    2021-08-19 [1] CRAN (R 4.1.1)                     
#>  Rcpp          1.0.7    2021-07-07 [1] CRAN (R 4.1.0)                     
#>  rcrossref   * 1.1.0.99 2021-10-16 [1] Github (ropensci/rcrossref@319f34c)
#>  reprex        2.0.0    2021-04-02 [1] CRAN (R 4.1.0)                     
#>  rlang         0.4.12   2021-10-18 [1] CRAN (R 4.1.0)                     
#>  rmarkdown     2.11     2021-09-14 [1] CRAN (R 4.1.1)                     
#>  sessioninfo   1.1.1    2018-11-05 [1] CRAN (R 4.1.0)                     
#>  shiny         1.7.1    2021-10-02 [1] CRAN (R 4.1.1)                     
#>  stringi       1.7.5    2021-10-04 [1] CRAN (R 4.1.1)                     
#>  stringr       1.4.0    2019-02-10 [1] CRAN (R 4.1.0)                     
#>  styler        1.5.1    2021-07-13 [1] CRAN (R 4.1.0)                     
#>  tibble        3.1.5    2021-09-30 [1] CRAN (R 4.1.1)                     
#>  tidyselect    1.1.1    2021-04-30 [1] CRAN (R 4.1.0)                     
#>  triebeard     0.3.0    2016-08-04 [1] CRAN (R 4.1.0)                     
#>  urltools      1.7.3    2019-04-14 [1] CRAN (R 4.1.0)                     
#>  utf8          1.2.2    2021-07-24 [1] CRAN (R 4.1.0)                     
#>  vctrs         0.3.8    2021-04-29 [1] CRAN (R 4.1.0)                     
#>  withr         2.4.3    2021-11-30 [1] CRAN (R 4.1.1)                     
#>  xfun          0.29     2021-12-14 [1] CRAN (R 4.1.1)                     
#>  xml2          1.3.2    2020-04-23 [1] CRAN (R 4.1.0)                     
#>  xtable        1.8-4    2019-04-21 [1] CRAN (R 4.1.0)                     
#>  yaml          2.2.1    2020-02-01 [1] CRAN (R 4.1.0)                     
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library