ropensci/europepmc

Error in epmc_search because out$next_cursor is NULL

Using the following query:

query <- '("trimethadione") AND (OPEN_ACCESS:y) AND (FIRST_PDATE:[1925-01-01 TO 2021-09-30]) AND ((KW:"Toxicity Tests") OR (KW:"toxicity") OR (ABSTRACT:"toxic*") OR (ABSTRACT:"toxin*")) AND ((ABSTRACT:"development*") OR (ABSTRACT:"reproduct*") OR (KW:"Teratogens") OR (KW:"Teratogenesis") OR (KW:"Abnormalities, Drug-Induced") OR (ABSTRACT:"teratog*") OR (ABSTRACT:"congenital abnormal*") OR (ABSTRACT:"malform*") OR (ABSTRACT:"embryotoxi*") OR (ABSTRACT:"embryo test*") OR (ABSTRACT:"embryonic test*") OR (KW:"maternal exposure") OR (ABSTRACT:"maternal exposure*"))'

in the function:

literature <- europepmc::epmc_search(query, synonym = FALSE, limit = 52, verbose = FALSE)

gives the following error:

Error in if (page_token == out$next_cursor) break : 
  argument is of length zero

For this specific query, 'out$results' looks normal but 'out$next_cursor' is NULL. The comparison page_token == out$next_cursor then evaluates to logical(0), which if() cannot test, hence the error.

Could you maybe wrap the

if (page_token == out$next_cursor) 
    break

statement in another if-statement to prevent this situation? For example:

if (!is.null(out$next_cursor)) {
    if (page_token == out$next_cursor) {
        break
    }
}
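
To illustrate the failure mode described above, here is a minimal sketch in base R, independent of the package internals. The names page_token and next_cursor mirror the error message; the "*" start value and the helper should_stop() are assumptions made for illustration and are not part of europepmc. Treating a missing cursor as "no more pages" is also an assumption, not necessarily what the package does.

```r
# Illustrative values: "*" as a start cursor is an assumption.
page_token <- "*"
next_cursor <- NULL

# Comparing a string with NULL gives a zero-length logical vector.
cmp <- page_token == next_cursor
length(cmp)  # 0

# `if` needs exactly one TRUE/FALSE, so the bare comparison signals an error.
# tryCatch() captures the error message instead of aborting.
res <- tryCatch(
  if (page_token == next_cursor) "stop" else "continue",
  error = function(e) conditionMessage(e)
)

# Hypothetical helper (not part of europepmc): decide whether paging is done.
# A NULL cursor is treated as "no further pages" (assumption).
should_stop <- function(page_token, next_cursor) {
  is.null(next_cursor) || identical(page_token, next_cursor)
}

should_stop("*", NULL)  # TRUE
should_stop("a", "b")   # FALSE
```

Because || short-circuits, identical() is never reached when the cursor is NULL, so the condition always has length one.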

I've had the same problem. I wondered if it had to do with the limit, so I tested different values, and it seemed to happen only when I set the limit much higher than the number of records found. I made this change to my code, and it's worked okay so far.

query_count <- epmc_hits(query = my_query)
pmc_seed <- epmc_search(query = my_query, limit = query_count)

It seems like something similar is actually already incorporated into the epmc_search function:

hits <- europepmc::epmc_hits(query, synonym = synonym)
limit <- ifelse(hits <= limit, hits, limit)

So I don't think that should make a difference...
For me it also doesn't change or remove the error, unfortunately.
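
As a quick check of that reasoning, the clamping step quoted above can be sketched on its own. clamp_limit() is a hypothetical name, and the counts are illustrative; for single numbers, ifelse(hits <= limit, hits, limit) gives the same result as min(hits, limit).

```r
# Hypothetical stand-in for the clamping step quoted from epmc_search().
clamp_limit <- function(hits, limit) {
  ifelse(hits <= limit, hits, limit)
}

clamp_limit(1, 52)       # 1: fewer hits than the limit, so the limit shrinks
clamp_limit(210116, 52)  # 52: more hits than the limit, so the limit is kept

# Passing the hit count as the limit therefore changes nothing:
clamp_limit(1, 1)        # 1
```

So whether the caller passes limit = 52 or limit = epmc_hits(query), the effective limit ends up the same whenever the hit count is the smaller number.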

Hm... the limit is default 100 otherwise, but no matter... it was just a guess. I'm sorry it's not working for you! It's a great package otherwise!

Thank you @1heidi and @ESPoppelaars for reporting this issue. I am still trying to understand what causes the error. Could you share a reprex with the error?

Here's my try:

query <- '("trimethadione") AND (OPEN_ACCESS:y) AND (FIRST_PDATE:[1925-01-01 TO 2021-09-30]) AND ((KW:"Toxicity Tests") OR (KW:"toxicity") OR (ABSTRACT:"toxic*") OR (ABSTRACT:"toxin*")) AND ((ABSTRACT:"development*") OR (ABSTRACT:"reproduct*") OR (KW:"Teratogens") OR (KW:"Teratogenesis") OR (KW:"Abnormalities, Drug-Induced") OR (ABSTRACT:"teratog*") OR (ABSTRACT:"congenital abnormal*") OR (ABSTRACT:"malform*") OR (ABSTRACT:"embryotoxi*") OR (ABSTRACT:"embryo test*") OR (ABSTRACT:"embryonic test*") OR (KW:"maternal exposure") OR (ABSTRACT:"maternal exposure*"))'


europepmc::epmc_search(query, synonym = FALSE, limit = 52)
#> 1 records found, returning 1
#> # A tibble: 1 × 27
#>   id     source pmid   pmcid doi   title authorString journalTitle journalVolume
#>   <chr>  <chr>  <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>        
#> 1 10852… MED    10852… PMC1… 10.1… Work… Adams J, Ba… Environ Hea… 108 Suppl 3  
#> # … with 18 more variables: pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> #   pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>


europepmc::epmc_search("malaria", synonym = FALSE, limit = 52)
#> 210116 records found, returning 52
#> # A tibble: 52 × 28
#>    id     source doi    title  authorString  pubYear pubType isOpenAccess inEPMC
#>    <chr>  <chr>  <chr>  <chr>  <chr>         <chr>   <chr>   <chr>        <chr> 
#>  1 PPR42… PPR    10.21… Malar… Liu H, Zhou … 2021    prepri… N            N     
#>  2 PPR42… PPR    10.21… Facto… Takarinda KP… 2021    prepri… N            N     
#>  3 PMC86… PMC    <NA>   Poten… Ataba E, Dor… 2021    review… Y            Y     
#>  4 PPR42… PPR    10.20… Featu… Mariki M, Md… 2021    prepri… N            N     
#>  5 PPR42… PPR    10.11… Getti… Oresegun DR,… 2021    prepri… N            N     
#>  6 PPR41… PPR    10.21… Pedia… Ferrao JL, M… 2021    prepri… N            N     
#>  7 PPR41… PPR    10.21… Knowl… Khan W, shah… 2021    prepri… N            N     
#>  8 PPR42… PPR    10.11… Histi… Iwasaki T, S… 2021    prepri… N            N     
#>  9 PPR42… PPR    10.21… Fores… Jongdeepaisa… 2021    prepri… N            N     
#> 10 PMC85… PMC    <NA>   Malar… Mohanan P, I… 2021    letter  Y            Y     
#> # … with 42 more rows, and 19 more variables: inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>, journalTitle <chr>,
#> #   journalIssn <chr>, pageInfo <chr>, journalVolume <chr>, pmid <chr>,
#> #   issue <chr>


europepmc::epmc_search("malaria", synonym = TRUE, limit = 1052)
#> 212861 records found, returning 1052
#> # A tibble: 1,052 × 29
#>    id         source pmid     pmcid  doi   title authorString journalTitle issue
#>    <chr>      <chr>  <chr>    <chr>  <chr> <chr> <chr>        <chr>        <chr>
#>  1 34100426   MED    34100426 PMC84… 10.4… New … Lima MN, Ba… Neural Rege… 1    
#>  2 PMC8602884 PMC    <NA>     PMC86… <NA>  Pote… Ataba E, Do… Acta Parasi… <NA> 
#>  3 PPR421448  PPR    <NA>     <NA>   10.2… Fact… Takarinda K… <NA>         <NA> 
#>  4 33341138   MED    33341138 <NA>   10.1… Trip… Wang J, Xu … Lancet       10267
#>  5 PPR421608  PPR    <NA>     <NA>   10.2… Mala… Liu H, Zhou… <NA>         <NA> 
#>  6 PPR421696  PPR    <NA>     <NA>   10.1… Gett… Oresegun DR… <NA>         <NA> 
#>  7 PPR423258  PPR    <NA>     <NA>   10.2… Inhi… Olajide O, … <NA>         <NA> 
#>  8 PPR423134  PPR    <NA>     <NA>   10.1… Hist… Iwasaki T, … <NA>         <NA> 
#>  9 PPR422651  PPR    <NA>     <NA>   10.1… Revi… Thommen BT,… <NA>         <NA> 
#> 10 PPR420867  PPR    <NA>     <NA>   10.2… Feat… Mariki M, M… <NA>         <NA> 
#> # … with 1,042 more rows, and 20 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, versionNumber <int>

Created on 2021-11-24 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       macOS Big Sur 11.4          
#>  system   aarch64, darwin20           
#>  ui       X11                         
#>  language en                          
#>  collate  de_DE.UTF-8                 
#>  ctype    de_DE.UTF-8                 
#>  tz       Europe/Copenhagen           
#>  date     2021-11-24                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source                             
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)                     
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.1.0)                     
#>  cli           3.1.0   2021-10-27 [1] CRAN (R 4.1.1)                     
#>  crayon        1.4.2   2021-10-29 [1] CRAN (R 4.1.1)                     
#>  curl          4.3.2   2021-06-23 [1] CRAN (R 4.1.0)                     
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.1.0)                     
#>  digest        0.6.28  2021-09-23 [1] CRAN (R 4.1.1)                     
#>  dplyr         1.0.7   2021-06-18 [1] CRAN (R 4.1.0)                     
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)                     
#>  europepmc     0.4.1   2021-09-01 [1] Github (ropensci/europepmc@a182cc6)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)                     
#>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)                     
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)                     
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)                     
#>  generics      0.1.1   2021-10-25 [1] CRAN (R 4.1.1)                     
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)                     
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)                     
#>  hms           1.1.1   2021-09-26 [1] CRAN (R 4.1.1)                     
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.1)                     
#>  httr          1.4.2   2020-07-20 [1] CRAN (R 4.1.0)                     
#>  jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.1.0)                     
#>  knitr         1.36    2021-09-29 [1] CRAN (R 4.1.1)                     
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.1)                     
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)                     
#>  pillar        1.6.4   2021-10-18 [1] CRAN (R 4.1.0)                     
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)                     
#>  plyr          1.8.6   2020-03-03 [1] CRAN (R 4.1.0)                     
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.1.0)                     
#>  progress      1.2.2   2019-05-16 [1] CRAN (R 4.1.0)                     
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)                     
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.1)                     
#>  Rcpp          1.0.7   2021-07-07 [1] CRAN (R 4.1.0)                     
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.1.0)                     
#>  rlang         0.4.12  2021-10-18 [1] CRAN (R 4.1.0)                     
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.1)                     
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)                     
#>  stringi       1.7.5   2021-10-04 [1] CRAN (R 4.1.1)                     
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)                     
#>  styler        1.5.1   2021-07-13 [1] CRAN (R 4.1.0)                     
#>  tibble        3.1.5   2021-09-30 [1] CRAN (R 4.1.1)                     
#>  tidyr         1.1.4   2021-09-27 [1] CRAN (R 4.1.1)                     
#>  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.1.0)                     
#>  triebeard     0.3.0   2016-08-04 [1] CRAN (R 4.1.0)                     
#>  urltools      1.7.3   2019-04-14 [1] CRAN (R 4.1.0)                     
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)                     
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)                     
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)                     
#>  xfun          0.26    2021-09-14 [1] CRAN (R 4.1.1)                     
#>  xml2          1.3.2   2020-04-23 [1] CRAN (R 4.1.0)                     
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)                     
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library

library(europepmc)
my_query <- '(((ABSTRACT:"www" OR ABSTRACT:"http" OR ABSTRACT:"https") AND (ABSTRACT:"data" OR ABSTRACT:"resource" OR ABSTRACT:"database"))  NOT (TITLE:"retraction" OR TITLE:"retracted" OR TITLE:"withdrawn" OR TITLE:"withdrawal" OR TITLE:"erratum") NOT ((ABSTRACT:"retract" OR ABSTRACT:"withdraw" ABSTRACT:"erratum" OR ABSTRACT:"github.com" OR ABSTRACT:"github.io" OR ABSTRACT:"cran.r" OR ABSTRACT:"youtube.com" OR ABSTRACT:"bitbucket.org" OR ABSTRACT:"links.lww.com" OR ABSTRACT:"osf.io" OR ABSTRACT:"bioconductor.org" OR ABSTRACT:"annualreviews.org" OR ABSTRACT:"creativecommons.org" OR ABSTRACT:"sourceforge.net" OR ABSTRACT:".pdf" OR ABSTRACT:"clinical trial" OR ABSTRACT:"registry" OR ABSTRACT:"registration" OR ABSTRACT:"trial registration" OR ABSTRACT:"clinicaltrial" OR ABSTRACT:"registration number" OR ABSTRACT:"pre-registration" OR ABSTRACT:"preregistration"))) AND (((SRC:MED OR SRC:PMC OR SRC:AGR OR SRC:CBA))) AND (FIRST_PDATE:[2011 TO 2021])'
pmc_seed <- epmc_search(query=my_query, limit = 25000)
#> 21835 records found, returning 21835
#> Error in if (page_token == out$next_cursor) break: argument is of length zero

Created on 2021-11-24 by the reprex package (v2.0.1)

@njahn82 - I haven't done reprex before - I'm hoping the above is what you're looking for? thank you!

Looks good. Could you also tell me which europepmc version you have installed, i.e. the output of packageVersion("europepmc")?

ahhh... it was 0.4 ... I thought I had updated when I saw the error but sessionInfo said differently. Just updated to europepmc_0.4.1 and no longer have the error. Very sorry @njahn82!

No worries, glad it works now!

Ah I also still had version 0.4, oops. I just updated to 0.4.1 and the error is gone for me as well. Thank you!

Great, good to hear!
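
For anyone landing here with the same error, a minimal sketch of the version check that resolved it in this thread. needs_update() is a hypothetical helper, and the "0.4.1" threshold is taken only from the reports above (the error disappeared after updating from 0.4 to 0.4.1).

```r
# Hypothetical helper: TRUE if the installed version predates `fixed`.
needs_update <- function(installed, fixed = "0.4.1") {
  package_version(installed) < package_version(fixed)
}

needs_update("0.4")    # TRUE
needs_update("0.4.1")  # FALSE

# Check the actually installed package, if it is available.
if (requireNamespace("europepmc", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("europepmc"))
  if (needs_update(installed)) {
    message("europepmc ", installed, " is older than 0.4.1; consider updating.")
  }
}
```

Restarting the R session after updating helps make sure the new version is the one that gets loaded.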