ropensci/rcrossref

cr_works() result is missing fields when using deep paging instead of regular search

aroranipun opened this issue · 3 comments

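library(rcrossref)

# Deep paging (cursor-based) vs. regular search for the same query
f1 <- cr_works(query = "human agency", cursor = "*", cursor_max = 100)
f  <- cr_works(query = "human agency", limit = 100)

c1 <- names(f1$data)
c2 <- names(f$data)

# Columns returned by regular search but absent from the deep-paging result
c2[which(!c2 %in% c1)]
#> [1] "isbn"          "abstract"      "update.policy" "assertion"     "subject"       "subtitle"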
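The check above only looks in one direction (regular-only columns). A symmetric comparison shows columns missing on either side. This is a minimal base-R sketch. `column_diff` is a hypothetical helper, not part of rcrossref, and the toy data frames stand in for `f$data` and `f1$data` so the example runs without network access.

```r
# Hypothetical helper (not part of rcrossref): list columns that appear
# in only one of two data frames
column_diff <- function(regular, deep) {
  list(
    only_regular = setdiff(names(regular), names(deep)),
    only_deep    = setdiff(names(deep), names(regular))
  )
}

# Toy stand-ins for f$data (regular search) and f1$data (deep paging);
# real cr_works() results would need network access
regular <- data.frame(doi = "10.1000/a", title = "x", abstract = "y", subject = "z")
deep    <- data.frame(doi = "10.1000/a", title = "x")

diffs <- column_diff(regular, deep)
diffs$only_regular
#> [1] "abstract" "subject"
diffs$only_deep
#> character(0)
```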

Session information

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rcrossref_1.1.0

Please include session info as requested: the output of devtools::session_info() or sessionInfo()

Just added session information.
PS: Also, many thanks for solving the problem in the previous issue.

Thanks for that. Of course.

If we look at an example small enough that all results can be collected quickly, regular search and deep paging return the same results, just in a different order:

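# Regular search, capped at 50 results
res1 <- cr_works(query = "ecology",
                 flq = c(query.author = 'Smith', query.bibliographic = 'avian'),
                 limit = 50)
# Deep paging over the same query
res2 <- cr_works(query = "ecology",
                 flq = c(query.author = 'Smith', query.bibliographic = 'avian'),
                 cursor = "*")

# Every DOI from the regular search also appears in the deep-paging result
all(res1$data$doi %in% res2$data$doi)
#> TRUE

c1 <- names(res1$data)
c2 <- names(res2$data)
# Columns present in the deep-paging result but not in the regular one
c2[which(!c2 %in% c1)]
#> character(0)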

So I think the discrepancy you're seeing comes from Crossref's servers: deep paging appears to extract results with different code than regular search does. It's not ideal, but I think that is what's going on. As far as I know, this package does not use different parsing code for the two paths.
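If the column sets differ because of the server, one option when combining results from both paths is to pad each data frame with the missing columns before stacking them. The sketch below assumes that filling absent fields with NA is acceptable. `align_columns` is a hypothetical helper, not an rcrossref function, and the toy data frames stand in for real `cr_works()` output.

```r
# Hypothetical helper (not part of rcrossref): give two data frames the same
# columns, filling any missing column with NA, so they can be stacked
align_columns <- function(a, b) {
  all_cols <- union(names(a), names(b))
  for (col in setdiff(all_cols, names(a))) a[[col]] <- rep(NA, nrow(a))
  for (col in setdiff(all_cols, names(b))) b[[col]] <- rep(NA, nrow(b))
  # Put both data frames in the same column order
  list(a = a[all_cols], b = b[all_cols])
}

# Toy stand-ins for a regular-search result and a deep-paging result
regular <- data.frame(doi = "10.1000/a", title = "x", abstract = "y")
deep    <- data.frame(doi = "10.1000/b", title = "z")

aligned  <- align_columns(regular, deep)
combined <- rbind(aligned$a, aligned$b)
combined
```

If dplyr is available, dplyr::bind_rows() fills missing columns with NA in the same way, which avoids the helper entirely.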