ropensci/rcrossref

id_converter returns live == false erroneously

charliejhadley opened this issue · 10 comments

There are many instances where id_converter fails to convert PMIDs to DOIs. In every instance I've found so far, id_converter(paper_doi, "doi")$records$live comes back as "false".

Here's a minimal example

library("rcrossref")
library("tidyverse")
#> Warning: package 'tibble' was built under R version 3.5.2
paper_title <- "Comparison of haematology and biochemistry parameters in healthy South African infants with laboratory reference intervals"
paper_doi <- "10.1111/tmi.13009"
paper_pmid <- "9140587"

lookup_doi <- cr_works(query = paper_title)$data %>%
  slice(1) %>%
  select(doi) %>%
  .[[1]]

paper_doi == lookup_doi
#> [1] TRUE

id_converter(paper_doi, "doi")$records$live
#> [1] "false"

id_converter(paper_pmid, "pmid")$records$live
#> [1] "false"
Session Info
> devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 os       macOS  10.14                
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_GB.UTF-8                 
 ctype    en_GB.UTF-8                 
 tz       Europe/London               
 date     2019-01-30                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source                               
 assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.5.0)                       
 backports     1.1.3   2018-12-14 [1] CRAN (R 3.5.0)                       
 bibtex        0.4.2   2017-06-30 [1] CRAN (R 3.5.0)                       
 bindr         0.1.1   2018-03-13 [1] CRAN (R 3.5.0)                       
 bindrcpp    * 0.2.2   2018-03-29 [1] CRAN (R 3.5.0)                       
 blogdown      0.10    2019-01-09 [1] CRAN (R 3.5.2)                       
 bookdown      0.9     2018-12-21 [1] CRAN (R 3.5.0)                       
 broom         0.5.1   2018-12-05 [1] CRAN (R 3.5.1)                       
 callr         3.1.1   2018-12-21 [1] CRAN (R 3.5.0)                       
 cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.5.0)                       
 cli           1.0.1   2018-09-25 [1] CRAN (R 3.5.0)                       
 clipr         0.5.0   2019-01-11 [1] CRAN (R 3.5.1)                       
 colorspace    1.3-2   2016-12-14 [1] CRAN (R 3.5.0)                       
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.0)                       
 crul          0.7.0   2019-01-04 [1] CRAN (R 3.5.2)                       
 curl          3.2     2018-03-28 [1] CRAN (R 3.5.0)                       
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.0)                       
 devtools      2.0.1   2018-10-26 [1] CRAN (R 3.5.1)                       
 digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.0)                       
 dplyr       * 0.7.8   2018-11-10 [1] CRAN (R 3.5.0)                       
 DT            0.5     2018-11-05 [1] CRAN (R 3.5.0)                       
 evaluate      0.12    2018-10-09 [1] CRAN (R 3.5.0)                       
 fansi         0.4.0   2018-10-05 [1] CRAN (R 3.5.0)                       
 forcats     * 0.3.0   2018-02-19 [1] CRAN (R 3.5.0)                       
 fs            1.2.6   2018-08-23 [1] CRAN (R 3.5.0)                       
 generics      0.0.2   2018-11-29 [1] CRAN (R 3.5.0)                       
 ggplot2     * 3.1.0   2018-10-25 [1] CRAN (R 3.5.0)                       
 glue        * 1.3.0   2018-07-17 [1] CRAN (R 3.5.0)                       
 gtable        0.2.0   2016-02-26 [1] CRAN (R 3.5.0)                       
 haven         2.0.0   2018-11-22 [1] CRAN (R 3.5.0)                       
 here        * 0.1     2017-05-28 [1] CRAN (R 3.5.0)                       
 hms           0.4.2   2018-03-10 [1] CRAN (R 3.5.0)                       
 htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.5.0)                       
 htmlwidgets   1.3     2018-09-30 [1] CRAN (R 3.5.0)                       
 httpcode      0.2.0   2016-11-14 [1] CRAN (R 3.5.0)                       
 httpuv        1.4.5.1 2018-12-18 [1] CRAN (R 3.5.0)                       
 httr          1.4.0   2018-12-11 [1] CRAN (R 3.5.0)                       
 jsonlite      1.6     2018-12-07 [1] CRAN (R 3.5.0)                       
 knitr         1.21    2018-12-10 [1] CRAN (R 3.5.1)                       
 labeling      0.3     2014-08-23 [1] CRAN (R 3.5.0)                       
 later         0.7.5   2018-09-18 [1] CRAN (R 3.5.0)                       
 lattice       0.20-38 2018-11-04 [1] CRAN (R 3.5.0)                       
 lazyeval      0.2.1   2017-10-29 [1] CRAN (R 3.5.0)                       
 lubridate     1.7.4   2018-04-11 [1] CRAN (R 3.5.0)                       
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.0)                       
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.0)                       
 mime          0.6     2018-10-05 [1] CRAN (R 3.5.0)                       
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 3.5.0)                       
 modelr        0.1.2   2018-05-11 [1] CRAN (R 3.5.0)                       
 munsell       0.5.0   2018-06-12 [1] CRAN (R 3.5.0)                       
 nlme          3.1-137 2018-04-07 [1] CRAN (R 3.5.1)                       
 pillar        1.3.1   2018-12-15 [1] CRAN (R 3.5.0)                       
 pkgbuild      1.0.2   2018-10-16 [1] CRAN (R 3.5.0)                       
 pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.5.0)                       
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.0)                       
 plyr          1.8.4   2016-06-08 [1] CRAN (R 3.5.0)                       
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.0)                       
 processx      3.2.1   2018-12-05 [1] CRAN (R 3.5.0)                       
 promises      1.0.1   2018-04-13 [1] CRAN (R 3.5.0)                       
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.5.0)                       
 purrr       * 0.2.5   2018-05-29 [1] CRAN (R 3.5.0)                       
 R6            2.3.0   2018-10-04 [1] CRAN (R 3.5.0)                       
 Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.5.0)                       
 rcrossref   * 0.8.4   2018-08-06 [1] CRAN (R 3.5.0)                       
 readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.5.0)                       
 readxl      * 1.2.0   2018-12-19 [1] CRAN (R 3.5.0)                       
 regexplain    0.2.2   2018-11-02 [1] Github (gadenbuie/regexplain@5da8d87)
 remotes       2.0.2   2018-10-30 [1] CRAN (R 3.5.1)                       
 reprex        0.2.1   2018-09-16 [1] CRAN (R 3.5.0)                       
 rlang         0.3.1   2019-01-08 [1] CRAN (R 3.5.2)                       
 rmarkdown     1.11    2018-12-08 [1] CRAN (R 3.5.0)                       
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.0)                       
 rsconnect     0.8.13  2019-01-10 [1] CRAN (R 3.5.1)                       
 rstudioapi    0.9.0   2019-01-09 [1] CRAN (R 3.5.2)                       
 rvest       * 0.3.2   2016-06-17 [1] CRAN (R 3.5.0)                       
 scales        1.0.0   2018-08-09 [1] CRAN (R 3.5.0)                       
 selectr       0.4-1   2018-04-06 [1] CRAN (R 3.5.0)                       
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.0)                       
 shiny       * 1.2.0   2018-11-02 [1] CRAN (R 3.5.0)                       
 stringi       1.2.4   2018-07-20 [1] CRAN (R 3.5.0)                       
 stringr     * 1.3.1   2018-05-10 [1] CRAN (R 3.5.0)                       
 styler        1.1.0   2018-11-20 [1] CRAN (R 3.5.1)                       
 testthat      2.0.1   2018-10-13 [1] CRAN (R 3.5.0)                       
 tibble      * 2.0.0   2019-01-04 [1] CRAN (R 3.5.2)                       
 tidyr       * 0.8.2   2018-10-28 [1] CRAN (R 3.5.0)                       
 tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.5.0)                       
 tidyverse   * 1.2.1   2017-11-14 [1] CRAN (R 3.5.0)                       
 triebeard     0.3.0   2016-08-04 [1] CRAN (R 3.5.0)                       
 urltools      1.7.1   2018-08-03 [1] CRAN (R 3.5.0)                       
 usethis       1.4.0   2018-08-14 [1] CRAN (R 3.5.1)                       
 utf8          1.1.4   2018-05-24 [1] CRAN (R 3.5.0)                       
 whisker       0.3-2   2013-04-28 [1] CRAN (R 3.5.0)                       
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.0)                       
 xfun          0.4     2018-10-23 [1] CRAN (R 3.5.0)                       
 xml2        * 1.2.0   2018-01-24 [1] CRAN (R 3.5.0)                       
 xtable        1.8-3   2018-08-29 [1] CRAN (R 3.5.0)                       
 yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.0) 

Thanks for this, @martinjhnhadley. Having a look.

Do you have more examples that give the same error, or a different one?

That particular DOI was failing to resolve if you did https://doi.org/10.1111/tmi.13009, but then later I tried again and now it resolves to https://onlinelibrary.wiley.com/doi/full/10.1111/tmi.13009

BUT for some reason the converter service still gives the same result you got.

The docs page https://www.ncbi.nlm.nih.gov/pmc/tools/id-converter-api/ doesn't give any info on what the live field means. Do you have any knowledge of that?

I realized that the API accepts more than one ID per request, so I'll fix that in the function.
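
As a rough sketch of what sending several IDs at once might involve: the request string shown elsewhere in this thread has a single `ids` field, so one approach is to join IDs with commas and send them in batches. The helper name `batch_ids` and the default `batch_size` of 200 are assumptions for illustration, not part of rcrossref; check the ID Converter API docs for the current per-request limit.

```r
# Hypothetical helper (not part of rcrossref): split IDs into batches and
# build one comma-separated string per batch, suitable for an `ids` field.
# `batch_size = 200` is an assumed limit; verify it against the API docs.
batch_ids <- function(ids, batch_size = 200) {
  ids <- unique(as.character(ids))
  # Assign each ID to a batch number: 1, 1, ..., 2, 2, ...
  groups <- ceiling(seq_along(ids) / batch_size)
  batches <- split(ids, groups)
  # Collapse each batch into a single comma-separated string
  unname(vapply(batches, paste, character(1), collapse = ","))
}

batch_ids(c("30446726", "30064668", "29140587"), batch_size = 2)
```

Each resulting string covers one request, which keeps the number of HTTP calls low compared with one call per PMID.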

Hi @sckott! Here are some more PMIDs that fail to convert... and my current flimsy way of filtering them out.

library("purrr")
library("rcrossref")
unfriendly_pmids <- c(
  "30446726", "30064668", "29140587", "27667476", "27527814",
  "26786653", "30345709", "29425396", "28844749", "28706025", "28679028",
  "27984172", "27091321", "30252031"
)

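pmid_to_doi <- function(pmid) {
  results <- id_converter(pmid, type = "pmid")

  # Failed lookups come back with a `status` column in `records`
  if ("status" %in% names(results$records)) {
    NA_character_ # typed NA so map_chr() always receives a character value
  } else {
    results$records$doi
  }
}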

unfriendly_pmids %>%
  map_chr(pmid_to_doi)
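
A slightly sturdier variant of this filter could work on the `records` data frame directly, so it also copes with several rows at once. This is only a sketch: `extract_dois` is a hypothetical name, and it assumes `records` has a `doi` column for successful lookups and a `status` column equal to "error" for failed ones, as the outputs in this thread suggest. The mock data frame is made up for illustration.

```r
# Hypothetical helper: return one DOI (or NA) per row of a `records`
# data frame shaped like the one id_converter() returns in this thread.
extract_dois <- function(records) {
  n <- nrow(records)
  # Use the doi column when present, otherwise start with all NA
  doi <- if ("doi" %in% names(records)) {
    as.character(records$doi)
  } else {
    rep(NA_character_, n)
  }
  # Blank out rows that the service flagged as errors
  if ("status" %in% names(records)) {
    failed <- !is.na(records$status) & records$status == "error"
    doi[failed] <- NA_character_
  }
  doi
}

# Mock records (placeholder values, not real API output)
mock_records <- data.frame(
  pmid = c("111", "222"),
  doi = c("10.1000/example", NA),
  status = c(NA, "error"),
  stringsAsFactors = FALSE
)
extract_dois(mock_records)
```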

thanks, i'll take a look at those

Been poking around many data sources, and the only thing that makes sense is that some PMIDs are just not available yet in any machine readable state.

E.g. the page for one of these PMIDs, https://www.ncbi.nlm.nih.gov/pubmed/30446726, shows

Epub ahead of print


Which makes me think it will be available later.

BUT another example https://www.ncbi.nlm.nih.gov/pubmed/29140587 has been around a while and just isn't found either.

So perhaps there are multiple reasons a PMID is not found.

There's maybe another option. I found in my to-do list that Wikimedia has an API for getting the citation data they have across their many pages. You can try this new package:

remotes::install_github("ropenscilabs/rcitoid")
unfriendly_pmids <- c(
  "30446726", "30064668", "29140587", "27667476", "27527814",
  "26786653", "30345709", "29425396", "28844749", "28706025", "28679028",
  "27984172", "27091321", "30252031"
)
res <- lapply(unfriendly_pmids[1:5], rcitoid::cit_oid)
vapply(res, function(z) z[[1]]$DOI, "")
alapo commented

I ran into a similar problem with a list of PMIDs (attached .txt file). I was unable to get the DOI using id_converter or rcitoid::cit_oid as suggested by @sckott. My code is not the most efficient, but it should be easily reproducible.

# Use id_converter on each PMID and record which ones could not be converted.
df <- read.delim("GitHub.txt") # read the file I uploaded and save it as "df"
Results_id_converter <- data.frame(PMID = character(0), DOI = character(0)) # empty data frame to fill below

for (i in seq_along(df$PMID)) {
  results <- id_converter(df$PMID[i], type = "pmid")
  if ("status" %in% names(results$records)) {
    result <- "Bad" # conversion failed
  } else {
    result <- as.character(results$records$doi)
  }
  tm1 <- data.frame(PMID = as.character(df$PMID[i]), DOI = result) # result of one iteration
  Results_id_converter <- rbind(Results_id_converter, tm1) # append the DOI if found, otherwise "Bad"
}

Now using rcitoid

# Method 2: rcitoid ----------
res <- lapply(df$PMID, rcitoid::cit_oid) # this works but is very slow with my dataset
Results_rcitoid <- as.data.frame(matrix(nrow = length(df$PMID), ncol = 2))
names(Results_rcitoid) <- c("PMID", "DOI")
Results_rcitoid$PMID <- df$PMID

for (i in seq_along(res)) {
  if (!("DOI" %in% names(res[[i]][[1]]))) { # the DOI field is not present
    Results_rcitoid$DOI[i] <- "Bad"
  } else {
    Results_rcitoid$DOI[i] <- res[[i]][[1]]$DOI # otherwise add the DOI
  }
}
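
The same extraction can be sketched without an explicit loop. This assumes each result is a list whose first element may or may not contain a `DOI` entry, as in the loop above. `safe_doi` is a hypothetical helper, the mock list stands in for real `cit_oid` output, and the sketch returns `NA` rather than "Bad" for missing DOIs.

```r
# Hypothetical helper: return the DOI from one result, or NA if the
# DOI field (or the result itself) is missing.
safe_doi <- function(x) {
  doi <- tryCatch(x[[1]]$DOI, error = function(e) NULL)
  if (is.null(doi) || length(doi) == 0) {
    NA_character_
  } else {
    as.character(doi[[1]])
  }
}

# Mock results (placeholder values), shaped like the `res` list above
mock_res <- list(
  list(list(title = "A", DOI = "10.1000/example")),
  list(list(title = "B")) # no DOI field
)
dois <- vapply(mock_res, safe_doi, character(1))
dois
```

Using `NA` instead of a sentinel string means `is.na()` can be used later to pick out the failures.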

Now I combine the results I obtained into one dataframe called combinedResults

# Combine the results I obtained from id_converter and rcitoid ----------
Results_id_converter <- data.frame(lapply(Results_id_converter, as.character), stringsAsFactors=FALSE) #switch to characters to match Results_rcitoid
names(Results_rcitoid) <-  c("PMID", "DOI_rcitoid") #rename the column so I can merge them
combinedResults <- cbind(Results_id_converter,Results_rcitoid$DOI_rcitoid) # make a table with the combined results
names(combinedResults) = c("PMID", "DOI", "DOI_rcitoid")
#gives me all the unfriendly PMIDs
unfriendly_pmids <- subset(combinedResults, DOI == "Bad" & DOI_rcitoid == "Bad") 
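
Note that `cbind()` relies on both data frames listing the PMIDs in the same order. A sketch of a more defensive alternative is to join on the PMID column with base R's `merge()`. The data frames below are mock data with placeholder DOIs, and missing DOIs are assumed to be stored as `NA` rather than "Bad".

```r
# Mock results from the two methods (placeholder values), deliberately
# listed in different row orders
conv <- data.frame(
  PMID = c("111", "222", "333"),
  DOI = c("10.1000/a", NA, NA),
  stringsAsFactors = FALSE
)
cito <- data.frame(
  PMID = c("333", "111", "222"),
  DOI_rcitoid = c(NA, "10.1000/a", "10.1000/b"),
  stringsAsFactors = FALSE
)

# Join on PMID so row order no longer matters; all = TRUE keeps PMIDs
# that appear in only one of the two data frames
combined <- merge(conv, cito, by = "PMID", all = TRUE)

# PMIDs that neither method could resolve
unresolved <- combined$PMID[is.na(combined$DOI) & is.na(combined$DOI_rcitoid)]
unresolved
```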

When I inspect the 172nd element in res, I see that the DOI element is missing. Other elements that resulted in errors were 227, 234, 321, 363, 364, 365, 368, 369, 370, 376, 377, 378.


Now I check whether this PMID can be converted using id_converter instead of rcitoid. The PMID is 25669007 (article can be seen here). Other elements that were bad were PMID

id_converter("25669007","pmid") # this is an example of one of the bad PMIDs

Below is the console output

$status
[1] "ok"

$responseDate
[1] "2019-06-11 13:49:14"

$request
[1] "tool=rcrossref;email=myrmecocystus%40gmail.com;ids=25669007;idtype=pmid;format=json"

$records
      pmid  live status             errmsg
1 25669007 false  error invalid article id

Hopefully this will be of use to someone. Thanks for the package, it's been of great use to me!

GitHub.txt

@alapo thanks for sharing!

Note that cit_oid does accept more than one ID, so you don't have to use lapply or similar.

closing for now ... reopen if there are other questions here