id_converter returns live == false erroneously
charliejhadley opened this issue · 10 comments
There are many instances where id_converter fails to convert PMIDs to DOIs. In all instances I've found so far, id_converter(paper_doi, "doi")$records$live == "false".
Here's a minimal example
library("rcrossref")
library("tidyverse")
#> Warning: package 'tibble' was built under R version 3.5.2
paper_title <- "Comparison of haematology and biochemistry parameters in healthy South African infants with laboratory reference intervals"
paper_doi <- "10.1111/tmi.13009"
paper_pmid <- "9140587"
lookup_doi <- cr_works(query = paper_title)$data %>%
  slice(1) %>%
  select(doi) %>%
  .[[1]]
paper_doi == lookup_doi
#> [1] TRUE
id_converter(paper_doi, "doi")$records$live
#> [1] "false"
id_converter(paper_pmid, "pmid")$records$live
#> [1] "false"
Session Info
> devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.5.1 (2018-07-02)
os macOS 10.14
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_GB.UTF-8
ctype en_GB.UTF-8
tz Europe/London
date 2019-01-30
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.0)
backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.0)
bibtex 0.4.2 2017-06-30 [1] CRAN (R 3.5.0)
bindr 0.1.1 2018-03-13 [1] CRAN (R 3.5.0)
bindrcpp * 0.2.2 2018-03-29 [1] CRAN (R 3.5.0)
blogdown 0.10 2019-01-09 [1] CRAN (R 3.5.2)
bookdown 0.9 2018-12-21 [1] CRAN (R 3.5.0)
broom 0.5.1 2018-12-05 [1] CRAN (R 3.5.1)
callr 3.1.1 2018-12-21 [1] CRAN (R 3.5.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.5.0)
cli 1.0.1 2018-09-25 [1] CRAN (R 3.5.0)
clipr 0.5.0 2019-01-11 [1] CRAN (R 3.5.1)
colorspace 1.3-2 2016-12-14 [1] CRAN (R 3.5.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
crul 0.7.0 2019-01-04 [1] CRAN (R 3.5.2)
curl 3.2 2018-03-28 [1] CRAN (R 3.5.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.0)
devtools 2.0.1 2018-10-26 [1] CRAN (R 3.5.1)
digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.0)
dplyr * 0.7.8 2018-11-10 [1] CRAN (R 3.5.0)
DT 0.5 2018-11-05 [1] CRAN (R 3.5.0)
evaluate 0.12 2018-10-09 [1] CRAN (R 3.5.0)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.5.0)
forcats * 0.3.0 2018-02-19 [1] CRAN (R 3.5.0)
fs 1.2.6 2018-08-23 [1] CRAN (R 3.5.0)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.5.0)
ggplot2 * 3.1.0 2018-10-25 [1] CRAN (R 3.5.0)
glue * 1.3.0 2018-07-17 [1] CRAN (R 3.5.0)
gtable 0.2.0 2016-02-26 [1] CRAN (R 3.5.0)
haven 2.0.0 2018-11-22 [1] CRAN (R 3.5.0)
here * 0.1 2017-05-28 [1] CRAN (R 3.5.0)
hms 0.4.2 2018-03-10 [1] CRAN (R 3.5.0)
htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.0)
htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.5.0)
httpcode 0.2.0 2016-11-14 [1] CRAN (R 3.5.0)
httpuv 1.4.5.1 2018-12-18 [1] CRAN (R 3.5.0)
httr 1.4.0 2018-12-11 [1] CRAN (R 3.5.0)
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.5.0)
knitr 1.21 2018-12-10 [1] CRAN (R 3.5.1)
labeling 0.3 2014-08-23 [1] CRAN (R 3.5.0)
later 0.7.5 2018-09-18 [1] CRAN (R 3.5.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.0)
lazyeval 0.2.1 2017-10-29 [1] CRAN (R 3.5.0)
lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.5.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.0)
mime 0.6 2018-10-05 [1] CRAN (R 3.5.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 3.5.0)
modelr 0.1.2 2018-05-11 [1] CRAN (R 3.5.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.0)
nlme 3.1-137 2018-04-07 [1] CRAN (R 3.5.1)
pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.0)
pkgbuild 1.0.2 2018-10-16 [1] CRAN (R 3.5.0)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.0)
plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.0)
processx 3.2.1 2018-12-05 [1] CRAN (R 3.5.0)
promises 1.0.1 2018-04-13 [1] CRAN (R 3.5.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.0)
purrr * 0.2.5 2018-05-29 [1] CRAN (R 3.5.0)
R6 2.3.0 2018-10-04 [1] CRAN (R 3.5.0)
Rcpp 1.0.0 2018-11-07 [1] CRAN (R 3.5.0)
rcrossref * 0.8.4 2018-08-06 [1] CRAN (R 3.5.0)
readr * 1.3.1 2018-12-21 [1] CRAN (R 3.5.0)
readxl * 1.2.0 2018-12-19 [1] CRAN (R 3.5.0)
regexplain 0.2.2 2018-11-02 [1] Github (gadenbuie/regexplain@5da8d87)
remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.1)
reprex 0.2.1 2018-09-16 [1] CRAN (R 3.5.0)
rlang 0.3.1 2019-01-08 [1] CRAN (R 3.5.2)
rmarkdown 1.11 2018-12-08 [1] CRAN (R 3.5.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0)
rsconnect 0.8.13 2019-01-10 [1] CRAN (R 3.5.1)
rstudioapi 0.9.0 2019-01-09 [1] CRAN (R 3.5.2)
rvest * 0.3.2 2016-06-17 [1] CRAN (R 3.5.0)
scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.0)
selectr 0.4-1 2018-04-06 [1] CRAN (R 3.5.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.0)
shiny * 1.2.0 2018-11-02 [1] CRAN (R 3.5.0)
stringi 1.2.4 2018-07-20 [1] CRAN (R 3.5.0)
stringr * 1.3.1 2018-05-10 [1] CRAN (R 3.5.0)
styler 1.1.0 2018-11-20 [1] CRAN (R 3.5.1)
testthat 2.0.1 2018-10-13 [1] CRAN (R 3.5.0)
tibble * 2.0.0 2019-01-04 [1] CRAN (R 3.5.2)
tidyr * 0.8.2 2018-10-28 [1] CRAN (R 3.5.0)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.0)
tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.5.0)
triebeard 0.3.0 2016-08-04 [1] CRAN (R 3.5.0)
urltools 1.7.1 2018-08-03 [1] CRAN (R 3.5.0)
usethis 1.4.0 2018-08-14 [1] CRAN (R 3.5.1)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.5.0)
whisker 0.3-2 2013-04-28 [1] CRAN (R 3.5.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0)
xfun 0.4 2018-10-23 [1] CRAN (R 3.5.0)
xml2 * 1.2.0 2018-01-24 [1] CRAN (R 3.5.0)
xtable 1.8-3 2018-08-29 [1] CRAN (R 3.5.0)
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.0)
Thanks for this, @martinjhnhadley. Having a look.
Do you have more examples that give the same error, or a different one?
That particular DOI was failing to resolve at https://doi.org/10.1111/tmi.13009, but when I tried again later it resolved to https://onlinelibrary.wiley.com/doi/full/10.1111/tmi.13009
BUT for some reason the converter service still gives the same result you got.
The docs page https://www.ncbi.nlm.nih.gov/pmc/tools/id-converter-api/ doesn't give any info on what the live field means. Do you have any knowledge of that?
I realized that it accepts more than one ID, so I'll fix that in the function.
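Since the service takes several IDs per request, a long PMID list could be grouped into comma-separated batches before calling it. The sketch below is only an illustration: `chunk_ids` is a hypothetical helper (not part of rcrossref), and the batch size of 200 is an assumption that should be checked against the ID Converter docs linked above.

```r
# Hypothetical helper: group IDs into comma-separated batches.
# The default batch size (200) is an assumption, not a documented rcrossref value.
chunk_ids <- function(ids, size = 200) {
  ids <- as.character(ids)
  if (length(ids) == 0) {
    return(character(0))
  }
  # Batch index for each ID: 1, 1, ..., 2, 2, ...
  groups <- ceiling(seq_along(ids) / size)
  # Collapse each batch into a single "id1,id2,..." string
  unname(vapply(split(ids, groups), paste, character(1), collapse = ","))
}

chunk_ids(c("30446726", "30064668", "29140587"), size = 2)
#> [1] "30446726,30064668" "29140587"
```

Each resulting string matches the `ids=` parameter format visible in the request the service echoes back.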
Hi @sckott! Here are some more PMIDs that fail to convert... and my current flimsy way of filtering them out.
library("purrr")
library("rcrossref")
unfriendly_pmids <- c(
  "30446726", "30064668", "29140587", "27667476", "27527814",
  "26786653", "30345709", "29425396", "28844749", "28706025", "28679028",
  "27984172", "27091321", "30252031"
)
# Return the DOI for a PMID, or NA if the converter reports a status (error) column
pmid_to_doi <- function(pmid) {
  results <- id_converter(pmid, type = "pmid")
  if ("status" %in% names(results$records)) {
    NA_character_ # typed NA so that map_chr() always gets a character value
  } else {
    results$records$doi
  }
}
unfriendly_pmids %>%
  map_chr(pmid_to_doi)
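A slightly less flimsy variant of the filter above could work on the `records` data frame directly, so it also handles a response with several rows. This is a sketch under assumptions: `records_to_doi` is a hypothetical helper, and the column names (`doi`, `status`) are taken from the output shown in this thread rather than from any documented schema.

```r
# Hypothetical helper: `records` is assumed to look like id_converter()$records,
# i.e. a data frame with an optional `doi` column and an optional `status` column.
records_to_doi <- function(records) {
  n <- nrow(records)
  # No doi column at all: nothing was converted
  if (!"doi" %in% names(records)) {
    return(rep(NA_character_, n))
  }
  doi <- as.character(records$doi)
  # Rows flagged with status == "error" are treated as unconverted
  if ("status" %in% names(records)) {
    is_error <- !is.na(records$status) & records$status == "error"
    doi[is_error] <- NA_character_
  }
  doi
}

# Mock record shaped like a failed conversion
failed <- data.frame(
  pmid = "25669007", live = "false", status = "error",
  errmsg = "invalid article id", stringsAsFactors = FALSE
)
records_to_doi(failed)
#> [1] NA
```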
Thanks, I'll take a look at those.
I've been poking around many data sources, and the only explanation that makes sense is that some PMIDs are just not available yet in any machine-readable form.
E.g. one of the PMIDs, https://www.ncbi.nlm.nih.gov/pubmed/30446726, is marked "Epub ahead of print", which makes me think it will be available later.
BUT another example, https://www.ncbi.nlm.nih.gov/pubmed/29140587, has been around a while and just isn't found either.
So there are perhaps multiple reasons a PMID is not found.
There's maybe another option I found in my to-do list: Wikimedia has an API for getting the citation data they have on their many pages. You can try this new pkg:
remotes::install_github("ropenscilabs/rcitoid")
unfriendly_pmids <- c(
"30446726", "30064668", "29140587", "27667476", "27527814",
"26786653", "30345709", "29425396", "28844749", "28706025", "28679028",
"27984172", "27091321", "30252031"
)
res <- lapply(unfriendly_pmids[1:5], rcitoid::cit_oid)
vapply(res, function(z) z[[1]]$DOI, "")
I ran into a similar problem with a list of PMIDs (attached .txt file). I was unable to get the DOI using id_converter or rcitoid::cit_oid as suggested by @sckott. My code is not the most efficient but should be easily reproducible.
# This loop uses id_converter and builds a data frame called "Results_id_converter" that tells me which PMIDs could not be converted.
library("rcrossref")
df <- read.delim("GitHub.txt") # read the file I uploaded and save it as "df"
Results_id_converter <- data.frame(PMID = character(0), DOI = character(0)) # create a blank data frame that I will use below
for (i in seq_along(df$PMID)) {
  results <- id_converter(df$PMID[i], type = "pmid")
  if ("status" %in% names(results$records)) {
    result <- "Bad"
  } else {
    result <- as.character(results$records$doi)
  }
  tm1 <- data.frame(PMID = as.character(df$PMID[i]), DOI = result) # the result of one iteration of the loop
  Results_id_converter <- rbind(Results_id_converter, tm1) # save the DOI if it's there; if not, the PMID is labelled "Bad"
}
Now using rcitoid
# Method 2: rcitoid ----------
res <- lapply(df$PMID, rcitoid::cit_oid) # this works but is very slow with my dataset
Results_rcitoid <- as.data.frame(matrix(nrow = length(df$PMID), ncol = 2))
names(Results_rcitoid) <- c("PMID", "DOI")
Results_rcitoid$PMID <- df$PMID
for (i in seq_along(res)) {
  if (!("DOI" %in% names(res[[i]][[1]]))) { # if the DOI field is not present
    Results_rcitoid$DOI[i] <- "Bad"
  } else {
    Results_rcitoid$DOI[i] <- res[[i]][[1]]$DOI # otherwise add the DOI
  }
}
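The loop above could also be written as a single `vapply()` call that returns `NA` instead of "Bad" when the DOI field is missing. This is a minimal sketch: `extract_doi` is a hypothetical helper, and it assumes each element of `res` is a list whose first entry is itself a list that may or may not carry a `DOI` field, as in the results described in this thread.

```r
# Hypothetical helper: pull the DOI out of each rcitoid-style result,
# returning NA_character_ when the DOI field is absent.
extract_doi <- function(res) {
  vapply(res, function(z) {
    # [["DOI"]] does exact matching and yields NULL if the field is missing
    doi <- if (length(z) > 0) z[[1]][["DOI"]] else NULL
    if (is.null(doi) || length(doi) == 0) {
      NA_character_
    } else {
      as.character(doi[[1]])
    }
  }, character(1))
}

# Mock results: the first has a DOI, the second does not
mock_res <- list(
  list(list(DOI = "10.1/a", title = "A")),
  list(list(title = "B"))
)
extract_doi(mock_res)
#> [1] "10.1/a" NA
```

Using `NA` rather than a sentinel string keeps the column usable with `is.na()` and avoids mistaking "Bad" for a real DOI later on.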
Now I combine the results I obtained into one dataframe called combinedResults
# Combine the results I obtained from id_converter and rcitoid ----------
Results_id_converter <- data.frame(lapply(Results_id_converter, as.character), stringsAsFactors = FALSE) # switch to characters to match Results_rcitoid
names(Results_rcitoid) <- c("PMID", "DOI_rcitoid") # rename the column so I can merge them
combinedResults <- cbind(Results_id_converter, Results_rcitoid$DOI_rcitoid) # make a table with the combined results
names(combinedResults) <- c("PMID", "DOI", "DOI_rcitoid")
# gives me all the unfriendly PMIDs
unfriendly_pmids <- subset(combinedResults, DOI == "Bad" & DOI_rcitoid == "Bad")
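Because `cbind()` relies on both tables being in the same row order, joining on the PMID column may be a safer way to combine the two sources. The sketch below assumes two data frames that each have character `PMID` and `DOI` columns, with `NA` marking a failed lookup; `combine_doi` and the column suffixes are hypothetical names chosen for illustration.

```r
# Hypothetical helper: join two PMID/DOI tables by PMID and keep the first
# available DOI (id_converter result first, rcitoid result as a fallback).
combine_doi <- function(idconv, rcitoid) {
  merged <- merge(
    idconv, rcitoid,
    by = "PMID", all = TRUE,
    suffixes = c("_idconv", "_rcitoid")
  )
  merged$DOI <- ifelse(
    is.na(merged$DOI_idconv), merged$DOI_rcitoid, merged$DOI_idconv
  )
  merged
}

# Mock inputs, deliberately in different row orders
a <- data.frame(PMID = c("1", "2", "3"),
                DOI = c("10.1/a", NA, NA), stringsAsFactors = FALSE)
b <- data.frame(PMID = c("3", "2", "1"),
                DOI = c(NA, "10.1/b", "10.1/zzz"), stringsAsFactors = FALSE)
combined <- combine_doi(a, b)

# PMIDs that neither source could convert
combined$PMID[is.na(combined$DOI)]
#> [1] "3"
```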
When I inspect the 172nd element in res, I see that the DOI element is missing. Other elements that resulted in errors were 227, 234, 321, 363, 364, 365, 368, 369, 370, 376, 377, 378.
Now I check whether this PMID can be fixed using id_converter instead of rcitoid. I note that the PMID is 25669007 (Article can be seen here). Other elements that were bad were PMID
id_converter("25669007","pmid") # this is an example of one of the bad PMIDs
Below is the console output
$status
[1] "ok"
$responseDate
[1] "2019-06-11 13:49:14"
$request
[1] "tool=rcrossref;email=myrmecocystus%40gmail.com;ids=25669007;idtype=pmid;format=json"
$records
pmid live status errmsg
1 25669007 false error invalid article id
Hopefully this will be of use to someone. Thanks for the package; it's been of great use to me!
@alapo thanks for sharing!
Note that cit_oid does accept more than one ID, so you don't have to use lapply or similar.
Closing for now ... reopen if there are other questions here.