id_converter() not converting PMIDs correctly
Adafede opened this issue · 11 comments
Session Info
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] fr_CH.UTF-8/fr_CH.UTF-8/fr_CH.UTF-8/C/fr_CH.UTF-8/fr_CH.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] zoo_1.8-8 XML_3.99-0.3 webchem_1.0.0 UpSetR_1.4.0 forcats_0.5.0
[6] tidyr_1.1.0 tibble_3.0.1 tidyverse_1.3.0 taxize_0.9.96 stringr_1.4.0
[11] stringi_1.4.6 splitstackshape_1.4.8 rvest_0.3.5 xml2_1.3.2 reticulate_1.16
[16] rentrez_1.2.2 readxl_1.3.1 readr_1.3.1 rcrossref_1.0.0 RColorBrewer_1.1-2
[21] purrr_0.3.4 pbmcapply_1.5.0 jsonlite_1.6.1 igraph_1.2.5 ggraph_2.0.3
[26] eulerr_6.1.0 dplyr_1.0.0 digest_0.6.25 data.table_1.12.8 collapsibleTree_0.1.7
[31] chorddiag_0.1.2 ChemmineR_3.40.0 plotly_4.9.2.1 Hmisc_4.4-0 ggplot2_3.3.1
[36] Formula_1.2-3 survival_3.1-12 lattice_0.20-41
loaded via a namespace (and not attached):
[1] colorspace_1.4-1 rjson_0.2.20 ellipsis_0.3.1 htmlTable_1.13.3 fs_1.4.1 base64enc_0.1-3
[7] httpcode_0.3.0 rstudioapi_0.11 farver_2.0.3 urltools_1.7.3 graphlayouts_0.7.0 ggrepel_0.8.2
[13] DT_0.13 lubridate_1.7.8 fansi_0.4.1 codetools_0.2-16 splines_4.0.0 bold_1.0.0
[19] knitr_1.28 polyclip_1.10-0 broom_0.5.6 dbplyr_1.4.4 cluster_2.1.0 png_0.1-7
[25] ggforce_0.3.1 shiny_1.4.0.2 data.tree_0.7.11 compiler_4.0.0 httr_1.4.1 backports_1.1.7
[31] assertthat_0.2.1 Matrix_1.2-18 fastmap_1.0.1 lazyeval_0.2.2 cli_2.0.2 later_1.1.0.1
[37] tweenr_1.0.1 acepack_1.4.1 htmltools_0.4.0 tools_4.0.0 gtable_0.3.0 glue_1.4.1
[43] rsvg_2.1 tinytex_0.23 Rcpp_1.0.4.6 cellranger_1.1.0 vctrs_0.3.1 crul_0.9.0
[49] ape_5.4 nlme_3.1-148 iterators_1.0.12 xfun_0.14 mime_0.9 miniUI_0.1.1.1
[55] lifecycle_0.2.0 MASS_7.3-51.6 scales_1.1.1 tidygraph_1.2.0 hms_0.5.3 promises_1.1.0
[61] curl_4.3 gridExtra_2.3 triebeard_0.3.0 rpart_4.1-15 reshape_0.8.8 latticeExtra_0.6-29
[67] foreach_1.5.0 checkmate_2.0.0 bibtex_0.4.2.2 rlang_0.4.6 pkgconfig_2.0.3 bitops_1.0-6
[73] htmlwidgets_1.5.1 tidyselect_1.1.0 plyr_1.8.6 magrittr_1.5 R6_2.4.1 generics_0.0.2
[79] DBI_1.1.0 haven_2.3.1 pillar_1.4.4 foreign_0.8-80 withr_2.2.0 RCurl_1.98-1.2
[85] nnet_7.3-14 modelr_0.1.8 crayon_1.3.4 viridis_0.5.1 jpeg_0.1-8.1 grid_4.0.0
[91] blob_1.2.1 reprex_0.3.0 xtable_1.8-4 httpuv_1.5.4 munsell_0.5.0 viridisLite_0.3.0
Hi,
Thank you very much for your beautiful package.
I am using your package to retrieve DOIs from various sources. When working with titles, I use your cr_works() function which is great.
However, when working with PubMed IDs, I run into the following issue:
Some valid PubMed IDs do not seem to be recognized.
As an example: 28371833
This is the output I get when using id_converter("28371833", "pmid"):
$status
[1] "ok"
$responseDate
[1] "2020-06-08 02:03:29"
$request
[1] "tool=rcrossref;email=myrmecocystus%40gmail.com;ids=28371833;idtype=pmid;format=json"
$records
      pmid  live status             errmsg
1 28371833 false  error invalid article id
However, the article ID is valid, as is easily confirmed with entrez_summary(db = "pubmed", id = "28371833")[["title"]], which returns:
"Cytochrome P450 Monooxygenase CYP716A141 is a Unique β-Amyrin C-16β Oxidase Involved in Triterpenoid Saponin Biosynthesis in Platycodon grandiflorus."
It has nothing to do with the erratum, I checked other entries.
Some other IDs (31708947) work and I could not say why...
If any other information is needed, I am happy to give more details!
thanks for the report, having a look
The API request is here: https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=rcrossref&email=myrmecocystus%40gmail.com&ids=28371833&idtype=pmid&format=json, which gives the same response. So the problem is on the NCBI end of things. Not sure why they're saying it's an invalid article ID.
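To check the raw response outside rcrossref, the same request can be assembled by hand. This is a minimal sketch, not rcrossref's implementation: the base URL is copied from this thread (NCBI may have moved it since), the `tool` and `email` defaults are placeholders, and `build_idconv_url()` and `query_idconv()` are hypothetical helper names.

```r
# Base URL copied from the thread; NCBI may have moved or versioned it since.
idconv_base <- "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"

# Build the request URL without touching the network.
# `tool` and `email` defaults are placeholders, not real credentials.
build_idconv_url <- function(ids, idtype = "pmid",
                             tool = "my_tool",
                             email = "my_email@example.com") {
  params <- c(
    tool = tool,
    email = email,
    ids = paste(ids, collapse = ","),
    idtype = idtype,
    format = "json"
  )
  # Percent-encode each value (e.g. "@" in the email, "/" in a DOI).
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(idconv_base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

# Fetch the response and return its `records` element.
# Requires network access and the jsonlite package.
query_idconv <- function(ids, idtype = "pmid") {
  jsonlite::fromJSON(build_idconv_url(ids, idtype))$records
}
```

Calling `query_idconv("28371833")` should then show whatever the service itself reports for that PMID, independent of rcrossref.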
(related issue #183 )
open citations corpus (https://github.com/ropenscilabs/citecorp) doesn't have that PMID either:
citecorp::oc_pmid2ids(28371833)
#> data frame with 0 columns and 0 rows
My bad... sorry for not re-opening there!
Strange from NCBI...but entrez seems to do the job correctly
no worries about opening this issue.
It's hard to say why the problem is happening. The API service for the ID converter may be using some older database or something; there's no clarity on what's going on behind the scenes. For ID conversion, you may be better off using rentrez.
If I understand correctly, id_converter() is built on NLM's ID Converter API, which is limited to records in PMC.
@JimHokanson explains in ropensci/rentrez#136 (comment)
As for a workaround, @dwinter's rentrez allows you to make the conversion using rentrez::parse_pubmed_xml and rentrez::pubmed_fetch: ropensci/rentrez#136 (comment)
But that's a lot of extra data to download for just PMID-DOI conversion (when scaling to many records), so it would be great if there were a simpler converter. Ideally that also works from DOI to PMID (which is what I'm trying to do).
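A lighter-weight variant of the rentrez route might use summary records and a fielded search instead of full XML. This is only a sketch under assumptions: that the PubMed summary returned by `rentrez::entrez_summary()` carries an `articleids` data frame with `idtype` and `value` columns, and that PubMed's `[AID]` (article identifier) search field matches DOIs. The helper names are hypothetical.

```r
# Pure helper: pull the DOI out of an `articleids`-style data frame.
# Assumes columns `idtype` and `value`; returns NA if no DOI is listed.
doi_from_articleids <- function(articleids) {
  doi <- articleids$value[articleids$idtype == "doi"]
  if (length(doi) == 0) NA_character_ else as.character(doi[1])
}

# PMID -> DOI via the PubMed summary record.
# Requires network access and the rentrez package.
pmid_to_doi <- function(pmid) {
  summ <- rentrez::entrez_summary(db = "pubmed", id = pmid)
  doi_from_articleids(summ$articleids)
}

# DOI -> PMID via a PubMed search restricted to the article-identifier field.
# Returns NA unless exactly one record matches.
doi_to_pmid <- function(doi) {
  res <- rentrez::entrez_search(db = "pubmed", term = paste0(doi, "[AID]"))
  if (length(res$ids) == 1) res$ids else NA_character_
}
```

Both network functions make one request per identifier, so for many records NCBI's rate limits (and an API key) would need to be taken into account.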
Here are some related links I've come across:
https://www.crossref.org/labs/pmid2doi/
https://www.pmid2cite.com/ (promising, but I'm not finding any open source or an API for batch processing)
Via their website:
https://www.pmid2cite.com/pmid-to-doi-converter
https://www.pmid2cite.com/doi-to-pmid-converter
I'd appreciate any further suggestions.
Hi, I'm not sure this is the right place for your question, but anyway, the PubMed API does the job perfectly if you just want to convert DOIs to PM(C)IDs and vice versa.
You can also download the PubMed conversion table locally if you really need it to be fast (have a look at https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/).
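Once such a table has been downloaded, the lookup itself needs no network. The sketch below assumes the file is a CSV with columns named `PMID` and `DOI`; those column names, the file path, and both helper names are assumptions to be checked against the actual download.

```r
# Read a locally downloaded conversion table, keeping every column as text
# so that identifiers are not mangled into numbers.
# `path` is a hypothetical placeholder for wherever the file was saved.
read_id_table <- function(path) {
  utils::read.csv(path, colClasses = "character", check.names = FALSE)
}

# Vectorised PMID -> DOI lookup against the table.
# PMIDs that are not in the table come back as NA.
lookup_doi_local <- function(pmids, table) {
  rows <- match(as.character(pmids), as.character(table$PMID))
  stats::setNames(as.character(table$DOI[rows]), as.character(pmids))
}
```

For example, `lookup_doi_local(c("28371833", "31708947"), read_id_table("ids.csv"))` would return a named character vector, with `NA` for any PMID the table does not cover.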
Thanks, @Adafede. Unfortunately, the NLM converter doesn't work for DOIs not available in PMC, similar to the PMID limitation.
For example: 10.1056/NEJMoa1916623
https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=my_tool&email=my_email@example.com&ids=10.1056/NEJMoa1916623
id_converter() is built on NLM's ID Converter API
correct
We used to have a function for that Crossref pmid2doi service, see ?rcrossref-defunct, but we made it defunct; I think it was too unreliable or went down, not sure.
Hadn't seen pmid2cite - agree that it doesn't look like there's any way to use it programmatically.
Your example of 10.1056/NEJMoa1916623 might be a case where it's so new that there isn't a PMID for it yet. Crossref and Unpaywall have the DOI, but they don't map to other identifiers.
One additional option is Fatcat - see https://api.fatcat.wiki/redoc#operation/lookup_release
for example: https://api.fatcat.wiki/v0/release/lookup?doi=10.1056/NEJMoa1916623
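A rough sketch of calling that Fatcat lookup from R follows. The endpoint is the one linked above; the assumption that the returned release record holds its identifiers in an `ext_ids` element (with `doi`, `pmid`, and `pmcid` entries) should be verified against the Fatcat API docs, and the function names are hypothetical.

```r
# Build a Fatcat release-lookup URL for a DOI or a PMID.
# The identifier is percent-encoded, so "/" in a DOI becomes "%2F".
fatcat_lookup_url <- function(id, type = c("doi", "pmid")) {
  type <- match.arg(type)
  paste0(
    "https://api.fatcat.wiki/v0/release/lookup?", type, "=",
    utils::URLencode(as.character(id), reserved = TRUE)
  )
}

# Pull the external identifiers out of a parsed release record.
# Assumes they sit under `ext_ids`; missing entries become NA.
extract_ext_ids <- function(release) {
  pick <- function(x) if (is.null(x)) NA_character_ else as.character(x)
  ids <- release$ext_ids
  c(doi = pick(ids$doi), pmid = pick(ids$pmid), pmcid = pick(ids$pmcid))
}

# Fetch and parse one record (requires network access and jsonlite).
fatcat_lookup <- function(id, type = "doi") {
  extract_ext_ids(jsonlite::fromJSON(fatcat_lookup_url(id, type)))
}
```

Because the same endpoint accepts either identifier type, `fatcat_lookup("10.1056/NEJMoa1916623")` and `fatcat_lookup("28371833", type = "pmid")` would cover both conversion directions, to the extent Fatcat has the mapping.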
at least I don't think there's anything left to do here
Just in case someone stumbles on this awesome thread, do check out https://www.flickr.com/photos/dullhunk/454160748 that has some advice on this