epmc_search returns fewer fields than available in the API
arvi1000 commented
Thank you for this package, maintainers!
I notice that epmc_search doesn't return some of the useful fields that are available in the API. I think it would be valuable to return all fields. For example, the API returns both the boolean hasTMAccessionNumbers and the accessionType, while the package returns only the former.
Example of different fields returned:
library(europepmc)
library(httr)
# get results for one id from the package and the api
package_result <- epmc_search("PMC10669250")
direct_api_result <-
  GET('https://www.ebi.ac.uk/europepmc/webservices/rest/search?',
      query = list(query='PMC10669250',
                   resultType='lite',
                   format='json')
  ) |>
  content()
# compare fields returned
package_result |> names()
direct_api_result$resultList$result[[1]] |> unlist() |> names()
from the package:
[1] "id" "source" "pmcid" "title" "authorString" "journalTitle" "issue"
[8] "journalVolume" "pubYear" "journalIssn" "pubType" "isOpenAccess" "inEPMC" "inPMC"
[15] "hasPDF" "hasBook" "hasSuppl" "citedByCount" "hasReferences" "hasTextMinedTerms" "hasDbCrossReferences"
[22] "hasLabsLinks" "hasTMAccessionNumbers" "firstIndexDate" "firstPublicationDate"
from the API:
[1] "id" "source" "pmcid" "fullTextIdList.fullTextId"
[5] "title" "authorString" "journalTitle" "issue"
[9] "journalVolume" "pubYear" "journalIssn" "pubType"
[13] "isOpenAccess" "inEPMC" "inPMC" "hasPDF"
[17] "hasBook" "hasSuppl" "citedByCount" "hasReferences"
[21] "hasTextMinedTerms" "hasDbCrossReferences" "hasLabsLinks" "hasTMAccessionNumbers"
[25] "tmAccessionTypeList.accessionType" "firstIndexDate" "firstPublicationDate"
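The two listings above differ in exactly two names. As a minimal sketch, the comparison can be made explicit with setdiff(). The field names are copied from the output above rather than fetched live, and package_fields, api_fields and missing_from_package are illustrative variable names only.

```r
# Field names copied from the two listings above (not fetched live).
package_fields <- c(
  "id", "source", "pmcid", "title", "authorString", "journalTitle", "issue",
  "journalVolume", "pubYear", "journalIssn", "pubType", "isOpenAccess",
  "inEPMC", "inPMC", "hasPDF", "hasBook", "hasSuppl", "citedByCount",
  "hasReferences", "hasTextMinedTerms", "hasDbCrossReferences",
  "hasLabsLinks", "hasTMAccessionNumbers", "firstIndexDate",
  "firstPublicationDate"
)
api_fields <- c(
  "id", "source", "pmcid", "fullTextIdList.fullTextId",
  "title", "authorString", "journalTitle", "issue",
  "journalVolume", "pubYear", "journalIssn", "pubType",
  "isOpenAccess", "inEPMC", "inPMC", "hasPDF",
  "hasBook", "hasSuppl", "citedByCount", "hasReferences",
  "hasTextMinedTerms", "hasDbCrossReferences", "hasLabsLinks",
  "hasTMAccessionNumbers", "tmAccessionTypeList.accessionType",
  "firstIndexDate", "firstPublicationDate"
)

# Fields present in the API response but absent from the package result.
missing_from_package <- setdiff(api_fields, package_fields)
missing_from_package
# Contains "fullTextIdList.fullTextId" and "tmAccessionTypeList.accessionType"
```

With live results, the same call would be setdiff() on the two names() expressions from the example above.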
njahn82 commented
Hi @arvi1000,
You're right, the default method only returns a subset of Europe PMC data. To access all data, use the output = "raw" option. Here's an example parser for your query:
library(europepmc)
library(tidyverse)
my_epmc_data <- epmc_search("PMC10669250", output = "raw")
#> 1 records found, returning 1
tibble::tibble(
  id = map_chr(my_epmc_data, "id"),
  tm_accession_type = map(my_epmc_data, "tmAccessionTypeList") |>
    map_chr("accessionType")
)
#> # A tibble: 1 × 2
#>   id          tm_accession_type
#>   <chr>       <chr>
#> 1 PMC10669250 chebi
Created on 2024-06-12 with reprex v2.1.0
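The parser above fits a query that returns one record with one accession type. For larger result sets, some records may have no tmAccessionTypeList or several accession types. The following is a rough sketch of one way to handle that, not part of the package. It uses base R only and runs on mock data: raw_records is assumed to mimic the nested lists returned by output = "raw", the second and third records and their values are invented for illustration, and collapse_accession_types is a hypothetical helper name.

```r
# Mock records, assumed to be shaped like the output of
# epmc_search(..., output = "raw"). Only the first id comes from the
# thread; the other two records are invented to show edge cases.
raw_records <- list(
  list(id = "PMC10669250",
       tmAccessionTypeList = list(accessionType = list("chebi"))),
  list(id = "EXAMPLE-2",
       tmAccessionTypeList = list(accessionType = list("chebi", "pdb"))),
  list(id = "EXAMPLE-3")  # no text-mined accession types at all
)

# Hypothetical helper: collapse all accession types of one record into a
# single string, or return NA when the record has none.
collapse_accession_types <- function(record) {
  types <- unlist(record$tmAccessionTypeList$accessionType)
  if (length(types) == 0) NA_character_ else paste(types, collapse = "; ")
}

# One row per record, with one id column and one collapsed type column.
accession_table <- data.frame(
  id = vapply(raw_records, function(r) r$id, character(1)),
  tm_accession_type = vapply(raw_records, collapse_accession_types,
                             character(1)),
  stringsAsFactors = FALSE
)
accession_table
```

Collapsing to one string keeps one row per record. If one row per accession type is preferred, the unlisted vector could be kept as a list column instead.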