returning more than 1000 DOIs using EuropePMC database
banderson10 opened this issue · 3 comments
Hello,
I have two questions about using ft_search() to return DOIs from the EuropePMC database. Each question is listed below, with an example to illustrate it.
- When I run the first example below, which is taken from the fulltext manual, the 'a' object contains a variable holding the 1,000 DOIs, a$europmc$data$doi.
res <- ft_search(query="ecology", from='europmc')
a <- ft_search(query="ecology", from='europmc', limit=1000,
  euroopts = list(cursorMark = res$europmc$cursorMark))
When I switch to my own search term, ft_search() does not return any DOI values: a1$europmc$data$doi does not exist in the a1 object.
res1 <- ft_search(query="spanish flu", from='europmc')
a1 <- ft_search(query="spanish flu", from='europmc', limit=1000,
  euroopts = list(cursorMark = res1$europmc$cursorMark))
I need the DOIs because I am searching other databases with ft_search(), and I am using the DOI as the unique identifier to remove duplicates before I fetch the full text xml files.
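The deduplication step described here could be sketched roughly as follows. This is only an illustration on toy data frames standing in for the per-source result tables; extract_dois() and dedupe_dois() are hypothetical helper names, not functions from fulltext. The helper also guards against a result table that has no doi column, which is the situation reported above.

```r
# Hypothetical helper: pull the DOI column from one source's result table,
# returning an empty character vector when the column is absent.
extract_dois <- function(data) {
  if (is.null(data) || !("doi" %in% names(data))) {
    return(character(0))
  }
  dois <- as.character(data$doi)
  # Drop missing and empty entries.
  dois[!is.na(dois) & nzchar(dois)]
}

# Hypothetical helper: combine DOIs from several result tables and keep the
# first occurrence of each. DOIs are case-insensitive, so the comparison is
# done in lower case.
dedupe_dois <- function(...) {
  all_dois <- unlist(lapply(list(...), extract_dois), use.names = FALSE)
  all_dois[!duplicated(tolower(all_dois))]
}

# Toy data standing in for tables such as res$europmc$data (assumed shape).
europmc_data <- data.frame(doi = c("10.1/A", "10.1/b", NA),
                           stringsAsFactors = FALSE)
other_data   <- data.frame(doi = c("10.1/a", "10.1/c"),
                           stringsAsFactors = FALSE)
no_doi_data  <- data.frame(id = c("x", "y"), stringsAsFactors = FALSE)

unique_dois <- dedupe_dois(europmc_data, other_data, no_doi_data)
print(unique_dois)
```

Checking for the doi column before reading it means a source that returns no DOIs contributes nothing to the combined list instead of causing an error later on.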
- Obtaining more than 1,000 DOIs from a EuropePMC search.
I have read issue #184 for this package, in which the author explains that you have to use a cursor to 'page through' the query results. Using the example in the fulltext manual, as shown below, the query returns 416,312 hits.
res <- ft_search(query='ecology', from='europmc')
res$europmc
You can then use the cursorMark argument to 'page through' the results. The code below will return the first 1,000 hits.
a2 <- ft_search(query='ecology', from='europmc', limit=1000,
  euroopts = list(cursorMark = res$europmc$cursorMark))
The question is: how do you obtain the next 1,000 hits, then the 1,000 after that, and so on? For example, what if you wanted to obtain all 416,312 DOIs?
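One possible approach, sketched here under assumptions rather than confirmed against the package, is to feed the cursorMark from each result into the next call until a page comes back empty. The loop below is generic: collect_pages() and fake_fetch() are hypothetical names, and the fake fetcher only simulates a paged service so the loop can be run without a network connection. The commented-out callback at the end assumes that every ft_search() result exposes the next cursor as res$europmc$cursorMark, as the manual example suggests, and that "*" is accepted as the starting cursor.

```r
# Generic cursor-paging loop. `fetch_page` is a hypothetical callback that
# takes a cursor string and returns list(data = <data.frame>, cursorMark = <next>).
collect_pages <- function(fetch_page, first_cursor = "*", max_pages = 10) {
  pages <- list()
  cursor <- first_cursor
  for (i in seq_len(max_pages)) {
    page <- fetch_page(cursor)
    # Stop when a page comes back empty.
    if (is.null(page$data) || NROW(page$data) == 0) break
    pages[[length(pages) + 1]] <- page$data
    # Stop when no new cursor is handed back.
    if (is.null(page$cursorMark) || identical(page$cursorMark, cursor)) break
    cursor <- page$cursorMark
  }
  pages
}

# Fake service for illustration: five made-up DOIs served two per page.
fake_dois <- sprintf("10.0000/demo.%d", 1:5)
fake_fetch <- function(cursor) {
  start <- if (identical(cursor, "*")) 1L else as.integer(cursor)
  idx <- seq_along(fake_dois)
  idx <- idx[idx >= start & idx < start + 2L]
  list(
    data = data.frame(doi = fake_dois[idx], stringsAsFactors = FALSE),
    cursorMark = as.character(start + 2L)
  )
}

pages <- collect_pages(fake_fetch)
all_dois <- unlist(lapply(pages, function(p) p$doi), use.names = FALSE)
print(all_dois)

# Untested sketch of a real callback (assumptions noted in the text above):
# fetch_europmc <- function(cursor) {
#   res <- fulltext::ft_search(query = "ecology", from = "europmc", limit = 1000,
#                              euroopts = list(cursorMark = cursor))
#   list(data = res$europmc$data, cursorMark = res$europmc$cursorMark)
# }
# pages <- collect_pages(fetch_europmc, max_pages = 5)
```

The max_pages cap is deliberate: a query with several hundred thousand hits means hundreds of requests, so it seems safer to raise the cap gradually than to loop without a limit.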
Thank you for any advice/suggestions you can provide!
Billie
Hi @banderson10, I've changed jobs and haven't been able to find a new maintainer for this package yet.
This repository is about to be archived.
If you develop a related package, it might be in scope for https://ropensci.org/software-review/