ropensci/rcrossref

Any way to get full text from rcrossref anymore?

padpadpadpad opened this issue · 2 comments

Hi everyone

Just wondering if there is any way to get full text from rcrossref anymore. I am interested in trying to collate Data Accessibility statements and am conscious publishers probably won't like it if I start read_html()-ing loads of webpages.

Seems like the methods moved to crminer, but that is no longer under development, so I'm just interested in whether anyone has any recommendations.

Cheers
Dan

Hi,

You can use rcrossref to identify articles and get text and data mining (TDM) full-text links from Crossref. Once you have the TDM links, you'll need to check with the publisher to see how to download the full texts.

1. Identify articles and get TDM full-text links from Crossref

The following reprex shows how to identify articles and get TDM full-text links from Crossref for the DOI 10.1002/asi.24460:

```r
library(rcrossref)
library(tidyverse)

# Fetch the Crossref metadata record for one DOI
# (the native pipe |> used below requires R >= 4.1)
my_cr_df <- cr_works(dois = "10.1002/asi.24460")$data

# Expand the list-column `link` and keep only links intended for text mining
tdm_links <- my_cr_df |>
  select(doi, link) |>
  unnest(link) |>
  filter(intended.application == "text-mining")

tdm_links |>
  select(URL, content.type)
#> # A tibble: 2 × 2
#>   URL                                                            content.type   
#>   <chr>                                                          <chr>          
#> 1 https://onlinelibrary.wiley.com/doi/pdf/10.1002/asi.24460      application/pdf
#> 2 https://onlinelibrary.wiley.com/doi/full-xml/10.1002/asi.24460 application/xml
```

Created on 2023-09-29 with reprex v2.0.2
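When a publisher offers both PDF and XML, the XML version is usually easier to work with if you only need one section (such as a Data Accessibility statement). Here is a minimal base-R sketch for keeping one link per DOI, assuming a data frame shaped like `tdm_links` above (columns `doi`, `URL`, `content.type`). `pick_tdm_link()` is a hypothetical helper, not part of rcrossref, and the preference order is just an assumption you can change.

```r
# Hypothetical helper (not part of rcrossref): keep one TDM link per DOI,
# preferring content types that appear earlier in `prefer`.
pick_tdm_link <- function(links, prefer = c("application/xml", "application/pdf")) {
  # Rank each row by the position of its content type in `prefer`;
  # content types not listed in `prefer` are ranked last
  rank <- match(links$content.type, prefer)
  rank[is.na(rank)] <- length(prefer) + 1L
  # Sort by DOI, then by rank, and keep the first (best-ranked) row per DOI
  ordered <- links[order(links$doi, rank), , drop = FALSE]
  ordered[!duplicated(ordered$doi), , drop = FALSE]
}

# Example input shaped like `tdm_links` from the reprex
example_links <- data.frame(
  doi = c("10.1002/asi.24460", "10.1002/asi.24460"),
  URL = c("https://onlinelibrary.wiley.com/doi/pdf/10.1002/asi.24460",
          "https://onlinelibrary.wiley.com/doi/full-xml/10.1002/asi.24460"),
  content.type = c("application/pdf", "application/xml")
)

best <- pick_tdm_link(example_links)
best$URL
```

Because it only uses base-R subsetting, the same call should also work on the `tdm_links` tibble directly.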

2. Download

Once you have the TDM links, you'll need to check with the publisher to see how to download the full texts. Many publishers, such as Elsevier and Wiley, require you to register for an API key and add it to your HTTP request. Even if a full text is open access, these publishers may not allow you to access it programmatically without registration.
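As a rough illustration of "adding the key to your HTTP request", here is a minimal sketch using httr. The header name below is a placeholder, not any publisher's real header: each publisher documents its own header name and terms, so check their TDM instructions before using it. `download_tdm()` is a hypothetical helper. The `Sys.sleep()` pause keeps requests slow and sequential, which also helps with not hammering publishers' servers.

```r
library(httr)

# Hypothetical helper: download one TDM link to `dest`, sending a
# publisher-issued token in a request header.
# ASSUMPTION: "X-Publisher-TDM-Token" is a placeholder; replace it with the
# header name your publisher documents for its TDM API.
download_tdm <- function(url, dest, token,
                         header_name = "X-Publisher-TDM-Token",
                         pause = 1) {
  # Build a named character vector: header name -> token value
  headers <- stats::setNames(token, header_name)
  # write_disk() saves the response body to `dest` (even for error responses)
  resp <- GET(url, add_headers(.headers = headers),
              write_disk(dest, overwrite = TRUE))
  # Pause between requests to keep the download rate low
  Sys.sleep(pause)
  if (http_error(resp)) {
    warning("Download failed for ", url, " (HTTP ", status_code(resp), ")")
  }
  invisible(resp)
}
```

For example, `download_tdm(tdm_links$URL[2], "asi.24460.xml", Sys.getenv("PUBLISHER_TDM_TOKEN"))` would request the XML link from the reprex; the environment variable name is also a placeholder.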

Here are links to information on how to download full texts from Elsevier and Wiley:

General Crossref TDM info: https://www.crossref.org/documentation/retrieve-metadata/rest-api/text-and-data-mining/

I hope this is helpful!

Najko

This is super useful, thanks!