andybega/icews

404 error at download attempt

andybega opened this issue · 5 comments

First, thank you for developing the icews package. I am trying to use the minimalist functionality and am running into an error.
This error occurs for both the update_icews() and download_data() functions when dryrun is set to FALSE. My setup has use_db = F and keep_files = T.

update_icews(dryrun = F)
Downloading 'events.1995.20150313082510.tab.zip'
Error in get_file(file_ref, get_doi()[[repo]]) : Not Found (HTTP 404).

I am hoping this is a common error and an answer is readily available. Thanks for your help.

This is a simpler example:

This should download one of the documentation PDFs:

library("icews")
#> Options not set, consider running 'setup_icews()'
#> data_dir: NULL
#> use_db: NULL
#> keep_files: NULL
dataverse::get_file(2711073, dataset = get_doi()$historic)
#> Error in dataverse::get_file(2711073, dataset = get_doi()$historic): Not Found (HTTP 404).

Created on 2020-01-06 by the reprex package (v0.3.0)

Looks like the problem is with either the R dataverse client or the dataverse API itself. The direct URL for the PDF file above is https://dataverse.harvard.edu/api/access/datafile/2711073, and it works.

However, in dataverse::get_file(), a query parameter for the desired format is set to "original" by default, leading to the URL https://dataverse.harvard.edu/api/access/datafile/2711073/?format=original. That breaks and leads to the 404 error.

This is the essential bit from the dataverse::get_file() internals:

library("dataverse")
library("httr")

key <- Sys.getenv("DATAVERSE_KEY")

u <- "https://dataverse.harvard.edu/api/access/datafile/2711073"
query <- list(format = "original")

r <- httr::GET(u, httr::add_headers(`X-Dataverse-key` = key))
# works
status_code(r)
#> [1] 200

# with format argument it does not work
r <- httr::GET(u, httr::add_headers(`X-Dataverse-key` = key), query = query)
status_code(r)
#> [1] 404

Created on 2020-01-06 by the reprex package (v0.3.0)

This looks like a relevant issue in the R dataverse client repo: IQSS/dataverse-client-r#33
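
To compare the two request URLs without sending anything, they can be built locally with httr::modify_url(). This is only a sketch: the exact URL that dataverse::get_file() assembles internally (for example the trailing slash shown above) may differ from what is printed here.

```r
library("httr")

u <- "https://dataverse.harvard.edu/api/access/datafile/2711073"

# URL as requested directly (this one returned 200 above)
plain_url <- u

# Same URL with the format query appended, as httr::GET(u, query = ...) would
# send it (this one returned 404 above)
format_url <- httr::modify_url(u, query = list(format = "original"))

print(plain_url)
print(format_url)
```

The only difference between the two is the format query parameter, which matches the 200 versus 404 results in the reprex.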

Hi, I have the same issue.
Also note that in the dry run there is no file for the past 6 months or so (last one is 20190625); maybe it's related:

Download            '20190622-icews-events.zip'
Ingest records from '20190622-icews-events.tab'
Download            '20190623-icews-events.zip'
Ingest records from '20190623-icews-events.tab'
Download            '20190624-icews-events.zip'
Ingest records from '20190624-icews-events.tab'
Download            '20190625-icews-events.zip'
Ingest records from '20190625-icews-events.tab'
> # Should list proposed downloads, ingests, etc.
> update_icews(dryrun = FALSE)
Downloading 'events.1995.20150313082510.tab.zip'
Error in dataverse::get_file(file = file_ref, dataset = get_doi()[[repo]]) : 
  Not Found (HTTP 404).

Here is a workaround and a blueprint for a fix:

.libPaths() # lists all library paths; make sure dataverse is removed from every one of them
remove.packages("dataverse")
# restart R
devtools::install_github("IQSS/dataverse-client-r")
#Installing package into ‘/home/mk/R/x86_64-pc-linux-gnu-library/3.6’
#(as ‘lib’ is unspecified)
#* installing *source* package ‘dataverse’
library("icews")
library("DBI")
library("dplyr")
library("usethis")
print(sessionInfo())
#loaded via a namespace (and not attached):
#[...]
#[25] glue_1.3.1             dataverse_0.2.1.9001   RSQLite_2.2.0

setup_icews(data_dir = "~/temp_icews", use_db = TRUE, keep_files = TRUE,
            r_profile = TRUE)
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
update_icews(dryrun = FALSE)

as per IQSS/dataverse-client-r#33 (comment)

Hope it helps! Cheers

Hey @mayeulk, thanks! This works for me now as well:

devtools::install_github("IQSS/dataverse-client-r")

# restart R
library("icews")
library("dataverse")
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

# works now
foo <- dataverse::get_file(2711073, dataset = get_doi()$historic)

I will keep this issue open until dataverse is updated on CRAN.
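
Until then, a quick check along these lines can tell whether a session is set up like the working one above. It is a rough sketch: dataverse_ready() is a hypothetical helper, not part of icews or dataverse, and the 0.2.1.9001 threshold is an assumption taken from the sessionInfo() output earlier in this thread, not a documented fix version.

```r
# Hypothetical helper (not part of icews or dataverse).
# min_version is an assumption based on the sessionInfo() output in this thread.
dataverse_ready <- function(min_version = "0.2.1.9001",
                            installed = utils::packageVersion("dataverse"),
                            server = Sys.getenv("DATAVERSE_SERVER")) {
  # Coerce both versions so that character input compares correctly
  installed   <- as.package_version(installed)
  min_version <- as.package_version(min_version)

  version_ok <- installed >= min_version
  server_ok  <- nzchar(server)  # DATAVERSE_SERVER must be set to something

  if (!version_ok) message("dataverse ", installed, " is older than ", min_version)
  if (!server_ok)  message("DATAVERSE_SERVER is not set")

  version_ok && server_ok
}

# Only run the check with defaults when dataverse is actually installed
if (requireNamespace("dataverse", quietly = TRUE)) {
  print(dataverse_ready())
}
```

If this returns FALSE, reinstalling dataverse from GitHub and setting DATAVERSE_SERVER as shown above are the two things to revisit.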

> Also note that in the dry run there is no file for the past 6 months or so (last one is 20190625); maybe it's related:

ICEWS has stopped updating. I heard they managed to regain funding but I have no idea when they will resume.