404 error at download attempt
andybega opened this issue · 5 comments
First, thank you for developing the icews package. I am trying to use the minimalist functionality and am running into an error.
This error occurs for both the update_icews() and download_data() functions when dryrun is set to FALSE. My setup has use_db = F and keep_files = T.
update_icews(dryrun = F)
Downloading 'events.1995.20150313082510.tab.zip'
Error in get_file(file_ref, get_doi()[[repo]]) : Not Found (HTTP 404).
I am hoping this is a common error and an answer is readily available. Thanks for your help.
Here is a simpler example. This should download one of the documentation PDFs:
library("icews")
#> Options not set, consider running 'setup_icews()'
#> data_dir: NULL
#> use_db: NULL
#> keep_files: NULL
dataverse::get_file(2711073, dataset = get_doi()$historic)
#> Error in dataverse::get_file(2711073, dataset = get_doi()$historic): Not Found (HTTP 404).
Created on 2020-01-06 by the reprex package (v0.3.0)
Looks like the problem is with either the R dataverse client or the dataverse API itself. The direct URL for the PDF file above is https://dataverse.harvard.edu/api/access/datafile/2711073, and it works.
However, in dataverse::get_file() a query parameter for the desired format is set to "original" by default, leading to the URL https://dataverse.harvard.edu/api/access/datafile/2711073/?format=original. That request breaks and leads to the 404 error.
This is the essential bit from the dataverse::get_file() internals:
library("dataverse")
library("httr")
key <- Sys.getenv("DATAVERSE_KEY")
u <- "https://dataverse.harvard.edu/api/access/datafile/2711073"
query <- list(format = "original")
r <- httr::GET(u, httr::add_headers(`X-Dataverse-key` = key))
# works
status_code(r)
#> [1] 200
# with format argument it does not work
r <- httr::GET(u, httr::add_headers(`X-Dataverse-key` = key), query = query)
status_code(r)
#> [1] 404
Created on 2020-01-06 by the reprex package (v0.3.0)
This looks like a relevant issue in the R dataverse client repo: IQSS/dataverse-client-r#33
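Until that is fixed, one way to get a file in the meantime would be to hit the access URL directly and skip the format query; a rough sketch along the lines of the snippet above (the output file name is just an example):
library("httr")

# direct access URL for the PDF file from above, without the
# "?format=original" query that triggers the 404
u <- "https://dataverse.harvard.edu/api/access/datafile/2711073"
key <- Sys.getenv("DATAVERSE_KEY")

r <- httr::GET(u, httr::add_headers(`X-Dataverse-key` = key))
httr::stop_for_status(r)

# write the raw response body to disk
writeBin(httr::content(r, as = "raw"), "icews-documentation.pdf")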
Hi, I have the same issue.
Also note that in the dry run there is no file for the past 6 months or so (last one is 20190625); maybe it's related:
Download '20190622-icews-events.zip'
Ingest records from '20190622-icews-events.tab'
Download '20190623-icews-events.zip'
Ingest records from '20190623-icews-events.tab'
Download '20190624-icews-events.zip'
Ingest records from '20190624-icews-events.tab'
Download '20190625-icews-events.zip'
Ingest records from '20190625-icews-events.tab'
> # Should list proposed downloads, ingests, etc.
> update_icews(dryrun = FALSE)
Downloading 'events.1995.20150313082510.tab.zip'
Error in dataverse::get_file(file = file_ref, dataset = get_doi()[[repo]]) :
Not Found (HTTP 404).
Here is a workaround and a blueprint for a fix:
.libPaths() # make sure dataverse is removed from all library locations
remove.packages("dataverse")
# restart R
devtools::install_github("IQSS/dataverse-client-r")
#Installing package into ‘/home/mk/R/x86_64-pc-linux-gnu-library/3.6’
#(as ‘lib’ is unspecified)
#* installing *source* package ‘dataverse’
library("icews")
library("DBI")
library("dplyr")
library("usethis")
print(sessionInfo())
#loaded via a namespace (and not attached):
#[...]
#[25] glue_1.3.1 dataverse_0.2.1.9001 RSQLite_2.2.0
setup_icews(data_dir = "~/temp_icews", use_db = TRUE, keep_files = TRUE,
r_profile = TRUE)
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
update_icews(dryrun = FALSE)
as per IQSS/dataverse-client-r#33 (comment)
Hope it helps! Cheers
Hey @mayeulk, thanks! This works for me now as well:
devtools::install_github("IQSS/dataverse-client-r")
# restart R
library("icews")
library("dataverse")
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
# works now
foo = dataverse::get_file(2711073, dataset = get_doi()$historic)
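If you need the actual file on disk, foo should be a raw vector here, so something like this works (the file name is just an example):
writeBin(foo, "icews-documentation.pdf")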
I will keep this issue open until dataverse is updated on CRAN.
Also note that in the dry run there is no file for the past 6 months or so (last one is 20190625); maybe it's related:
ICEWS has stopped updating. I heard they managed to regain funding but I have no idea when they will resume.