ISCN/SOC-DRaHR

Download files from repository within R?

Opened this issue · 2 comments

Should we start pulling files directly from the web within R rather than downloading them manually? I know the idea is to have all of this data integrated into ISCN, but if we're going to be data hacking as an ongoing process, it would be easier to check each other's scripts if we didn't each have to go download the data ourselves.

skyew commented

I think this is a good idea (downloading data for use, with minimal user pain – you can even have R handle passwords) – we do it all the time for Soil Survey users that rarely use R, but want access to graphs or whatever.

Here’s what my download usually looks like.

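ncss.url <- 'https://nrcs.box.com/shared/static/kroh8h124zligmamsorsk18d98gt9qkd.csv'

# BD_data is assumed to be the base directory, defined earlier in the script
ncss.file <- file.path(BD_data, 'data', 'ncss.w.bd.csv')

# If the file does not exist, download it (check the same path we write to)
if (!file.exists(ncss.file)) download.file(ncss.url, destfile = ncss.file) else message("File already downloaded")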
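The same check-then-download pattern could be wrapped in a small helper so that the path being tested and the path being written are always identical. This is only a sketch: `fetch_if_missing` is a hypothetical name (not from any package), and the choice to create the target directory and to use `mode = "wb"` are assumptions about what would be convenient here.

```r
# Hypothetical helper (not part of any package): download `url` to `destfile`
# only when `destfile` is not already on disk.
fetch_if_missing <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    message("File already downloaded: ", destfile)
    return(invisible(FALSE))
  }
  # download.file() does not create missing directories, so do it here.
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" keeps binary files (zip, xlsx) intact on Windows.
  status <- download.file(url, destfile = destfile, mode = "wb", ...)
  if (status != 0) stop("Download failed with status ", status)
  invisible(TRUE)
}
```

It would be called as `fetch_if_missing(ncss.url, file.path(BD_data, 'data', 'ncss.w.bd.csv'))`, returning `TRUE` when a download happened and `FALSE` when the file was already there.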

I’m sure I have an example of unzipping; gzip maybe.

Then I just rm(ncss) at the end.

And you can delete the file – file.remove()
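For the gzip case and the cleanup steps, something along these lines should work with base R alone. It is a sketch under stated assumptions: the small CSV written here is a made-up stand-in for a real downloaded file, and the column names are placeholders.

```r
# Stand-in for a downloaded file: write a small gzip-compressed CSV to a
# temporary location (in practice this would come from download.file()).
gz.file <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz.file, open = "w")
writeLines(c("id,bd", "1,1.32", "2,1.45"), con)  # placeholder values
close(con)

# read.csv() detects gzip compression when given a file path...
ncss <- read.csv(gz.file)
# ...or the gzfile() connection can be made explicit.
ncss2 <- read.csv(gzfile(gz.file))
same.data <- identical(ncss, ncss2)
n.rows <- nrow(ncss)

# For a .zip archive, unzip(zipfile, exdir = ...) extracts the files first.

# Clean up: drop the objects from the workspace and delete the file.
rm(ncss, ncss2)
file.remove(gz.file)
```

Deleting the file means the next run downloads it again, so `file.remove()` is probably best kept for large or temporary files.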

I can help or test once my brain recovers

Good ideas here! I just updated the SoilDataR::processData_ISCN3 function (https://github.com/ktoddbrown/soilDataR/pull/7) to download the Excel files from the ISCN website, so it no longer requires the user to export them to CSV. I still need to add something similar to the general process function.