EcoForecast/EF_Activities

Bug in Web Scraping example (03_Big_Data)

Closed this issue · 6 comments

When running line 82 of 03_Big_Data
fia_html <- getURL("https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html")
I get the following error message:

Error in function (type, msg, asError = TRUE)  : 
  error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure

Using http instead of https removes the error message but returns an empty page

fia_html <- getURL("http://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html")
fia_html
[1] ""

It could be a security issue with my machine, but I was wondering whether someone else has faced the same problem before?
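As a possible workaround (an assumption on my part, not part of the original exercise): the error comes from RCurl negotiating an outdated SSL/TLS version, so a client that uses a modern libcurl, such as base R's readLines() or the httr package, may succeed where getURL() fails. A minimal sketch:

# Base R: modern R builds fetch https URLs via libcurl,
# which should negotiate TLS correctly where RCurl's getURL() fails
fia_html <- paste(readLines("https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html"),
                  collapse = "\n")

Whether this helps depends on the local R/libcurl build; on an up-to-date install both approaches may work.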

  • I just confirmed that the offending line of code still runs for me and that the URL still points to the right page (i.e. the URL hadn't changed, which does happen from time to time and breaks these exercises)

  • That said, this specific activity definitely needs an update! When I used it in class this spring, there were a number of tasks that were much harder to perform on Windows machines than on Mac or Linux (e.g. tools that need to be installed manually on Windows but are present by default on other operating systems). Also, the set of available technologies, and the way people design websites and data portals, have both evolved since this activity was first developed. On my to-do list.

  • My experience this spring (semester course) and summer (short course) is that the remaining activities that come after this one all run more smoothly than this one (though definitely report any bugs you do find!)

I am using Windows, so I can confirm that, and I am also accessing the sites from Europe, which might (?) be relevant. Two other things that I noticed in that exercise:

  • Question 3
system("wget http://co2.aos.wisc.edu/data/cheas/wlef/netcdf/US-PFa-WLEF-TallTowerClean-2012-L0-vFeb2013.nc")
wlef = nc_open("US-PFa-WLEF-TallTowerClean-2012-L0-vFeb2013.nc")

gives

Error in nc_open("US-PFa-WLEF-TallTowerClean-2012-L0-vFeb2013.nc") : 
  Error in nc_open trying to open file US-PFa-WLEF-TallTowerClean-2012-L0-vFeb2013.nc

I downloaded the file manually, imported it from my machine, and then it worked fine, so it seems to be an issue with the download, not the data.

  • Question 4

EVI = tapply(subset$data$data, subset$data$calendar_date, mean,na.rm=TRUE) * as.numeric(subset$header$scale)
gives
Error in split.default(X, group) : first argument must be a vector

Question 3: The file is fine and the website is fine (just checked both), but wget is not installed by default on Windows machines. That's why I added the caveat "if you don't have wget installed, use your web browser" earlier this year.
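A cross-platform alternative (my suggestion, not code from the activity) is base R's download.file(), which needs no external tool. Note that on Windows the mode = "wb" argument matters: without it, the file is written in text mode and a binary format like netCDF gets corrupted, which can also produce the nc_open error reported above.

library(ncdf4)
url <- "http://co2.aos.wisc.edu/data/cheas/wlef/netcdf/US-PFa-WLEF-TallTowerClean-2012-L0-vFeb2013.nc"
# mode = "wb" forces binary transfer; essential for netCDF on Windows
download.file(url, destfile = "US-PFa-WLEF-TallTowerClean-2012-L0-vFeb2013.nc", mode = "wb")
wlef <- nc_open("US-PFa-WLEF-TallTowerClean-2012-L0-vFeb2013.nc")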

Question 4: That line didn't give me an error. It's possible that the MODIS data didn't download, or that the download got corrupted. Could you verify that the following lines give you sensible outputs?

subset$header
head(subset$data)

Question 3: Ok, well this explains it then. Thank you.

Question 4: This returns

>subset$header
NULL
> head(subset$data)
NULL

However, subset returns

        xllcorner  yllcorner      cellsize nrows ncols             band                units  scale latitude longitude
1.1   -6940887.76 5123080.40 231.656358264     9     9 250m_16_days_EVI EVI ratio - No units 0.0001  46.0827  -89.9792
2.1   -6940887.76 5123080.40 231.656358264     9     9 250m_16_days_EVI EVI ratio - No units 0.0001  46.0827  -89.9792
...
             site product      start        end complete modis_date calendar_date   tile     proc_date pixel value
1.1   WillowCreek MOD13Q1 2012-01-01 2012-12-31     TRUE   A2012001    2012-01-01 h11v04 2015236172231     1  1854
2.1   WillowCreek MOD13Q1 2012-01-01 2012-12-31     TRUE   A2012017    2012-01-17 h11v04 2015237090200     1   978
...

so something did download.

Argh, the package was updated and the structure of the data returned is different!

Here's the updated code snippet, which I'm pushing to GitHub now

EVI = tapply(subset$value*as.numeric(subset$scale), subset$calendar_date, mean,na.rm=TRUE)
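For anyone following along, a toy illustration of what this line computes (the values below are made up, not real MODIS output): the data frame now holds one row per pixel per date, so tapply() groups the scaled values by calendar_date and averages across pixels.

# Toy stand-in for the flattened MODIS subset data frame (made-up values)
subset <- data.frame(
  calendar_date = c("2012-01-01", "2012-01-01", "2012-01-17"),
  value = c(1854, 978, 1200),
  scale = "0.0001"
)
# Mean scaled EVI per date, averaged over pixels
EVI = tapply(subset$value * as.numeric(subset$scale),
             subset$calendar_date, mean, na.rm = TRUE)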

Confirming that this worked. I am closing this now, thank you for taking the time.