atlas_occurrences fails for UK: "Columns don't exist"
CrunchyLettuce opened this issue · 7 comments
I'm trying to retrieve NBN occurrences (the UK dataset), but each time it's throwing up the same error: that there are missing columns. I'm able to get counts from the UK data fine, and when I try the other datasets I don't get this error. I'm using Galah version 1.5.3
My code
galah_config(atlas = "United Kingdom")
galah_config(email = "my@email.com")
result <- galah_call() %>%
  galah_identify("reptilia") %>%
  galah_filter(year >= 2020) %>%
  atlas_occurrences()
Error
Error in `all_of()`:
! Can't subset columns that don't exist.
✖ Columns `decimalLatitude`, `decimalLongitude`, `scientificName`, `recordID`, and `dataResourceName` don't exist.
Run `rlang::last_trace()` to see where the error occurred.
Any help would be great, thank you.
Thanks for reaching out about this issue. This error appears to show that atlas_occurrences() is attempting to use the default set of fields (i.e. column names) for the Atlas of Living Australia rather than the UK's National Biodiversity Network when building the query. This should just need a small fix to get working again, which we'll be sure to add to the next version of {galah}. Thanks for letting us know!
In the meantime, one way to avoid this error is to specify which columns you want in your query with galah_select(). Specifying columns prevents atlas_occurrences() from attempting to use any of the defaults.
I managed to find the equivalent fields using search_all(). Feel free to add more or fewer fields to your query as you need! 😄
(also, note that I changed the year in galah_filter() to reduce the amount of data returned in this example)
library(galah)
library(magrittr)
galah_config(email = "your-email-here", atlas = "United Kingdom")
#> Atlas selected: National Biodiversity Network (NBN) [United Kingdom]
# search_all(fields, "data resource") # example search
result <- galah_call() %>%
  galah_identify("reptilia") %>%
  galah_filter(year >= 2022) %>%
  galah_select(longitude, latitude, taxon_name, id, data_resource) %>%
  atlas_occurrences()
#> This query will return 6,778 records
#>
#> Checking queue
#> Current queue size: 1 inqueue running .
result
#> # A tibble: 6,778 × 5
#>    decimalLongitude decimalLatitude scientificName   recordID      data_resource
#>               <dbl>           <dbl> <chr>            <chr>         <chr>
#>  1           -3.10             52.9 Anguis fragilis  ecb97e98-655… Records of a…
#>  2           -2.75             52.7 Zootoca vivipara c7f1f243-615… Records of a…
#>  3            0.67             50.9 Vipera berus     c627df55-a7b… Records of a…
#>  4           -0.374            50.9 Anguis fragilis  c3caa230-333… Records of a…
#>  5            0.849            51.8 Zootoca vivipara bc73fc49-59a… Records of a…
#>  6           -4.05             50.4 Zootoca vivipara b5d964d1-c13… Froglife's a…
#>  7           -3.46             50.7 Anguis fragilis  a6924290-e28… Records of a…
#>  8           -3.23             51.6 Natrix helvetica 9f290871-554… SEWBReC Rept…
#>  9           -1.14             50.7 Anguis fragilis  956b44dd-f11… Records of a…
#> 10           -0.315            52.1 Natrix helvetica 91a0420a-a2f… Froglife's a…
#> # ℹ 6,768 more rows
Created on 2023-07-12 with reprex v2.0.2
On another note, some but not all of the column names in the returned tibble are changed to match field names in the ALA. The names make sense, but it seems strange to get new column names given that we specified the fields in galah_select(). We might want to update this renaming to be more consistent and clearer to users.
/cc @mjwestgate
Thanks for the quick fix! That's solved my issue.
No worries! We might keep this issue open for a bit longer as there are a few things here we still need to do to make sure this is fixed in the next version of galah 😃
Not sure whether I should open this as a new issue, but it might be related to any fixes you're doing.
The code above worked for one request, but now I'm getting this error:
This query will return 6,778 records
Checking queue
Current queue size: 2 inqueue failed
Error: need one of url or handle
I've tried restarting R and doing the config command again, but I'm still getting the same error.
That error usually happens when the Atlas you are querying does not return anything after ~10 to 15 minutes. My guess is that, because it says you were number 2 in the queue, the query ahead of yours might have been very large and held up your download for a while. Alternatively, the NBN might have had another issue that slowed your download long enough to time out. This can be a frustrating error, though, because often the solution is to be patient.
We have some fixes coming through in the next version to prevent galah from timing out, but for now this is an error that is most likely resolved on the Atlas side after a while: eventually whatever is holding up the queue will run and everything will work again. My advice when this crops up is to wait for a little while (maybe 30 mins to an hour) and then rerun your query.
The good news is that I just ran a query and it returned a result, so it looks like things are working again!
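If you would rather not rerun the query by hand, the wait-and-rerun advice above could be sketched roughly as follows. Note that retry_query() is a hypothetical helper written for this example, not a galah function, and the default of 3 attempts with a 30 minute pause is only an assumption based on the wait suggested above.

```r
# Hypothetical helper (NOT part of galah): call `run_query()` and, if it
# throws an error, pause for `wait_seconds` before trying again.
retry_query <- function(run_query, max_attempts = 3, wait_seconds = 1800) {
  for (attempt in seq_len(max_attempts)) {
    # Capture an error as a value instead of stopping straight away
    result <- tryCatch(run_query(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    # Only wait if there is another attempt still to come
    if (attempt < max_attempts) {
      Sys.sleep(wait_seconds)
    }
  }
  stop("Query still failing after ", max_attempts, " attempts", call. = FALSE)
}

# Usage with the query from earlier in this thread (left commented out
# because it needs a registered email and a live connection to the NBN):
# result <- retry_query(function() {
#   galah_call() %>%
#     galah_identify("reptilia") %>%
#     galah_filter(year >= 2022) %>%
#     galah_select(longitude, latitude, taxon_name, id, data_resource) %>%
#     atlas_occurrences()
# })
```

This only automates the waiting. It won't help if the Atlas itself is down for longer than the total retry window.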
Looks like this is solved for now - plus version 2.0 has specific tests for occurrence downloads from the UK - so I'll mark this as closed. Happy to reopen again if there is still a problem.