lifewatch/eurobis

Update eurobis package

An update of this package is planned for spring 2022.

The current way the package works is:

  1. Via getEurobisData() you can pass an aphiaid, mrgid, dasid and dates.
  2. A WFS request URL to GeoServer is generated.
  3. If the query returns more than 20K records, the records are downloaded in batches (very slow).
  4. The data are loaded into your R session. Because everything is held in memory, this does not work with large amounts of data.
  5. A list of the datasets, retrieved via IMIS, is returned together with the data.
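To make steps 2 and 3 more concrete, here is a minimal sketch of how paged WFS GetFeature URLs could be built. It is written in Python purely for illustration (the package itself is R). The endpoint URL, layer name, outputFormat value and viewParams keys are placeholders, not the real EMODnet Biology service. startIndex and count are the standard WFS 2.0 paging parameters, and viewParams is the GeoServer vendor parameter that the mrgid filter relies on.

```python
from urllib.parse import urlencode

# Placeholder endpoint and layer name (hypothetical, not the real EMODnet service).
WFS_ENDPOINT = "https://geoserver.example.org/wfs"
LAYER = "example:occurrences"
BATCH_SIZE = 20_000  # batch size taken from the 20K threshold in the issue


def build_view_params(params: dict) -> str:
    """Encode a dict as a GeoServer viewParams string (key:value pairs joined by ';').

    Commas and semicolons inside values are backslash-escaped, as GeoServer requires.
    """
    def escape(value) -> str:
        return str(value).replace(",", "\\,").replace(";", "\\;")

    return ";".join(f"{key}:{escape(value)}" for key, value in params.items())


def getfeature_url(view_params=None, start_index=0, count=BATCH_SIZE) -> str:
    """Build one WFS 2.0 GetFeature URL for a single page of results."""
    query = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": LAYER,
        "outputFormat": "csv",  # assumed output format
        "startIndex": start_index,  # WFS 2.0 paging: offset of the first record
        "count": count,  # WFS 2.0 paging: maximum records per page
    }
    if view_params:
        query["viewParams"] = build_view_params(view_params)
    return f"{WFS_ENDPOINT}?{urlencode(query)}"


def batch_urls(total_records: int, view_params=None, batch_size=BATCH_SIZE) -> list:
    """Return one GetFeature URL per batch needed to cover total_records."""
    return [
        getfeature_url(view_params, start, batch_size)
        for start in range(0, total_records, batch_size)
    ]
```

For example, batch_urls(45_000, {"mrgid": 1234}) (a made-up MRGID) yields three URLs with startIndex 0, 20000 and 40000. Paging is only reliable with a stable result order, so a real implementation may also need a sortBy parameter.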

In addition, there are functions to load the grids per species, such as getEurobisGrid(), which make it quicker to visualize the distribution of a species.

This GitHub issue lists the tasks to look into:

  • Improve getting large amounts of data: Currently, large amounts of data are nearly impossible to download. We have to look into adding pagination and caching to improve this. See e.g. #8
  • Use of imis package: Check whether vlizBE/imis is needed or whether there is a better alternative; this dependency causes many installation problems (#11).
  • Metadata: Currently getEurobisData() returns a list with 3 tables, including metadata. While it is great to provide metadata, I wonder if this is the best format for users. I propose adding specific functions that return the list of datasets used by a specific query, as well as definitions for each column.
  • Use EMODnetWFS?: The package uses the EMODnet Biology webservices. So far it just builds the plain URL, but we could look into using EMODnet/EMODnetWFS instead, or go one level deeper and use eblondel/ows4R directly. Problem: querying by Marine Regions mrgid requires the viewParams query option (a GeoServer vendor parameter), which is not yet implemented in EMODnetWFS, and we don't know how to use it via ows4R (#15).
  • Change names of functions?: Currently the functions use camelCase, e.g. getEurobisData() rather than get_eurobis_data(). I propose switching to underscores, as this is more widely used in R. For example, we could follow what the obis package does and call it eurobis_occurrences().
  • Add CI: There are currently no automated tests. We should add tests and a CI pipeline via GitHub Actions (#13, #14).
  • Publication: Once fully developed, this package should be submitted to CRAN, and possibly to rOpenSci for review.
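One way to approach the "pagination and cache" item: store each downloaded batch on disk under a hash of its request URL, so that a repeated query, or a re-run after a failed download, skips batches that were already retrieved. This is a minimal Python sketch under that assumption; cached_fetch, download_batches and the injected fetch function are hypothetical names, and the actual network call is left to the caller.

```python
import hashlib
from pathlib import Path
from typing import Callable, List


def cached_fetch(url: str, fetch: Callable[[str], bytes], cache_dir: Path) -> bytes:
    """Return the response for url, calling fetch only if no cached copy exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The cache key is a hash of the full request URL (query parameters included).
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.bin"
    if path.exists():
        return path.read_bytes()
    data = fetch(url)
    # Write to a temporary file, then rename, so an interrupted write
    # never leaves a partial entry that would later be read as complete.
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data


def download_batches(urls: List[str], fetch: Callable[[str], bytes], cache_dir: Path) -> List[bytes]:
    """Fetch every batch through the cache; a re-run resumes where the last one stopped."""
    return [cached_fetch(url, fetch, cache_dir) for url in urls]
```

Injecting fetch keeps the caching logic independent of the HTTP client and easy to test. A real package would also need a way to invalidate the cache when the underlying data change.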

Nice summary, thanks!

  • Should the data be consistent with the output of the EMODnet Biology download toolbox? (And should there be an option for the 3 versions? https://www.emodnet-biology.eu/emodnet-data-format)
  • metadata: What metadata is needed? What does the download toolbox return, and how does it retrieve it?
  • vlizBE/imis: I think we should maintain a package that retrieves IMIS data (and loads it into R). But I'm not sure whether this should be part of this task; it will probably depend on the previous question of what metadata is needed/wanted.
  • Publication: this depends on how mature we think the package is after the update (we decide this after the update).
  • Yes, of course. But keep in mind that, for the full occurrences and parameters option, the download toolbox and the webservices do not return the data in exactly the same format: the download toolbox splits the eMoF and the occurrences into two tables, while the webservices return everything in one table. We could split the data on the client side to stay closer to the download toolbox, or leave it as is. I would say we should not transform the data on the client side and should stick to the webservices' output, or add a helper function to split the data.
  • metadata: I just checked the download toolbox, and it does not return any metadata.
  • vlizBE/imis: the problem is that any issue in the imis package cascades to the eurobis package. I would rather keep both packages separated and explain how to use them together in a vignette.
  • Publication: agree :)
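If a helper function is added, splitting could look roughly like the following sketch. It assumes, hypothetically, that the webservice returns one row per measurement with the occurrence columns repeated, keyed by an occurrenceid column, and that the measurement columns are named measurementtype, measurementvalue and measurementunit; the real column names may differ. Python is used only for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical column names; the real webservice columns may differ.
MEASUREMENT_COLUMNS = ("measurementtype", "measurementvalue", "measurementunit")
OCCURRENCE_KEY = "occurrenceid"


def split_occurrences_emof(
    rows: List[Row],
    key: str = OCCURRENCE_KEY,
    measurement_cols=MEASUREMENT_COLUMNS,
) -> Tuple[List[Row], List[Row]]:
    """Split a combined table into an occurrence table (one row per occurrence)
    and an eMoF table (one row per measurement, linked by key)."""
    occurrences: Dict[object, Row] = {}
    emof: List[Row] = []
    for row in rows:
        occurrence = {col: val for col, val in row.items() if col not in measurement_cols}
        # Keep the first copy of each occurrence; later rows repeat the same columns.
        occurrences.setdefault(row[key], occurrence)
        # Only rows that actually carry a measurement go into the eMoF table.
        if any(row.get(col) is not None for col in measurement_cols):
            emof.append({key: row[key], **{col: row.get(col) for col in measurement_cols}})
    return list(occurrences.values()), emof
```

Rows without any measurement contribute only to the occurrence table. Keeping the first copy of each occurrence assumes its occurrence columns are identical across all of its measurement rows.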