trias-project/occ-cube-alien

Add PRA species to modelling taxa for European occurrence cube

damianooldoni opened this issue · 10 comments

This issue starts from @amyjsdavis's comment in trias-project/indicators#84, which I copy-pasted below:

@damianooldoni: These are a different set of species (the plant species on the "Species selection for PRA-ing" list on Google Drive). They are not on the modelling taxa list, so there is no European cube for them. I realized that in the future there will likely not be a cube for every species to be evaluated, so my modelling flow can either use the cube as input, if it already exists, or process data downloaded directly from GBIF. As you may recall, I have to download global data for each species anyway for the models.
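The fallback described above (use the European cube when it already covers a taxon, otherwise query GBIF directly by taxon key) could look roughly like the sketch below. It is a hypothetical illustration, not the actual modelling flow: the helper `occurrence_source` and the idea of passing in the set of taxa covered by the cube are assumptions. Only the GBIF occurrence search endpoint and its `taxonKey`, `hasCoordinate` and `limit` parameters are real.

```python
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (paged search, not the bulk download API).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_source(taxon_key: int, cube_taxa: set[int]) -> tuple[str, str]:
    """Hypothetical helper: pick the occurrence input for one taxon.

    Returns ("cube", <taxon key>) when the European cube already covers the
    taxon, otherwise ("gbif", <search URL>) for a global query by taxon key.
    """
    if taxon_key in cube_taxa:
        return ("cube", str(taxon_key))
    # No country filter: the SDMs are calibrated on global data.
    params = {"taxonKey": taxon_key, "hasCoordinate": "true", "limit": 300}
    return ("gbif", f"{GBIF_SEARCH}?{urlencode(params)}")


print(occurrence_source(2891783, {5220136}))
```

The search endpoint is paged (at most 300 records per request), so a real global pull for SDM calibration would go through the GBIF occurrence download API instead; the URL here only shows that the taxon key is the sole species-specific input.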

In order to make data processing as linear and FAIR as possible, we decided to make a European occurrence cube for ALL species we need for modelling, where modelling is meant in a broader sense: any species @amyjsdavis and @DiederikStrubbe need to run SDMs for. There is a list of species for modelling in this repo (references/modelling_species.tsv), which should be updated whenever new species are found worth considering for risk maps.

I invite all of you to keep it updated so that I can run a new version of the cube for them. Thanks.
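For readers unfamiliar with the term, an occurrence cube aggregates individual occurrence records into counts per taxon, year and grid cell. The sketch below shows only that aggregation step on in-memory records. The field names (`taxonKey`, `year`, `cell`) and the cell code are illustrative assumptions; this is not the cube-building code of this repo, which starts from a GBIF download and assigns records to grid cells first.

```python
from collections import Counter
from typing import Iterable, Mapping


def build_cube(records: Iterable[Mapping]) -> dict[tuple[int, int, str], int]:
    """Count occurrences per (taxonKey, year, cell).

    Assumes each record already carries a grid cell code; assigning
    coordinates to cells is out of scope for this sketch.
    """
    counts = Counter()
    for rec in records:
        # Records without a year or a cell cannot be placed in the cube.
        if rec.get("year") is None or rec.get("cell") is None:
            continue
        counts[(rec["taxonKey"], rec["year"], rec["cell"])] += 1
    return dict(counts)


records = [
    {"taxonKey": 2891770, "year": 2019, "cell": "1kmE3800N3100"},  # hypothetical cell code
    {"taxonKey": 2891770, "year": 2019, "cell": "1kmE3800N3100"},
    {"taxonKey": 2891770, "year": 2020, "cell": "1kmE3800N3100"},
    {"taxonKey": 2891770, "year": None, "cell": "1kmE3800N3100"},  # dropped: no year
]
print(build_cube(records))
```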

OK, but Sonia needs these maps ASAP, so I don't want the cube to be a bottleneck. The EU cube is a nice addition, in that it saves me a step, but it is not necessary to run the models or to be FAIR. The data downloaded from GBIF are saved and documented using the TrIAS procedures that you and Peter developed. In addition, I already have to initiate a global download of GBIF data for each species in order to build the global SDM. I feel that my steps already follow FAIR principles, and having to put the data in the cube before running the SDMs seems an unnecessary step. The point of my modelling flow is that anyone can make the risk maps from the GBIF taxon ID without having to feed in intermediate processed occurrence data.

@amyjsdavis @DiederikStrubbe I understand that the cube is not needed for the risk models, especially if you calibrate them on a global dataset. In terms of following FAIR principles: perhaps what you mean is that the source data and model outcomes are handled FAIRly, by documenting them, publishing them on Zenodo, etc. However, I currently would have no clue where to start making the risk maps for a given species from its GBIF taxon (for now), which is what I would like to do for several species I am interested in. For that, I would effectively need the code that produces them, and it would have to be described, added to the repo, and reviewed by Damiano and others who can get it bug-free and up and running, so that anyone can use it. Without it, frankly, the models also remain a bit of a black box to us. Exchanging intermediate products has clearly benefited the scientific quality of other parts of TrIAS, such as the indicator workflows. The whole point of TrIAS is to open up the entire workflow, including the code producing the models. Running them on the batch of species to be risk assessed this year is of course fine, but the eventual goal is robust code that can do the same for any of the 2800+ species on the unified checklist, for anyone who needs it. I understand that opening up code you are still fine-tuning can feel uncomfortable, but the team is really there to help you. In brief: can you add it to your risk modelling repo (or ask Damiano to do that)?

@timadriaens @damianooldoni I realize that my code not being on GitHub is becoming problematic, and it has been on my to-do list. I am not reluctant to share our code; I will make it a priority. The holdup is that several ancillary datasets also need to be shared, with metadata etc., in order to actually run the code. In the interest of getting the maps done, I have delayed this until now.

@damianooldoni: I have the list of species in an email; I will forward it to you now.

Thanks @amyjsdavis. From your email:

It is my understanding that all the species highlighted are priority and we are starting with plants first.

Indeed, I have just now checked the highlighted species:

  1. Dama dama
  2. Neovison vison
  3. Podarcis siculus
  4. Tamiasciurus hudsonicus
  5. Orthriophis taeniurus
  6. Pelophylax ridibundus

But these species are already in modelling_species.tsv! So you already have a European-level cube for them... Or am I missing something?
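The check above (are the highlighted species already on the modelling list?) is a set difference. A minimal sketch, assuming the list can be read into a set of scientific names; `missing_from_modelling_list` is a hypothetical helper, and the `modelling` set below contains only the six species named here, not the full file.

```python
def missing_from_modelling_list(candidates: list[str], modelling: set[str]) -> list[str]:
    """Return candidate names not yet on the modelling list, keeping input order."""
    return [name for name in candidates if name not in modelling]


highlighted = [
    "Dama dama", "Neovison vison", "Podarcis siculus",
    "Tamiasciurus hudsonicus", "Orthriophis taeniurus", "Pelophylax ridibundus",
]
# Illustrative stand-in for references/modelling_species.tsv (the real file has more rows).
modelling = set(highlighted)

print(missing_from_modelling_list(highlighted, modelling))                       # []
print(missing_from_modelling_list(["Mephitis mephitis", "Dama dama"], modelling))  # ['Mephitis mephitis']
```

Matching on names is brittle (synonyms, authorship strings, infraspecific ranks), so comparing GBIF taxon keys is the safer choice once both lists carry them.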

@damianooldoni: My apologies, I made a mistake about the highlighting. It is all the species on the "species selection for PRA-ing" tab, highlighted or not.

@amyjsdavis: The spreadsheet I got has only the ranking_df tab.

@damianooldoni: I think it got lost along the way with all the forwarding. I have sent you a different link. :) Here are the species:

taxonKey spn
5220136 Dama dama
5219380 Mephitis mephitis
2440954 Cervus nippon
2437619 Dolichotis patagonum
2437282 Tamiasciurus hudsonicus
2437397 Callosciurus prevostii

7190901 Graptemys pseudogeographica pseudogeographica
6157050 Graptemys pseudogeographica kohnii
9185677 Podarcis siculus
2455523 Orthriophis taeniurus

2426640 Pelophylax bedriagae

2498110 Anas sibilatrix
2498388 Aix galericulata
2498387 Aix sponsa

2891783 Impatiens balfourii
2891774 Impatiens capensis
2891782 Impatiens parviflora
2891770 Impatiens glandulifera
2882849 Vaccinium corymbosum
7777960 Vaccinium macrocarpum
7501634 Rosa multiflora
3003709 Rosa glauca
9202318 Symphyotrichum lanceolatum
3151618 Symphyotrichum novae-angliae
3151558 Symphyotrichum novi-belgii
3082244 Cornus sericea
8421432 Cornus sanguinea hungarica
3663237 Cornus sanguinea australis
2715482 Cyperus eragrostis
2716226 Cyperus esculentus
2992543 Rubus laciniatus
2993761 Rubus spectabilis
5281901 Campylopus introflexus

4559541 Sinanodonta woodiana
8190231 Corbicula fluminea
5189032 Corbicula fluminalis
9291405 Massylaea vermiculata
8745918 Deroceras invadens
5192470 Potamopyrgus antipodarum

2115692 Balanus tintinnabulum
2225646 Callinectes sapidus
5178057 Caprella mutica
2287249 Ensis directus
2225776 Eriocheir sinensis
2225772 Hemigrapsus sanguineus
4382841 Hemigrapsus takanoi
7820753 Magallana gigas
2501248 Mnemiopsis leidyi
2287076 Mytilopsis leucophaeata
2224970 Palaemon macrodactylus
5192470 Potamopyrgus antipodarum
2227663 Rhithropanopeus harrisii
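Before adding these rows to the modelling list, the pasted block can be parsed and checked for repeated keys: 5192470 (Potamopyrgus antipodarum), for instance, appears twice above. A minimal sketch, assuming the list is available as plain text in the `taxonKey spn` layout shown here, with blank separator lines; `parse_taxa` and `duplicated_keys` are hypothetical helpers.

```python
from collections import Counter


def parse_taxa(text: str) -> list[tuple[int, str]]:
    """Parse 'taxonKey name' lines, skipping the header and blank separator lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("taxonKey"):
            continue
        # maxsplit=1 keeps trinomials such as 'Cornus sanguinea hungarica' whole.
        key, name = line.split(maxsplit=1)
        rows.append((int(key), name))
    return rows


def duplicated_keys(rows: list[tuple[int, str]]) -> list[int]:
    """Return taxon keys occurring more than once, in first-seen order."""
    counts = Counter(key for key, _ in rows)
    return [key for key, n in counts.items() if n > 1]


pasted = """taxonKey spn
5220136 Dama dama

5192470 Potamopyrgus antipodarum

2225646 Callinectes sapidus
5192470 Potamopyrgus antipodarum
"""
rows = parse_taxa(pasted)
print(len(rows))              # 4
print(duplicated_keys(rows))  # [5192470]
```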

Thanks. I will update the file and try to make a new European cube that includes these taxa.
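Updating the file amounts to appending the taxon keys that are not yet listed. A minimal sketch, assuming `references/modelling_species.tsv` is a tab-separated file whose header includes a `taxonKey` column; the second column name (`scientificName`) and the helper `add_taxa` are hypothetical and may not match the real file. The function works on strings so it can be tried without touching the repo.

```python
import csv
import io


def add_taxa(tsv_text: str, new_rows: list[dict[str, str]]) -> str:
    """Append rows whose taxonKey is not yet present; return the updated TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    fieldnames = reader.fieldnames
    seen = {row["taxonKey"] for row in rows}
    for row in new_rows:
        # Skip keys already listed, and keys repeated within new_rows.
        if row["taxonKey"] not in seen:
            rows.append(row)
            seen.add(row["taxonKey"])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


current = "taxonKey\tscientificName\n5220136\tDama dama\n"
updated = add_taxa(current, [
    {"taxonKey": "5220136", "scientificName": "Dama dama"},          # already listed
    {"taxonKey": "5219380", "scientificName": "Mephitis mephitis"},  # new
])
print(updated)
```

Deduplicating on `taxonKey` rather than on names means the repeated Potamopyrgus antipodarum row in the list above would be added only once.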

Thanks! In the meantime, I am cleaning up my script and will add it to GitHub so you (and everyone else) can see what the heck I have been doing. :-)