Add PRA species to modelling taxa for European occurrence cube
damianooldoni opened this issue · 10 comments
This issue starts from @amyjsdavis's comment in trias-project/indicators#84 (comment), which I copy here:
@damianooldoni : These are a different set of species (the plant species on the Species selection for PRA-ing list on google drive). They are different from the modellingtaxa list, and thus there is no European cube for them. I realized that in the future, there will likely not be a cube for every species that is to be evaluated, so my modelling flow has the option to use the cube as an input if it already exists, or to process data directly downloaded from GBIF. As you may recall, I have to download global data for each species anyway for the models.
In order to make data processing as linear and FAIR as possible, we decided to make a European occurrence cube for ALL species which we need for modelling, where modelling should be understood in a broader sense, that is, any species for which @amyjsdavis and @DiederikStrubbe need to run an SDM. There is a list of species for modelling in this repo (references/modelling_species.tsv), which should be updated any time new species are found worth considering for risk maps.
I invite all of you to keep it updated so that I can run a new version of the cube for them. Thanks.
OK, but Sonia needs these maps ASAP, so I don't want the cube to be a bottleneck. The EU cube is a nice addition, in that it saves me a step, but it is not necessary to run the models or to be FAIR. The data downloaded from GBIF is saved and documented using TrIAS procedures that you and Peter developed. In addition, I already have to initiate a global download of GBIF data for each species in order to make the global SDM. I feel that my steps already follow FAIR principles. It seems an unnecessary step to have to put the data in the cube first prior to running the SDMs. The point of my modelling flow is that anyone can make the risk maps using the GBIF taxon ID and not have to feed in intermediate processed occurrence data.
@amyjsdavis @DiederikStrubbe I understand that the cube is not needed for the risk models, especially if you calibrate them on a global dataset. In terms of following FAIR principles: perhaps what you mean is that the source data and model outcomes are treated FAIRly through documentation, Zenodo publication, etc. However, I would currently have no clue where to start making the risk maps for any given species using the GBIF taxon ID (for now), which is what I would like to do for several species I am interested in. For that, I would effectively need the code that produces them, and it would have to be described, added to the repo, and reviewed by Damiano and others who can get it bug-free and up and running so that anyone can use it. Without it, the models also frankly remain a bit of a black box to us. Clearly, exchanging intermediate products has been beneficial for the scientific quality of other parts of TrIAS, such as the indicator workflows. The whole point of TrIAS is to open up the entire workflow, including the code producing the models. Running them on the batch of species to be risk assessed this year is of course fine, but the eventual goal is to have robust code that can do the same thing for any of the 2800+ species on the unified checklist, and that anyone who needs it can run. Of course I understand that opening up code that you are still fine-tuning and working on can feel uncomfortable, but the team is really there to help you. In short: can you add it to your risk modeling repo (or ask Damiano to do that)?
@timadriaens @damianooldoni I have realized that my code not being on GitHub is becoming problematic, and it has been on my to-do list. I have no reluctance about sharing our code, and I will make it a priority. The holdup is that several ancillary datasets also need to be shared, with metadata etc., in order to actually run the code. In the interest of getting the maps done, I have delayed this until now.
@damianooldoni : I have the list of species in an email- I will forward to you now.
Thanks @amyjsdavis. From your email:
It is my understanding that all the species highlighted are priority and we are starting with plants first.
Indeed, I have just now checked the highlighted species:
- Dama dama
- Neovison vison
- Podarcis siculus
- Tamiasciurus hudsonicus
- Orthriophis taeniurus
- Pelophylax ridibundus
But these species are already in modelling_species.tsv! So you already have a cube at European level for these species... Or am I missing something?
@damianooldoni : My apologies, I made a mistake about the highlighting. It is all the species present on the "species selection for PRA-ing" tab, highlighted or not.
@amyjsdavis: The spreadsheet I got has only the ranking_df tab.
@damianooldoni : I think it got dropped somewhere along all the forwards. I have sent you a different link. :) Here are the species:
taxonKey | spn |
---|---|
5220136 | Dama dama |
5219380 | Mephitis mephitis |
2440954 | Cervus nippon |
2437619 | Dolichotis patagonum |
2437282 | Tamiasciurus hudsonicus |
2437397 | Callosciurus prevostii |
7190901 | Graptemys pseudogeographica pseudogeographica |
6157050 | Graptemys pseudogeographica kohnii |
9185677 | Podarcis siculus |
2455523 | Orthriophis taeniurus |
2426640 | Pelophylax bedriagae |
2498110 | Anas sibilatrix |
2498388 | Aix galericulata |
2498387 | Aix sponsa |
2891783 | Impatiens balfourii |
2891774 | Impatiens capensis |
2891782 | Impatiens parviflora |
2891770 | Impatiens glandulifera |
2882849 | Vaccinium corymbosum |
7777960 | Vaccinium macrocarpum |
7501634 | Rosa multiflora |
3003709 | Rosa glauca |
9202318 | Symphyotrichum lanceolatum |
3151618 | Symphyotrichum novae-angliae |
3151558 | Symphyotrichum novi-belgii |
3082244 | Cornus sericea |
8421432 | Cornus sanguinea hungarica |
3663237 | Cornus sanguinea australis |
2715482 | Cyperus eragrostis |
2716226 | Cyperus esculentus |
2992543 | Rubus laciniatus |
2993761 | Rubus spectabilis |
5281901 | Campylopus introflexus |
4559541 | Sinanodonta woodiana |
8190231 | Corbicula fluminea |
5189032 | Corbicula fluminalis |
9291405 | Massylaea vermiculata |
8745918 | Deroceras invadens |
5192470 | Potamopyrgus antipodarum |
2115692 | Balanus tintinnabulum |
2225646 | Callinectes sapidus |
5178057 | Caprella mutica |
2287249 | Ensis directus |
2225776 | Eriocheir sinensis |
2225772 | Hemigrapsus sanguineus |
4382841 | Hemigrapsus takanoi |
7820753 | Magallana gigas |
2501248 | Mnemiopsis leidyi |
2287076 | Mytilopsis leucophaeata |
2224970 | Palaemon macrodactylus |
5192470 | Potamopyrgus antipodarum |
2227663 | Rhithropanopeus harrisii |
Thanks. I will update the file and I will try to make a new European cube with these taxa included.
Thanks! In the meantime, I am cleaning up my script and will add it to GitHub so you (and everyone else) can see what the heck I have been doing. :-)
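For the file update discussed above, here is a minimal Python sketch of how the taxon keys from the table could be compared against the existing modelling list before re-running the cube. It is not the TrIAS code: the function name `new_taxa_to_add` and the contents of `existing` are hypothetical, and the actual layout of modelling_species.tsv is not assumed here. The candidate rows are a small subset of the table above, including the row that appears twice.

```python
def new_taxa_to_add(existing_keys, candidates):
    """Return the (taxonKey, name) pairs that are not yet in the list.

    Keys already present, and repeated keys among the candidates,
    are skipped; the input order is preserved.
    """
    seen = set(existing_keys)
    to_add = []
    for key, name in candidates:
        if key not in seen:
            seen.add(key)
            to_add.append((key, name))
    return to_add


# Hypothetical: taxon keys assumed to be in the modelling list already.
existing = {5220136, 2437282, 9185677}

# Subset of the table in this thread; 5192470 is listed twice there.
candidates = [
    (5220136, "Dama dama"),
    (5219380, "Mephitis mephitis"),
    (5192470, "Potamopyrgus antipodarum"),
    (2115692, "Balanus tintinnabulum"),
    (5192470, "Potamopyrgus antipodarum"),
]

added = new_taxa_to_add(existing, candidates)
for key, name in added:
    print(f"{key}\t{name}")
```

Only the rows in `added` would then need to be appended to the list, so species that already have a European cube are not requested again.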