fathomnet/community-feedback

Export data and metadata from FathomNet and format for provision to GBIF and OBIS

Closed this issue · 8 comments

Export data and metadata from FathomNet and format for provision to GBIF and OBIS

@kakanikatija Will talk to Abby Benson about this to make sure we have all the fields they need.

From Abby Benson:

I looked at what you have and things are looking great so far! Really exciting to see the use of Darwin Core in your upload form 🙌

I might have shared this already so apologies for the repeat information but thought it might help to have the list of required Darwin Core terms for OBIS and where I see this information coming from in FathomNet:
eventDate = timestamp in optional columns, we would need to drop observations where this is missing
decimalLatitude = latitude in optional columns, needs to be in WGS84, need to drop obs if missing
decimalLongitude = longitude in optional columns, needs to be in WGS84, need to drop obs if missing
scientificName = concept but must be a scientific name
scientificNameID = I think you all perform a lookup to WoRMS? This would be the LSID (e.g. urn:lsid:marinespecies.org:taxname:218214)
occurrenceStatus = "present" for all observations
basisOfRecord = Basis Of Record in required metadata
occurrenceID = userdefinedkey in optional columns but maybe FathomNet has something to use for this? Needs to be globally unique.
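
The required-term crosswalk above could be sketched roughly as follows. This is only an illustration, not FathomNet's or OBIS's actual export code: the record is assumed to be a plain dict keyed by the FathomNet column names listed above, `aphia_id` is a hypothetical key holding the WoRMS identifier, and the function name is made up for the example.

```python
from typing import Optional

# LSID prefix taken from the example in the list above.
LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"


def to_dwc_occurrence(record: dict, basis_of_record: str) -> Optional[dict]:
    """Map one FathomNet-style record to the required Darwin Core terms.

    Returns None when the timestamp or coordinates are missing, because
    those observations would have to be dropped.
    """
    for key in ("timestamp", "latitude", "longitude"):
        if record.get(key) is None:
            return None
    return {
        # Assumes userdefinedkey is present and globally unique.
        "occurrenceID": record["userdefinedkey"],
        "eventDate": record["timestamp"],
        # Coordinates are assumed to be WGS84 already; no conversion here.
        "decimalLatitude": record["latitude"],
        "decimalLongitude": record["longitude"],
        # Assumes the concept is already a scientific name.
        "scientificName": record["concept"],
        # "aphia_id" is a hypothetical key for the WoRMS lookup result.
        "scientificNameID": LSID_PREFIX + str(record["aphia_id"]),
        "occurrenceStatus": "present",
        "basisOfRecord": basis_of_record,
    }
```

Records that come back as `None` would be left out of the export; everything else can be written out as one row per occurrence.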

The required pieces of metadata are:
Data License (CC-0, CC-BY, or CC-BY-NC) = License in required metadata, but for OBIS and GBIF it must be one of the three listed. I tried to access your "FathomNet Use Policy" but got a 404
Description (Abstract) = missing
Resource Contacts (Last name, Position, Organization, Email) = missing
Resource Creators (Last name, Position, Organization, Email) = missing
Metadata Providers (Last name, Position, Organization, Email) = missing
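
As a rough sketch of how the metadata requirements above might be checked before an export, assuming the dataset metadata is held in a plain dict. All key names here (`license`, `description`, `resource_contacts`, and so on) are hypothetical and chosen only for illustration.

```python
from typing import List

# The three licenses named in the list above.
ALLOWED_LICENSES = {"CC-0", "CC-BY", "CC-BY-NC"}
# Hypothetical key names for the three required party lists.
PARTY_ROLES = ("resource_contacts", "resource_creators", "metadata_providers")
PARTY_FIELDS = ("last_name", "position", "organization", "email")


def metadata_problems(metadata: dict) -> List[str]:
    """Return human-readable problems; an empty list means nothing is missing."""
    problems = []
    if metadata.get("license") not in ALLOWED_LICENSES:
        problems.append("license must be one of CC-0, CC-BY, CC-BY-NC")
    if not metadata.get("description"):
        problems.append("description (abstract) is missing")
    for role in PARTY_ROLES:
        parties = metadata.get(role) or []
        if not parties:
            problems.append(f"{role} is missing")
        for party in parties:
            for field in PARTY_FIELDS:
                if not party.get(field):
                    problems.append(f"{role}: {field} is missing")
    return problems
```

Because the license is set at the dataset level, a check like this would run once per exported dataset rather than once per observation.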

I am not sure if you will be sharing the data from all of FathomNet as one dataset or if you are planning to share each submission as a separate dataset. The answer to that will affect what we need for the required metadata list you see above. I'm thinking one dataset will be easiest, and since you are using the Darwin Core terms for Rights Holder, Owner Institution Code, etc., the data providers would still be clearly identified in the aggregated dataset (to me this would be similar to how iNaturalist shares data to GBIF). But there could be a complication for the data license because it needs to be set at the dataset level. Happy to meet to discuss this.

Crosswalk of other FathomNet fields that are not required but helpful to have if they are available:
depth -> minimumDepthInMeters and maximumDepthInMeters (users really like to have this!)
imagingtype -> eventRemarks
observer -> identifiedBy
oxygen, pressure, salinity, and temperature do not have defined Darwin Core terms but can be included in the extended measurement or fact extension. We can ignore these for now and work on it later once we have a basic required data export working.
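
A minimal sketch of the optional crosswalk above, under the assumptions that the record is a plain dict keyed by the FathomNet column names and that `depth` is a single value in meters, so it fills both depth terms. The function name is hypothetical, and the oxygen, pressure, salinity, and temperature columns are left out on purpose, as suggested above.

```python
def optional_dwc_terms(record: dict) -> dict:
    """Map the optional FathomNet columns to Darwin Core terms when present."""
    terms = {}
    depth = record.get("depth")
    if depth is not None:
        # Assumption: one depth reading in meters, used for both bounds.
        terms["minimumDepthInMeters"] = depth
        terms["maximumDepthInMeters"] = depth
    if record.get("imagingtype"):
        terms["eventRemarks"] = record["imagingtype"]
    if record.get("observer"):
        terms["identifiedBy"] = record["observer"]
    return terms
```

The result can be merged into the required-term row for the same observation; columns that are absent simply contribute nothing.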

I hope this was what you were looking for. Again, happy to meet to discuss.

@albenson-usgs, I would also suggest that other DwC fields could be of interest, especially as systems like FathomNet and MBARI may have names and notes that are not readily read or transferred otherwise. For example: verbatimIdentification (the exact name used, e.g., 'Sea Pen' rather than the scientificName concept like 'Pennatulacea' that is required, or perhaps 'jellies', which would not have a taxonomic 'name', so the concept would be 'biota' but jellies would be in verbatim), identificationRemarks (like verbatim, but can add notes, e.g., Looks like cf. Anthoptilum grandiflorum), taxonomicRemarks (e.g., Hiatella arctica is likely of a species complex, but not yet defined). https://www.tdwg.org/

@claudenozeres yes agree those would be useful to include as well. Also Sea Pen and Jellies seem to be vernacularName so we could put those there as well.

May be in the plans for Portal, but not in scope for FathomNet

@kevinsbarnard This feels like a rather abrupt shutdown of a conversation the community was pretty excited about. Would you mind clarifying which part of this is out of scope for FathomNet, and how that differs from the Portal? For those of us who aren't intimately involved in the development, it's hard to tell when, and with whom, it might be appropriate to pick this conversation back up.

@sformel-usgs Apologies for the abrupt closing of this issue without more context. We're currently reviewing all of the issues related to FathomNet in order to prioritize engineering effort for the remainder of 2024. The above was a quick note out of a discussion we had yesterday.

As I understand, in the broader context of Ocean Vision AI, we'd love to enable exporting of data to OBIS, GBIF, and other systems. For FathomNet in particular, this was decided to be out of scope due to the fact that FathomNet alone isn't slated to contain all of the metadata that could make up a complete occurrence dataset. Our thinking is that it would make more sense for these data to be exported from the source -- i.e., collections within the Portal, which will allow for all of the appropriate metadata. This could likely include the images and annotations currently present on FathomNet, just through a different route.

As for when/where/with whom we can pick this back up, that may be a @kakanikatija question.

Gotcha. Thanks for taking the time to follow up, it's very appreciated! I'll stay patient and when y'all are ready to engage in this discussion again (for the portal, or otherwise), feel free to reach out to me, @albenson-usgs and @claudenozeres.