isamplesorg/metadata

Unimplemented methods in OpenContext metadata transformer

Closed this issue · 6 comments

The following methods are currently unimplemented in OpenContext and need further definition:

    def produced_by_id_string(self) -> typing.AnyStr:
        return Transformer.NOT_PROVIDED

    def produced_by_feature_of_interest(self) -> typing.AnyStr:
        return Transformer.NOT_PROVIDED

Pretty sure we just don't have elevation information in OpenContext.

    def sampling_site_elevation(self) -> typing.AnyStr:
        return Transformer.NOT_PROVIDED

I believe the following two live on the OpenContext project, which wasn't included in the API @ekansa asked us to use. We could ask for it to be included, or, if it's important, we could build a local cache of all the OpenContext projects and consult it during the transform to fill in the records.

    def sampling_site_description(self) -> typing.AnyStr:
        return Transformer.NOT_PROVIDED

    def sampling_site_label(self) -> typing.AnyStr:
        return Transformer.NOT_PROVIDED

Once we have documentation on how to implement these, please assign this back to me for implementation. If we decide that these fields aren't relevant, that is good to know, too, as I'll add comments to the code that indicate this.

@datadavev @dannymandel

Can you point me to an example of data you've fetched (and cached) from Open Context so I can better suggest how to use project and (spatial) context information included in our API outputs?

@dannymandel @datadavev Yep, we discussed this a bit in one of our calls.

In the original JSON (see: https://mars.cyverse.org/thing/http%3A%2F%2Fopencontext.org%2Fsubjects%2F850968DA-2F09-424D-25B9-7320A02F8992?full=false&format=original), you'll find a "context uri" key. That will give you a URI: http://opencontext.org/subjects/9DFFE999-3FA6-4193-65FE-86489B42BB70

If you get the JSON representation of that resource (either via content negotiation or by adding ".json" to the end of the URL, like: http://opencontext.org/subjects/9DFFE999-3FA6-4193-65FE-86489B42BB70.json), you'll find the context hierarchy keyed by "oc-gen:has-context-path". This context hierarchy is a list ordered from most general to most specific. For sampling sites, you'd probably want to use the site, identified by the item with "type": "oc-gen:cat-site". In this case it would be:

{
    "id": "http://opencontext.org/subjects/E44A115A-DFCB-4971-6750-40955DF2C062",
    "slug": "34-catalhoyuk",
    "label": "Çatalhöyük",
    "type": "oc-gen:cat-site"
},

A site is generally useful metadata that I think most people will want to use to find the provenance of a sample. However, not every sample in Open Context will come from a "site". Some finds come from surface surveys, and I don't currently have a really good way of picking the "best" item in a context hierarchy for finds from a survey. Perhaps the best thing to do in cases where an oc-gen:cat-site is absent from the context hierarchy is to choose the last (most specific) oc-gen:cat-region. That will typically be a meaningful geographic provenance for a sample that does not come from a site.
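To make the selection rule above concrete, here is a rough sketch in Python. It assumes the context hierarchy has already been extracted as a plain list of items shaped like the Çatalhöyük example (each with a "type" key); exactly how that list is nested under "oc-gen:has-context-path" is not shown here and would need checking against a real response. The names `context_json_url` and `pick_sampling_site` are hypothetical helpers, not part of the existing transformer, and preferring the most specific site when several are present is my assumption.

```python
import typing

SITE_TYPE = "oc-gen:cat-site"
REGION_TYPE = "oc-gen:cat-region"


def context_json_url(context_uri: str) -> str:
    """Build the JSON URL for a context URI by appending ".json"."""
    return context_uri.rstrip("/") + ".json"


def pick_sampling_site(
    path_items: typing.List[dict],
) -> typing.Optional[dict]:
    """Pick a sampling-site item from a context path.

    path_items is assumed to be ordered from most general to most specific.
    Prefer an oc-gen:cat-site item (the most specific one, if several);
    otherwise fall back to the last oc-gen:cat-region; otherwise None.
    """
    sites = [item for item in path_items if item.get("type") == SITE_TYPE]
    if sites:
        return sites[-1]
    regions = [item for item in path_items if item.get("type") == REGION_TYPE]
    if regions:
        return regions[-1]
    return None
```

The "label" of the returned item could then feed sampling_site_label, with Transformer.NOT_PROVIDED as the fallback when the function returns None.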

We can use this:

    def sampling_site_label(self) -> typing.AnyStr:
        return self.source_record.get("context label", Transformer.NOT_PROVIDED)

And then for produced_by_id_string and produced_by_feature_of_interest we don't have an answer. We don't have a good value for sampling_site_description. We also don't have sampling_site_elevation.

Per discussion with @datadavev, this is the best we can do for now.