Improve documentation: What a mapping is... and is not.

Question

joeflack4 opened this issue a year ago · 6 comments

Overview

I glanced through the SSSOM docs but I didn't see anything about what a mapping is and is not.

I'm guessing it has something to do with the concept of "degree of equivalence / shared properties". It would be good to distinguish between mapping predicates and some other example relationship predicates.

It could also include a short list of the predicates SSSOM considers by default (I'm guessing something like skos exact/broad/narrow/close/related & oboInOwl:hasDbXref).

Real world cases

OMOP has ~450 different relationships. It would be nice to be able to show which of those relationships are mapping relationships and why.

Additional details

Context:

mapping-commons/sssom-py#456 (comment)

Answer 1 · 2023-10-19T19:36:06.000Z

Thank you @joeflack4 I think it is a good thing to add a statement about this to the documentation. The answer is not going to be easy. If you ask the average mapping expert (aka your favourite LLM) what a mapping is, they will tell you something like:

A semantic mapping relationship refers to the connection or correspondence established between two or more concepts across different systems, ontologies, or datasets, based on their meanings or interpretations. This relationship captures the equivalence, similarity, or relatedness of the linked concepts, ensuring that data can be understood, integrated, and interoperated in a consistent manner across diverse sources.

I think the answer to this question will come down to delineating the idea of "correspondence" from mere "association".

Correspondence:

"Correspondence" refers to a specific relationship established between two or more entities across different datasets, schemas, or systems, indicating that they represent the same or equivalent information. The term often implies a more formal, direct, and explicit linkage between entities, suggesting a stronger degree of equivalence or similarity.

For instance, in the process of ontology alignment or schema matching, when two concepts from different ontologies are determined to be equivalent, a correspondence is established between them. This correspondence ensures that data using either of these concepts can be seamlessly integrated or queried across both systems.

Association:

"Association", on the other hand, implies a more general relationship between entities. It suggests that two or more entities are related in some manner but doesn't necessarily dictate the nature or strength of that relationship. An association can be broad, encompassing various types of relationships like correlation, dependency, or co-occurrence.

In data mapping, while a correspondence might indicate that two fields in different databases represent the same piece of information (e.g., "First Name" in Database A corresponds to "Given Name" in Database B), an association might indicate that two fields are related but not equivalent (e.g., "Product ID" in Database A is associated with "Transaction ID" in Database B because transactions are related to products, but they don't represent the same concept).

So, Correspondence implies a direct, often bidirectional, and stronger relationship suggesting equivalence or very close similarity. Association is a broader term indicating some form of relationship but without specifying its nature or strength.

I think any predicate that can be subsumed under the idea of "correspondence" is a predicate whose presence indicates a true mapping relationship. Conversely, any association that does not qualify as a "correspondence" does not indicate a true mapping relationship. Let's call these "not-correspondences". It is my view, and correct me if I am wrong, that "not-correspondences" are not in scope for SSSOM.

Examples

Clear examples of correspondences

:A skos:exactMatch :B
:A owl:equivalentClass :B
:A skos:broadMatch :B
:A semapv:crossSpeciesNarrowMatch :B
:a owl:sameAs :B

Basically everything in https://mapping-commons.github.io/sssom/mapping-predicates/#the-3-step-process-for-selecting-an-appropriate-mapping-predicate should be non-controversial.

Fuzzy examples of correspondence

Here, a bunch of people will already start saying "nope, that ain't a valid mapping predicate you Anarchist". But I think about half of the people would accept these.

:BRCA1 :codesProtein :BRCA1Protein
:GlucoseLevelMeasurement :usedToQuantify :GlucoseLevelTrait
:Glucose :isMeasuredBy :GlucoseLevelMeasurement

~~- :Apple :definingProductOf :AppleTree~~ (removed as per suggestion of @joeflack4 below, @matentzn agrees).

What these have in common is that they are isomorphic (1:1) and highlight a different aspect of the thing.

:A rdfs:subClassOf :B

This sounds like a clear mapping candidate and I am happy to classify it as such, but there is a bit of a case to be made that

:BRCA1 rdfs:subClassOf BFO:entity

is not a real mapping per se. Still I think it is just about in scope, as it is very analogous to skos:broadMatch which is clearly in scope.

Clear examples of not-correspondences

:BRCA1 :in_taxon :NCBI:Human (regular association)
:Student :enrolledIn :University
:Phone :isManufacturedIn :China

Fuzzy examples of not-correspondences

:Hand :part_of :Body (mereological relations are clearly not "correspondence" predicates, but sometimes part of is a useful relation to integrate two semantic spaces)
:BRCA1 biolink:category biolink:Gene (this looks like a very general "subclass" assertion, but it is more like a metaclass assertion, which I would not include in the set of correspondences)
:n-Hexane :isIsomerOf :Hexane(C6H14)
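
To make the four buckets above easier to compare, here is a minimal Python sketch that records them as a lookup table. The `Scope` enum, the `PREDICATE_SCOPE` table, and the `in_scope` helper are hypothetical names made up for this illustration; they are not part of SSSOM or sssom-py, and the assignments only restate the opinions given in this comment.

```python
from enum import Enum
from typing import Optional


class Scope(Enum):
    """The four buckets proposed in this comment (illustrative only)."""
    CLEAR_CORRESPONDENCE = "clear correspondence"
    FUZZY_CORRESPONDENCE = "fuzzy correspondence"
    FUZZY_NOT_CORRESPONDENCE = "fuzzy not-correspondence"
    CLEAR_NOT_CORRESPONDENCE = "clear not-correspondence"


# Predicates copied from the examples above; the bucket assignments are
# one person's first take, not a normative SSSOM list.
PREDICATE_SCOPE = {
    "skos:exactMatch": Scope.CLEAR_CORRESPONDENCE,
    "owl:equivalentClass": Scope.CLEAR_CORRESPONDENCE,
    "skos:broadMatch": Scope.CLEAR_CORRESPONDENCE,
    "semapv:crossSpeciesNarrowMatch": Scope.CLEAR_CORRESPONDENCE,
    "owl:sameAs": Scope.CLEAR_CORRESPONDENCE,
    ":codesProtein": Scope.FUZZY_CORRESPONDENCE,
    ":usedToQuantify": Scope.FUZZY_CORRESPONDENCE,
    ":isMeasuredBy": Scope.FUZZY_CORRESPONDENCE,
    "rdfs:subClassOf": Scope.FUZZY_CORRESPONDENCE,
    ":part_of": Scope.FUZZY_NOT_CORRESPONDENCE,
    "biolink:category": Scope.FUZZY_NOT_CORRESPONDENCE,
    ":isIsomerOf": Scope.FUZZY_NOT_CORRESPONDENCE,
    ":in_taxon": Scope.CLEAR_NOT_CORRESPONDENCE,
    ":enrolledIn": Scope.CLEAR_NOT_CORRESPONDENCE,
    ":isManufacturedIn": Scope.CLEAR_NOT_CORRESPONDENCE,
}


def in_scope(predicate: str) -> Optional[bool]:
    """Return True/False for listed predicates, None for unlisted ones."""
    scope = PREDICATE_SCOPE.get(predicate)
    if scope is None:
        return None  # no opinion recorded for this predicate
    return scope in (Scope.CLEAR_CORRESPONDENCE, Scope.FUZZY_CORRESPONDENCE)
```

Returning `None` for unlisted predicates is a deliberate choice in this sketch: it keeps "nobody has classified this yet" separate from "this was judged out of scope".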

Alright, this is my first take. Feel free to weigh in; it's a complicated discussion, but ultimately useful to avoid people building ontologies in SSSOM format.

Answer 2 · 2023-10-19T22:37:25.000Z

> :A rdfs:subClassOf :B
>
> This sounds like a clear mapping candidate and I am happy to classify it as such, but there is a bit of a case to be made that
>
> :BRCA1 rdfs:subClassOf BFO:entity
>
> is not a real mapping per se. Still I think it is just about in scope, as it is very analogous to skos:broadMatch which is clearly in scope.

On that, I would like to state that, in my opinion, using rdfs:subClassOf or owl:equivalentClass as mapping predicates is something that should not be done, and that the SSSOM standard probably should not have allowed in the first place. It’s too late to change that, but the use of those predicates can still be discouraged.

For two reasons:

First, it weakens the distinction between a “correspondence” and an “association”, or between the SSSOM format and an ontology serialisation format. It’s more difficult to argue that the SSSOM format should not be used to represent an ontology when the SSSOM format explicitly allows ontological relations to be used as mapping predicates. This very discussion is proof of that.

Second, I believe that it is not up to the author of a mapping to decide what the OWL or RDF representation of that mapping should be, or put differently that a mapping should be distinct from its possible OWL or RDF representation(s). Using rdfs:subClassOf as a mapping predicate conflates the two things, and I think it is an error. The mapping predicate should be something like skos:broadMatch instead, and that’s all the author of the mapping should have to say. How to translate that skos:broadMatch mapping into OWL or RDF should be out of scope and left for the consumers of the mapping to decide (and different consumers may need different translations).

(Yes, it is still possible, when a mapping uses rdfs:subClassOf, for the consumers to “overrule” the author-mandated representation and to transform the predicate into whatever other representation is adapted for their purpose. But it would be best if they didn’t have to do that – if the mapping did not have a predicate that dictates its OWL or RDF representation in the first place.)
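
As a rough illustration of the second point, the following Python sketch shows two consumers translating the same skos:broadMatch mapping differently. The `Mapping` tuple, the two profile dictionaries, and the `translate` function are hypothetical names invented for this example, and the target predicates in each profile are assumptions about what a consumer might choose, not something SSSOM prescribes.

```python
from typing import Dict, NamedTuple, Tuple


class Mapping(NamedTuple):
    """A mapping as its author states it: subject, mapping predicate, object."""
    subject_id: str
    predicate_id: str
    object_id: str


# Hypothetical consumer that wants logical axioms.
LOGICAL_PROFILE: Dict[str, str] = {
    "skos:exactMatch": "owl:equivalentClass",
    "skos:broadMatch": "rdfs:subClassOf",
}

# Hypothetical consumer that only wants annotations.
ANNOTATION_PROFILE: Dict[str, str] = {
    "skos:exactMatch": "oboInOwl:hasDbXref",
    "skos:broadMatch": "oboInOwl:hasDbXref",
}


def translate(mapping: Mapping, profile: Dict[str, str]) -> Tuple[str, str, str]:
    """Turn a mapping into a triple using the consumer's own profile."""
    if mapping.predicate_id not in profile:
        raise KeyError(f"no translation defined for {mapping.predicate_id}")
    return (mapping.subject_id, profile[mapping.predicate_id], mapping.object_id)


# The author only says "broadMatch"; each consumer decides the representation.
m = Mapping("EX:A", "skos:broadMatch", "EX:B")
logical_triple = translate(m, LOGICAL_PROFILE)
annotation_triple = translate(m, ANNOTATION_PROFILE)
```

The mapping itself never mentions rdfs:subClassOf or oboInOwl:hasDbXref; those only appear in the consumer-side profiles, which is the separation argued for above.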

I am inclined to think that oboInOwl:hasDbXref should also be avoided as a mapping predicate, for the same reason: it’s not actually a mapping predicate, it’s an annotation that can be used to represent a mapping in an OWL ontology (and that is, in effect, widely used for that purpose) – again, that’s conflating the correspondence with one way of translating it into OWL.

Answer 3 · 2023-10-20T11:18:18.000Z

@gouttegd thanks for this great analysis and detailed comment. Intellectually I agree with all that you say, but I suspect that other systems/representations will want to be reflected (thinking of OMOP) and these systems use hasChild / hasParent relationships between concepts from different terminologies (that look a lot like subClassOf), in addition to something like mapsTo. Practically it will be hard to draw a strong line.

But I agree with the sentiment that we should amend the documentation to:

Prefer proper semantic mapping relations from SEMAPV in SSSOM mappings (which excludes subclass and equivalentclass).
Avoid other mapping relationships, for the reasons you give.
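
A documentation note along these lines could be backed by a simple check. The sketch below, in Python, flags rows of an SSSOM TSV whose predicate_id is one of the predicates discussed above as better avoided. The `DISCOURAGED` set and the `discouraged_rows` function are hypothetical and not part of sssom-py; including oboInOwl:hasDbXref follows the inclination expressed in the previous comment rather than any agreed rule, and the sample data is invented.

```python
import csv
from typing import Dict, List

# Predicates this thread suggests discouraging (an assumption for the sketch,
# not a normative SSSOM list).
DISCOURAGED = {"rdfs:subClassOf", "owl:equivalentClass", "oboInOwl:hasDbXref"}


def discouraged_rows(tsv_text: str) -> List[Dict[str, str]]:
    """Return the rows whose predicate_id is in DISCOURAGED."""
    # Skip the commented metadata block that can precede the table.
    lines = [line for line in tsv_text.splitlines() if not line.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if row["predicate_id"] in DISCOURAGED]


# Invented sample data for illustration only.
SAMPLE = "\n".join([
    "# mapping_set_id: https://example.org/hypothetical-set",
    "subject_id\tpredicate_id\tobject_id",
    "EX:A\tskos:exactMatch\tEX:B",
    "EX:C\trdfs:subClassOf\tEX:D",
])

flagged = discouraged_rows(SAMPLE)
```

A check like this only reports rows; whether to reject them, rewrite them, or merely warn would be up to whoever publishes the mapping set.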

Answer 4 · 2023-10-23T23:46:03.000Z

I think you guys have this covered. What I add is only a tangent.

@matentzn I don't really agree w/ your "fuzzy correspondence" examples.
In all of them, e.g. :Apple :definingProductOf :AppleTree, the subject and object share a word. All subs/objs in those examples share context, but they don't share any essential properties. Some of them only share more distant properties like "thing" or "made of matter" or "made of amino acids" that are not really essential to the definition of the thing.

Anyway you don't think they map/correspond to each other either. There's probably room for better language to describe these kinds of relationships/associations.

Answer 5 · 2023-10-24T00:13:59.000Z

Late to the discussion here. These distinctions are clearly very difficult to make, especially with things like "gene encodes protein" (which fyi is not 1:1 @matentzn). It's not a formal equivalence at all but it's definitely a "mapping" in common usage.

HOWEVER I don't think as SSSOM developers we need to lose too much sleep over this. You could dump an ontology into an SSSOM file using it as a TSV based triple serialization where all predicates are assumed to be "mappings" - but then the SSSOM tooling gives you nothing over what existing RDF/OWL tools provide, so why would you?

I'm sure there are lots of examples of tools I can use for the wrong job. Like I could use MS Word to make a crude spreadsheet with tables... or I could use Excel. The MS Word developers don't worry that I might possibly use it to build a spreadsheet.

I guess what I'm trying to say is whether or not some predicate is a "mapping" comes down to whether treating it as a mapping will allow you to extract further value from your data using the SSSOM toolkit.

EDIT: though of course we will need to make these calls when publishing SSSOM datasets, and in the case of OXO, etc. -- I just don't think it's an issue with the core SSSOM spec itself. In my view a mapping is whatever you, for your use case, think a mapping is!

Answer 6 · 2023-10-24T09:17:50.000Z

@udp thanks for your perspective.

I agree we should not lose sleep, and I also agree we do not need to be normative here. But part of SSSOM is sociological, i.e. an effort that aims at making "things" better, and for that we should at least have a rough idea of what "thing" we are talking about. So when people ask "what is a mapping", we should have at least a reasonable approximation of an answer, to avoid the format being misused. Also, in my experience it increases stakeholders' trust if we try to push the community towards less scope creep, so that eventually the presence of an SSSOM file will be perceived as a work of true artisanship rather than a dump of arbitrary associations.

@joeflack4

> All subs/objs in those examples share context, but they don't share any essential properties. Some of them only share more distant properties like "thing" or "made of matter" or "made of amino acids" that are not really essential to the definition of the thing.

You are of course correct! Apple and Apple tree is a bad example. However, :GlucoseLevelMeasurement :usedToQuantify :GlucoseLevelTrait makes some sense in many use case scenarios. Maybe a better way to define this "fuzzy" category would be to define the idea of "aspect" and postulate that the mapped entities are essentially aspects of the exact same thing (e.g. Student as a role in a role ontology and Student as the person attending university in a university ontology). I removed the AppleTree example now as per your feedback, as it is indeed a dumb one.