Front page of SSSOM website desperately needs to list what the columns in an SSSOM file are/can be
cthoyt opened this issue · 5 comments
It's nearly impossible to find the list of all of the fields in SSSOM. This should be the third thing on https://mapping-commons.github.io/sssom/, after 1. an explanation of what SSSOM is and what it's for and 2. an example SSSOM file.
As a potential SSSOM user, I don't ever ever ever want to look at LinkML-like documentation. It is confusing and makes my brain hurt because I am not a LinkML developer.
What I really need is a list of the fields, a description of what goes in them, and an example for each. All of this needs to be in a table on the front page, after the two other things.
You mean something like this?
If not, provide me with an example page and I will make it happen!
Once we settle on how it will look, we can add more examples to the actual schema.
Draft 1 of a simple overview of allowed slots
Mapping Set
Represents a set of mappings
Slot Name | Description | Value | Examples |
---|---|---|---|
curie_map | A dictionary that contains prefixes as keys and their URI expansions as values. | prefix | N/A |
mappings | Contains a list of mapping objects | mapping | N/A |
mapping_set_id | A globally unique identifier for the mapping set (not each individual mapping). Should be IRI, ideally resolvable. | uri | http://purl.obolibrary.org/obo/mondo/mappings/mondo_exactmatch_ncit.sssom.tsv |
mapping_set_version | A version string for the mapping. | string | 2020-01-01, 1.2.1 |
mapping_set_source | A mapping set or set of mapping sets that was used to derive the mapping set. | uri | http://purl.obolibrary.org/obo/mondo/mappings/2022-05-20/mondo_exactmatch_ncit.sssom.tsv |
mapping_set_title | The display name of a mapping set. | string | The Mondo-OMIM mappings by Monarch Initiative. |
mapping_set_description | A description of the mapping set. | string | This mapping set was produced to integrate human and mouse phenotype data at the IMPC. It is primarily used for making mouse phenotypes searchable by human synonyms at https://mousephenotype.org/. |
creator_id | Identifies the persons or groups responsible for the creation of the mapping. The creator is the agent that put the mapping in its published form, which may be different from the author, which is a person that was actively involved in the assertion of the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. | EntityReference | N/A |
creator_label | A string identifying the creator of this mapping. In the spirit of provenance, consider using creator_id instead. | string | N/A |
license | A url to the license of the mapping. In absence of a license we assume no license. | uri | N/A |
subject_type | The type of entity that is being mapped. | owl class, owl object property, owl data property, owl annotation property, owl named individual, skos concept, rdfs resource, rdfs class, rdfs literal, rdfs datatype, rdf property | owl:Class |
subject_source | URI of vocabulary or identifier source for the subject. | EntityReference | obo:mondo.owl, wikidata:Q7876491 |
subject_source_version | Version IRI or version string of the source of the subject term. | string | http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl |
object_type | The type of entity that is being mapped. | owl class, owl object property, owl data property, owl annotation property, owl named individual, skos concept, rdfs resource, rdfs class, rdfs literal, rdfs datatype, rdf property | owl:Class |
object_source | URI of vocabulary or identifier source for the object. | EntityReference | obo:mondo.owl, wikidata:Q7876491 |
object_source_version | Version IRI or version string of the source of the object term. | string | http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl |
mapping_provider | URL pointing to the source that provided the mapping, for example an ontology that already contains the mappings, or a database from which it was derived. | uri | N/A |
mapping_tool | A reference to the tool or algorithm that was used to generate the mapping. Should be a URL pointing to more info about it, but can be free text. | string | https://github.com/AgreementMakerLight/AML-Project |
mapping_tool_version | Version string that denotes the version of the mapping tool used. | string | v3.2 |
mapping_date | The date the mapping was asserted. This is different from the date the mapping was published or compiled in a SSSOM file. | date | N/A |
publication_date | The date the mapping was published. This is different from the date the mapping was asserted. | date | N/A |
subject_match_field | A list of properties (term annotations on the subject) that was used for the match. | EntityReference | N/A |
object_match_field | A list of properties (term annotations on the object) that was used for the match. | EntityReference | N/A |
subject_preprocessing | Method of preprocessing applied to the fields of the subject. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | EntityReference | semapv:Stemming, semapv:StopWordRemoval |
object_preprocessing | Method of preprocessing applied to the fields of the object. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | EntityReference | semapv:Stemming, semapv:StopWordRemoval |
see_also | A URL specific for the mapping instance. E.g. for kboom we have a per-mapping image that shows surrounding axioms that drive probability. Could also be a github issue URL that discussed a complicated alignment | string | N/A |
issue_tracker | A URL location of the issue tracker for this entity. | uri | https://github.com/mapping-commons/mh_mapping_initiative/issues |
other | Pipe separated list of key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data. | string | N/A |
comment | Free text field containing either curator notes or text generated by tool providing additional informative information. | string | N/A |
extension_definitions | A list that defines the extension slots used in the mapping set. | extension definition | N/A |
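
To make the `curie_map` row more concrete, here is a minimal sketch of how a prefix map could be used to expand a CURIE such as `HP:0009894` into a full URI. The function name `expand_curie` and the two prefix entries are illustrative assumptions, not part of the SSSOM tooling.

```python
def expand_curie(curie, curie_map):
    """Expand a CURIE (prefix:local_id) into a URI using a prefix map.

    `curie_map` is assumed to be a plain dict of prefix -> URI expansion,
    mirroring the description of the curie_map field.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return curie_map[prefix] + local_id


# Illustrative prefix map (assumed values, not taken from a real mapping set).
curie_map = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

subject_uri = expand_curie("HP:0009894", curie_map)
predicate_uri = expand_curie("skos:exactMatch", curie_map)
```

A CURIE whose prefix is missing from the map raises `ValueError` here; a real implementation might prefer to collect such errors and report them together.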
Mapping
Represents an individual mapping between a pair of entities
Slot Name | Description | Value | Examples |
---|---|---|---|
subject_id | The ID of the subject of the mapping. | EntityReference | HP:0009894 |
subject_label | The label of subject of the mapping | string | Thickened ears |
subject_category | The conceptual category to which the subject belongs to. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers - when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. | string | UBERON:0001062, anatomical entity, biolink:Gene |
predicate_id | The ID of the predicate or relation that relates the subject and object of this match. | EntityReference | owl:sameAs, owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, rdfs:subPropertyOf, skos:relatedMatch, skos:closeMatch, skos:exactMatch, skos:narrowMatch, skos:broadMatch, oboInOwl:hasDbXref, rdfs:seeAlso |
predicate_label | The label of the predicate/relation of the mapping | string | has cross-reference |
predicate_modifier | A modifier for negating the predicate. See #40 for discussion | Not | Not |
object_id | The ID of the object of the mapping. | EntityReference | HP:0009894 |
object_label | The label of object of the mapping | string | Thickened ears |
object_category | The conceptual category to which the object belongs to. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers - when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. | string | UBERON:0001062, anatomical entity, biolink:Gene |
mapping_justification | A mapping justification is an action (or the written representation of that action) of showing a mapping to be right or reasonable. | EntityReference | semapv:LexicalMatching, semapv:ManualMappingCuration |
author_id | Identifies the persons or groups responsible for asserting the mappings. Recommended to be a list of ORCIDs or otherwise identifying URIs. | EntityReference | N/A |
author_label | A string identifying the author of this mapping. In the spirit of provenance, consider using author_id instead. | string | N/A |
reviewer_id | Identifies the persons or groups that reviewed and confirmed the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. | EntityReference | N/A |
reviewer_label | A string identifying the reviewer of this mapping. In the spirit of provenance, consider using reviewer_id instead. | string | N/A |
creator_id | Identifies the persons or groups responsible for the creation of the mapping. The creator is the agent that put the mapping in its published form, which may be different from the author, which is a person that was actively involved in the assertion of the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. | EntityReference | N/A |
creator_label | A string identifying the creator of this mapping. In the spirit of provenance, consider using creator_id instead. | string | N/A |
license | A url to the license of the mapping. In absence of a license we assume no license. | uri | N/A |
subject_type | The type of entity that is being mapped. | owl class, owl object property, owl data property, owl annotation property, owl named individual, skos concept, rdfs resource, rdfs class, rdfs literal, rdfs datatype, rdf property | owl:Class |
subject_source | URI of vocabulary or identifier source for the subject. | EntityReference | obo:mondo.owl, wikidata:Q7876491 |
subject_source_version | Version IRI or version string of the source of the subject term. | string | http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl |
object_type | The type of entity that is being mapped. | owl class, owl object property, owl data property, owl annotation property, owl named individual, skos concept, rdfs resource, rdfs class, rdfs literal, rdfs datatype, rdf property | owl:Class |
object_source | URI of vocabulary or identifier source for the object. | EntityReference | obo:mondo.owl, wikidata:Q7876491 |
object_source_version | Version IRI or version string of the source of the object term. | string | http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl |
mapping_provider | URL pointing to the source that provided the mapping, for example an ontology that already contains the mappings, or a database from which it was derived. | uri | N/A |
mapping_source | The mapping set this mapping was originally defined in. mapping_source is used for example when merging multiple mapping sets or deriving one mapping set from another. | EntityReference | MONDO_MAPPINGS:mondo_exactmatch_ncit.sssom.tsv |
mapping_cardinality | A string indicating whether this mapping is from a 1:1 (the subject_id maps to a single object_id), 1:n (the subject maps to more than one object_id), n:1, 1:0, 0:1 or n:n group. Note that this is a convenience field that should be derivable from the mapping set. | 1:1, 1:n, n:1, 1:0, 0:1, n:n | N/A |
mapping_tool | A reference to the tool or algorithm that was used to generate the mapping. Should be a URL pointing to more info about it, but can be free text. | string | https://github.com/AgreementMakerLight/AML-Project |
mapping_tool_version | Version string that denotes the version of the mapping tool used. | string | v3.2 |
mapping_date | The date the mapping was asserted. This is different from the date the mapping was published or compiled in a SSSOM file. | date | N/A |
publication_date | The date the mapping was published. This is different from the date the mapping was asserted. | date | N/A |
confidence | A score between 0 and 1 to denote the confidence or probability that the match is correct, where 1 denotes total confidence. | double | N/A |
curation_rule | A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule is captured as a resource rather than a string, which enables higher levels of transparency and sharing across mapping sets. The URI representation of the curation rule is expected to be a resolvable identifier which provides details about the nature of the curation rule. | EntityReference | N/A |
curation_rule_text | A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule should be captured as a resource (entity reference) rather than a string (see curation_rule element), which enables higher levels of transparency and sharing across mapping sets. The textual representation of curation rule is intended to be used in cases where (1) the creation of a resource is not practical from the perspective of the mapping_provider and (2) as an additional piece of metadata to augment the curation_rule element with a human readable text. | string | N/A |
subject_match_field | A list of properties (term annotations on the subject) that was used for the match. | EntityReference | N/A |
object_match_field | A list of properties (term annotations on the object) that was used for the match. | EntityReference | N/A |
match_string | String that is shared by subj/obj. It is recommended to indicate the fields for the match using the object and subject_match_field slots. | string | N/A |
subject_preprocessing | Method of preprocessing applied to the fields of the subject. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | EntityReference | semapv:Stemming, semapv:StopWordRemoval |
object_preprocessing | Method of preprocessing applied to the fields of the object. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | EntityReference | semapv:Stemming, semapv:StopWordRemoval |
similarity_score | A score between 0 and 1 to denote the similarity between two entities, where 1 denotes equivalence, and 0 denotes disjointness. The score is meant to be used in conjunction with the similarity_measure field, to document, for example, the lexical or semantic match of a matching algorithm. | double | N/A |
similarity_measure | The measure used for computing a similarity score. This field is meant to be used in conjunction with the similarity_score field, to document, for example, the lexical or semantic match of a matching algorithm. To make processing this field as unambiguous as possible, we recommend using wikidata CURIEs, but the type of this field is deliberately unspecified. | string | https://www.wikidata.org/entity/Q865360, wikidata:Q865360, Levenshtein distance |
see_also | A URL specific for the mapping instance. E.g. for kboom we have a per-mapping image that shows surrounding axioms that drive probability. Could also be a github issue URL that discussed a complicated alignment | string | N/A |
issue_tracker_item | The issue tracker item discussing this mapping. | EntityReference | SSSOM_GITHUB_ISSUE:166 |
other | Pipe separated list of key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data. | string | N/A |
comment | Free text field containing either curator notes or text generated by tool providing additional informative information. | string | N/A |
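
The `mapping_cardinality` row says the value should be derivable from the mapping set. As a rough sketch of what that derivation could look like, the function below counts distinct partners per subject and per object. It assumes mappings are given as `(subject_id, object_id)` pairs, ignores the predicate, and does not attempt the `1:0` and `0:1` cases; the name `derive_cardinality` is hypothetical.

```python
from collections import defaultdict


def derive_cardinality(mappings):
    """Return one cardinality label per (subject_id, object_id) pair.

    Only 1:1, 1:n, n:1 and n:n are derived; 1:0 and 0:1 are out of scope.
    """
    objects_per_subject = defaultdict(set)
    subjects_per_object = defaultdict(set)
    for subject_id, object_id in mappings:
        objects_per_subject[subject_id].add(object_id)
        subjects_per_object[object_id].add(subject_id)

    labels = []
    for subject_id, object_id in mappings:
        many_objects = len(objects_per_subject[subject_id]) > 1
        many_subjects = len(subjects_per_object[object_id]) > 1
        if many_objects and many_subjects:
            labels.append("n:n")
        elif many_objects:
            labels.append("1:n")
        elif many_subjects:
            labels.append("n:1")
        else:
            labels.append("1:1")
    return labels


# Made-up identifiers, used only to exercise each branch.
pairs = [
    ("A:1", "B:1"),
    ("A:2", "B:2"),
    ("A:2", "B:3"),
    ("A:3", "B:4"),
    ("A:4", "B:4"),
]
labels = derive_cardinality(pairs)
```

Whether cardinality should be computed per predicate or across the whole set is not stated in the table, so this sketch takes the simplest reading.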
Mapping Registry
A registry for managing mapping sets. It holds a set of mapping set references, and can import other registries.
Slot Name | Description | Value | Examples |
---|---|---|---|
mapping_registry_id | The unique identifier of a mapping registry. | EntityReference | N/A |
mapping_registry_title | The title of a mapping registry. | string | N/A |
mapping_registry_description | The description of a mapping registry. | string | N/A |
imports | A list of registries that should be imported into this one. | uri | N/A |
mapping_set_references | A list of mapping set references. | mapping set reference | N/A |
documentation | A URL to the documentation of this mapping commons. | uri | N/A |
homepage | A URL to a homepage of this mapping commons. | uri | N/A |
issue_tracker | A URL location of the issue tracker for this entity. | uri | https://github.com/mapping-commons/mh_mapping_initiative/issues |
Mapping Set Reference
A reference to a mapping set. It allows augmenting mapping set metadata from the perspective of the registry, for example by providing a confidence value, a local filename, or a grouping.
Slot Name | Description | Value | Examples |
---|---|---|---|
mapping_set_id | A globally unique identifier for the mapping set (not each individual mapping). Should be IRI, ideally resolvable. | uri | http://purl.obolibrary.org/obo/mondo/mappings/mondo_exactmatch_ncit.sssom.tsv |
mirror_from | A URL location from which to obtain a resource, such as a mapping set. | uri | N/A |
registry_confidence | This value is set by the registry that indexes the mapping set. It reflects the confidence the registry has in the correctness of the mappings in the mapping set. | double | N/A |
mapping_set_group | Set by the owners of the mapping registry. A way to group . | string | N/A |
last_updated | The date this reference was last updated. | date | N/A |
local_name | The local name assigned to file that corresponds to the downloaded mapping set. | string | N/A |
Prefix
No description
Slot Name | Description | Value | Examples |
---|---|---|---|
prefix_name | No description | ncname | N/A |
prefix_url | No description | uri | N/A |
“Slot name” is LinkML jargon, should be “Field Name” for us mere mortals
What I need is to remember what the columns are in SSSOM file. So the only thing that should be there is the "Mappings" header and explicitly not the Mapping Sets header or anything else. And it should be "Column" instead of "Slot name" or "Field name"
Can we have another column in this table that is either "Required", "Recommended", or "Optional"? And sort the columns a bit better so I don't get some of the extraneous stuff towards the top of the list like "subject_category"
We need to rename EntityReference to be CURIE
It's confusing with the example column to tell the difference between fields that take a list of things or a single thing. Maybe stick to a single example for each
Any column that has N/A for the example means we need to do some thinking! We need an example for everything
> What I need is to remember what the columns are in SSSOM file. So the only thing that should be there is the "Mappings" header and explicitly not the Mapping Sets header or anything else. And it should be "Column" instead of "Slot name" or "Field name"

It's quite a personal preference IMO, but OK. There are already a few groups using SSSOM in RDF format, so I want to be careful not to completely superimpose table terminology (column etc.). See compromise below.

> Can we have another column in this table that is either "Required", "Recommended", or "Optional"?

I did, but it's quite a problematic column due to our latest changes to the model to accommodate literal mappings. We probably need to hardcode the values for subject_id and object_id, as these are no longer documented as required at the schema level (the requirement is encoded in the spec and in a few rules in the data model).

> And sort the columns a bit better so I don't get some of the extraneous stuff towards the top of the list like "subject_category"

Not 100% sure, as the order is the order in which they should appear in the SSSOM table as well... How strongly do you feel about this?

> We need to rename EntityReference to be CURIE

I did, but not in a straightforward way; see the example and reasoning above.

> It's confusing with the example column to tell the difference between fields that take a list of things or a single thing. Maybe stick to a single example for each

OK.

> Any column that has N/A for the example means we need to do some thinking! We need an example for everything

Agreed, but that's a bit more involved; first let's get the page we have right now up.
V2
Column/Field | Description | Value | Examples | Required |
---|---|---|---|---|
subject_id | The ID of the subject of the mapping. | entity reference (e.g. CURIE in TSV) | HP:0009894 | Optional |
subject_label | The label of subject of the mapping | string | Thickened ears | Recommended |
subject_category | The conceptual category to which the subject belongs to. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers - when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. | string | UBERON:0001062 | Optional |
predicate_id | The ID of the predicate or relation that relates the subject and object of this match. | entity reference (e.g. CURIE in TSV) | owl:sameAs | Required |
predicate_label | The label of the predicate/relation of the mapping | string | has cross-reference | Optional |
predicate_modifier | A modifier for negating the predicate. See #40 for discussion | Not | Not | Optional |
object_id | The ID of the object of the mapping. | entity reference (e.g. CURIE in TSV) | HP:0009894 | Optional |
object_label | The label of object of the mapping | string | Thickened ears | Recommended |
object_category | The conceptual category to which the object belongs to. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers - when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. | string | UBERON:0001062 | Optional |
mapping_justification | A mapping justification is an action (or the written representation of that action) of showing a mapping to be right or reasonable. | entity reference (e.g. CURIE in TSV) | semapv:LexicalMatching | Required |
author_id | Identifies the persons or groups responsible for asserting the mappings. Recommended to be a list of ORCIDs or otherwise identifying URIs. | entity reference (e.g. CURIE in TSV) | N/A | Optional |
author_label | A string identifying the author of this mapping. In the spirit of provenance, consider using author_id instead. | string | N/A | Optional |
reviewer_id | Identifies the persons or groups that reviewed and confirmed the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. | entity reference (e.g. CURIE in TSV) | N/A | Optional |
reviewer_label | A string identifying the reviewer of this mapping. In the spirit of provenance, consider using reviewer_id instead. | string | N/A | Optional |
creator_id | Identifies the persons or groups responsible for the creation of the mapping. The creator is the agent that put the mapping in its published form, which may be different from the author, which is a person that was actively involved in the assertion of the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. | entity reference (e.g. CURIE in TSV) | N/A | Optional |
creator_label | A string identifying the creator of this mapping. In the spirit of provenance, consider using creator_id instead. | string | N/A | Optional |
license | A url to the license of the mapping. In absence of a license we assume no license. | uri | N/A | Optional |
subject_type | The type of entity that is being mapped. | owl class, owl object property, owl data property, owl annotation property, owl named individual, skos concept, rdfs resource, rdfs class, rdfs literal, rdfs datatype, rdf property | owl:Class | Optional |
subject_source | URI of vocabulary or identifier source for the subject. | entity reference (e.g. CURIE in TSV) | obo:mondo.owl | Optional |
subject_source_version | Version IRI or version string of the source of the subject term. | string | http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl | Optional |
object_type | The type of entity that is being mapped. | owl class, owl object property, owl data property, owl annotation property, owl named individual, skos concept, rdfs resource, rdfs class, rdfs literal, rdfs datatype, rdf property | owl:Class | Optional |
object_source | URI of vocabulary or identifier source for the object. | entity reference (e.g. CURIE in TSV) | obo:mondo.owl | Optional |
object_source_version | Version IRI or version string of the source of the object term. | string | http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl | Optional |
mapping_provider | URL pointing to the source that provided the mapping, for example an ontology that already contains the mappings, or a database from which it was derived. | uri | N/A | Optional |
mapping_source | The mapping set this mapping was originally defined in. mapping_source is used for example when merging multiple mapping sets or deriving one mapping set from another. | entity reference (e.g. CURIE in TSV) | MONDO_MAPPINGS:mondo_exactmatch_ncit.sssom.tsv | Optional |
mapping_cardinality | A string indicating whether this mapping is from a 1:1 (the subject_id maps to a single object_id), 1:n (the subject maps to more than one object_id), n:1, 1:0, 0:1 or n:n group. Note that this is a convenience field that should be derivable from the mapping set. | 1:1, 1:n, n:1, 1:0, 0:1, n:n | N/A | Optional |
mapping_tool | A reference to the tool or algorithm that was used to generate the mapping. Should be a URL pointing to more info about it, but can be free text. | string | https://github.com/AgreementMakerLight/AML-Project | Optional |
mapping_tool_version | Version string that denotes the version of the mapping tool used. | string | v3.2 | Optional |
mapping_date | The date the mapping was asserted. This is different from the date the mapping was published or compiled in a SSSOM file. | date | N/A | Optional |
publication_date | The date the mapping was published. This is different from the date the mapping was asserted. | date | N/A | Optional |
confidence | A score between 0 and 1 to denote the confidence or probability that the match is correct, where 1 denotes total confidence. | double | N/A | Optional |
curation_rule | A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule is captured as a resource rather than a string, which enables higher levels of transparency and sharing across mapping sets. The URI representation of the curation rule is expected to be a resolvable identifier which provides details about the nature of the curation rule. | entity reference (e.g. CURIE in TSV) | N/A | Optional |
curation_rule_text | A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule should be captured as a resource (entity reference) rather than a string (see curation_rule element), which enables higher levels of transparency and sharing across mapping sets. The textual representation of curation rule is intended to be used in cases where (1) the creation of a resource is not practical from the perspective of the mapping_provider and (2) as an additional piece of metadata to augment the curation_rule element with a human readable text. | string | N/A | Optional |
subject_match_field | A list of properties (term annotations on the subject) that was used for the match. | entity reference (e.g. CURIE in TSV) | N/A | Optional |
object_match_field | A list of properties (term annotations on the object) that was used for the match. | entity reference (e.g. CURIE in TSV) | N/A | Optional |
match_string | String that is shared by subj/obj. It is recommended to indicate the fields for the match using the object and subject_match_field slots. | string | N/A | Optional |
subject_preprocessing | Method of preprocessing applied to the fields of the subject. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | entity reference (e.g. CURIE in TSV) | semapv:Stemming | Optional |
object_preprocessing | Method of preprocessing applied to the fields of the object. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | entity reference (e.g. CURIE in TSV) | semapv:Stemming | Optional |
similarity_score | A score between 0 and 1 to denote the similarity between two entities, where 1 denotes equivalence, and 0 denotes disjointness. The score is meant to be used in conjunction with the similarity_measure field, to document, for example, the lexical or semantic match of a matching algorithm. | double | N/A | Optional |
similarity_measure | The measure used for computing a similarity score. This field is meant to be used in conjunction with the similarity_score field, to document, for example, the lexical or semantic match of a matching algorithm. To make processing this field as unambiguous as possible, we recommend using wikidata CURIEs, but the type of this field is deliberately unspecified. | string | https://www.wikidata.org/entity/Q865360 | Optional |
see_also | A URL specific for the mapping instance. E.g. for kboom we have a per-mapping image that shows surrounding axioms that drive probability. Could also be a github issue URL that discussed a complicated alignment | string | N/A | Optional |
issue_tracker_item | The issue tracker item discussing this mapping. | entity reference (e.g. CURIE in TSV) | SSSOM_GITHUB_ISSUE:166 | Optional |
other | Pipe separated list of key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data. | string | N/A | Optional |
comment | Free text field containing either curator notes or text generated by tool providing additional informative information. | string | N/A | Optional |
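
For anyone who wants to check a file against the "Required" column of the V2 table, a small sketch follows. It reads the header row of a TSV and reports which required or recommended columns are absent. The column sets are copied from the table as posted (so `subject_id` and `object_id` are not treated as required here), the function name `check_header` is hypothetical, and skipping lines that start with `#` assumes the metadata is embedded as comment lines at the top of the file.

```python
# Column sets taken from the "Required" column of the V2 table above.
REQUIRED = ("predicate_id", "mapping_justification")
RECOMMENDED = ("subject_label", "object_label")


def check_header(tsv_text):
    """Report required and recommended columns missing from a TSV header."""
    for line in tsv_text.splitlines():
        # Assumption: metadata lines start with '#' and precede the header.
        if line.strip() and not line.startswith("#"):
            columns = line.split("\t")
            break
    else:
        raise ValueError("no header row found")
    return {
        "missing_required": [c for c in REQUIRED if c not in columns],
        "missing_recommended": [c for c in RECOMMENDED if c not in columns],
    }


# Tiny made-up file: two metadata comment lines, a header, and one row.
example = "\n".join([
    "# curie_map:",
    "#   HP: http://purl.obolibrary.org/obo/HP_",
    "subject_id\tpredicate_id\tobject_id\tmapping_justification",
    "HP:0009894\tskos:exactMatch\tHP:0009894\tsemapv:LexicalMatching",
])
report = check_header(example)
```

If the required values for `subject_id` and `object_id` end up hardcoded as discussed, they would simply be added to `REQUIRED`.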
@cthoyt @jamesamcl Thanks so much for your help; it's done:
https://mapping-commons.github.io/sssom/
Also, while we are at it, I have updated the LinkML pages a bit along the same lines:
- https://mapping-commons.github.io/sssom/linkml-index/ (LinkML front page)
- Example page for a field
Both need work but I think they are already 150% better than before!