HajoRijgersberg/OM

Range of hasNumericalValue

I am looking at your README with the UML diagram for OM-2, and at the same time, I have the OM2 ontology open in Protege.

When I look at the UML diagram, I see that the range of hasNumericalValue is illustrated as being of type xsd:float. However, when I look at the ontology in Protege, I don't see any range constraint on hasNumericalValue, beyond it being a datatype property.

Below this, in the README, I see this:

ex:_10Centimetres rdf:type om:Measure ;
  om:hasNumericalValue "10"^^xsd:double ;
  om:hasUnit om:centimeter .

There are a number of problems with this that come from the fact that OWL's numeric datatypes (in my opinion) are bonkers (this is a technical term from logic 😉):

  1. If we were to add a range constraint DataPropertyRange( om:hasNumericalValue xsd:float ), then the OM ontology plus ex:_10Centimetres would be inconsistent, because xsd:float and xsd:double are disjoint types. See Datatype Maps in the OWL2 specs.
  2. Even worse, there is no numeric type that brings together the types xsd:float, xsd:double, and owl:real. This means that it is very easy to render an ontology inconsistent by, for example, supplying an integer value for a property that is typed as xsd:float or xsd:double. Or, for that matter, supplying one of these two float types as the value of a property that has any type that is owl:real or any subtype thereof.
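To make the two points above concrete, here is a minimal Python sketch. It is not how any OWL reasoner works internally; `as_float32` and `typed_literal` are hypothetical helpers that model a literal as a (datatype, exact value) pair, on the assumption that values of different datatypes are never identical even when they denote the same number.

```python
import struct
from decimal import Decimal
from fractions import Fraction
from typing import Tuple


def as_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def typed_literal(lexical: str, datatype: str) -> Tuple[str, Fraction]:
    """Model a literal as (datatype, exact value).

    Assumption for illustration: keeping the datatype in the pair mimics
    value spaces that are disjoint, so pairs of different types never match.
    """
    if datatype == "xsd:float":
        return (datatype, Fraction(as_float32(float(lexical))))
    if datatype == "xsd:double":
        return (datatype, Fraction(float(lexical)))
    if datatype == "xsd:decimal":
        return (datatype, Fraction(Decimal(lexical)))
    raise ValueError(f"unsupported datatype: {datatype}")


ten_float = typed_literal("10", "xsd:float")
ten_double = typed_literal("10", "xsd:double")
ten_decimal = typed_literal("10", "xsd:decimal")

# All three denote the same number ...
print(ten_float[1] == ten_double[1] == ten_decimal[1])  # True
# ... but as typed values they are pairwise different.
print(ten_float == ten_double, ten_double == ten_decimal)  # False False

# For "0.1" even the numbers differ, because each encoding rounds differently.
tenth_float = typed_literal("0.1", "xsd:float")
tenth_double = typed_literal("0.1", "xsd:double")
print(tenth_float[1] == tenth_double[1])  # False
```

Under this reading, a range of xsd:float simply cannot be satisfied by a "10"^^xsd:double literal, which is the inconsistency described in point 1.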

Someone I discussed this with says that it's common practice to represent whole numbers using xsd:integer or a subtype thereof, and any quantity that might not be a whole number as an xsd:decimal. This avoids the kind of issues I describe above.

There doesn't seem to be anything consistent with the OWL spec that would permit normal uses of xsd:float or xsd:double -- any use of these types seems likely to cause inconsistencies.

I conjecture that this is a bad consequence of adopting the XSD data types, which are really syntactic, into a logic that is meant to be semantic.

Thanx for your issue! :) I fully agree with everything you say. For the reasons that you mention, the range of om:hasNumericalValue has not been specified in OM. So that's an error in the particular diagram. I have added a correctional note in the caption of the figure now.
So, I can be short here!
Do my statements here "solve" the problem for now? Please let me know! :)

Your statements clarify the situation, yes, and we can probably close this issue.

But it's clear that users need guidance when they supply values or apply constraints on measures. If they assume that normal numerical conversions will apply, then very bad things will happen.

I don't have very good solutions to offer:

  1. Make the suggestion that I refer to above: encourage users to always supply either an xsd:decimal or an xsd:integer everywhere. Constrain hasNumericalValue to have a range of xsd:decimal. This seems as good a fix as possible, but potentially breaks uses of OM.
  2. We should get OWL reasoners that do something more sensible with numbers! Even better, the OWL spec should be repaired where numerical data types are concerned.

Looking more closely into xsd:decimal, I'm concerned that constraining to xsd:decimal would create additional problems.
I notice two key issues in the definition of xsd:decimal:

  1. Infinity, -Infinity, and NaN are not included
  2. Precision cannot be specified, i.e., 2 = 2.0 = 2.00 (arguably a feature in the context of OWL, but not of measurement)

Lack of precision isn't any worse than using an xsd:float or xsd:double. Lack of the infinities and NaN is a major issue, however, since some instruments return those values. Thus, if I am using OM to encode information about the results of a physical measurement operation, xsd:decimal will prevent me from recording what the instrument returned and force the use of a different representation.
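As a rough illustration of both issues, the Python sketch below checks lexical forms against simplified patterns. The regular expressions are my reading of the XML Schema 1.1 lexical rules (an assumption, with no whitespace handling), and `is_xsd_decimal` and `parse_xsd_float` are hypothetical names, not part of any library.

```python
import math
import re
from decimal import Decimal

# Simplified lexical patterns (assumption: my reading of XML Schema 1.1 Part 2).
XSD_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
XSD_FLOAT_NUMERIC = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")
# "+INF" is accepted here as in XSD 1.1; XSD 1.0 only has "INF" and "-INF".
XSD_FLOAT_SPECIAL = {"INF": math.inf, "+INF": math.inf, "-INF": -math.inf, "NaN": math.nan}


def is_xsd_decimal(lexical: str) -> bool:
    """True if the string is a legal xsd:decimal lexical form."""
    return XSD_DECIMAL.fullmatch(lexical) is not None


def parse_xsd_float(lexical: str) -> float:
    """Parse an xsd:float lexical form (without rounding to 32 bits)."""
    if lexical in XSD_FLOAT_SPECIAL:
        return XSD_FLOAT_SPECIAL[lexical]
    if XSD_FLOAT_NUMERIC.fullmatch(lexical) is None:
        raise ValueError(f"not an xsd:float lexical form: {lexical!r}")
    return float(lexical)


# Readings an instrument might report: only the first fits xsd:decimal.
for reading in ["2.00", "NaN", "-INF"]:
    print(reading, is_xsd_decimal(reading), parse_xsd_float(reading))

# Trailing zeros are not a distinct value: 2, 2.0 and 2.00 compare equal.
print(Decimal("2") == Decimal("2.0") == Decimal("2.00"))  # True
```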

There does appear to be an XSD datatype that was specifically designed to address these issues, xsd:precisionDecimal, but it did not make it into XML Schema 1.1 and as a consequence is not supported by libraries.

@jakebeal Thanks for pointing this out.

@HajoRijgersberg — Jake and I have been working together on formalizing some concepts on Synthetic Biology, particularly Experimental Protocols and the Containers they use.

Jake’s comment shows a strong distinction between ideal measures (the depth of a multiple-well microplate) and concrete measurements (fluorescence intensity as measured at a particular time by a particular piece of lab equipment), and likely we (or at least I) need to be more careful about this distinction and how it is properly represented in OM-2. I will do some more homework….

Aside to Jake — even when we think about measurements as in your very good points above, probably the way OWL works means that where you have a measurement, and know only that it is going to come as some form of floating point number, the proper type would be the union of xsd:float and xsd:double and … at the moment, I have no idea if it’s possible for OWL to do anything sensible with that. Again, more homework is required.

@HajoRijgersberg Sorry to be ignorant, but could you point me in the right direction to investigate the distinction between ideal measures and concrete measurements? There is likely a standard description of this distinction, but I don’t know the vocabulary and am unable to easily search for relevant literature. I was overoptimistic about my ability to do this homework.

My guess is that this terminology will be easy for you to specify. If not, please just close this.

Thanks!

@rpgoldman When thinking about measurements in biology, I am not typically concerned about the limited precision of an xsd:float, since that gives us about seven significant decimal digits, and that's far more than any typical biolab instrument is able to give us. My baseline approach would then be one of the following two:

  1. Convert everything into floats, or
  2. Pull at least the numerical aspects of reasoning out of OWL as it's not really suited for them anyway.

in either case leaving any requirement for additional precision as an exception case to be handled in the future.
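For anyone who wants to see the seven-digit limit, here is a small Python sketch that rounds values to IEEE 754 single precision with the standard `struct` module. I am assuming here that xsd:float corresponds to that 32-bit format; `as_float32` is just a helper name for this example.

```python
import struct


def as_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Seven significant digits survive the round trip unchanged.
print(as_float32(1234567.0))    # 1234567.0
# Nine significant digits do not: the nearest binary32 value is returned.
print(as_float32(123456789.0))  # 123456792.0
# Decimal fractions are approximated in both formats, but differently.
print(as_float32(0.1) == 0.1)   # False
```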

Right, for measurements made by real instruments, floats are appropriate. But for specifications, like “the container’s volume must be at least 200 microliters,” the need to specify precision is really inappropriate, since we aren’t talking about an actually performed measurement. So I think for that side of things, decimal is the right approach, and float is inappropriate.

@jakebeal perhaps you are trying to jam too much semantics into a single token.
If you need to say something about precision then say it explicitly - i.e. model it in your schema (ontology) and don't rely on some implication from the numeric encoding.

@rpgoldman , @dr-shorthair : there are two senses of the word "precision" that are getting conflated here.

  • Precision, in the sense of the limits on an estimate produced by an experiment, is indeed outside of the scope of what is required here, as I have noted above.
  • Precision, in the sense of being able to say as many digits of a number as desired, is the sense that I was referring to, since xsd:float and xsd:double have limits on the number of digits that can be expressed while xsd:decimal does not.

My primary concern here is to make sure that I am allowed to continue using xsd:float to indicate values in OM, since I sometimes need to indicate infinities or NaN as the measurement value returned by an instrument, which is not possible with xsd:decimal.

Beyond that, given OWL's failure to allow comparison between xsd:float and xsd:decimal, my current belief is that many of my uses of measurement will need their comparisons to be handled by some system other than OWL.
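A minimal sketch of what "some system other than OWL" could look like, in Python: the float measurement and the decimal reference are both converted to exact rationals before comparing. The function name `meets_minimum` and the choice to return `None` for NaN are assumptions made for illustration only.

```python
import math
from decimal import Decimal
from fractions import Fraction
from typing import Optional


def meets_minimum(measured: float, minimum: Decimal) -> Optional[bool]:
    """Compare a float measurement with a decimal reference value exactly.

    Returns None for NaN, since no ordering is defined for it.
    """
    if math.isnan(measured):
        return None
    if math.isinf(measured):
        return measured > 0  # +INF exceeds any finite minimum, -INF never does
    # Both conversions are exact, so no rounding happens in the comparison.
    return Fraction(measured) >= Fraction(minimum)


print(meets_minimum(250.0, Decimal("200")))         # True
print(meets_minimum(float("nan"), Decimal("200")))  # None
# The double nearest to 0.3 lies slightly below 0.3, so this is False.
print(meets_minimum(0.3, Decimal("0.3")))           # False
```

The last line shows why the comparison needs an explicit policy: an exact comparison is sound, but it exposes the rounding already present in the float.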

RDFS and OWL are designed for logic, to support reasoning, not arithmetic computation. Numerics are not central, hence that was outsourced to XSD (and inherits flaws from there).

But I repeat my point: it looks to me that you are overloading a datatype with information that should be explicitly modeled in the application.

I can’t speak for Jake, but if one wants to be able to talk about the meaning of experimental measurements, one needs to be able to represent both the meaning and the measurements themselves, and to do that, one must be able to take into account the limitations of the measurements.

The decision to treat numerics as a side matter seems like a poor choice for a language to be used to talk about the scientific process as it is actually practiced.

Thanx for the very interesting points, comments, and ideas! :) All very good points, and unfortunately I do not know the solution to the problem either... Hope this discussion will lead to a proper solution!

> Sorry to be ignorant, but could you point me in the right direction to investigate the distinction between ideal measures and concrete measurements? There is likely a standard description of this distinction, but I don’t know the vocabulary and am unable to easily search for relevant literature. I was overoptimistic about my ability to do this homework.
> My guess is that this terminology will be easy for you to specify. If not, please just close this.

One would suspect it would indeed be easy. But unfortunately that does not appear to be the case...
We have a separate issue about this on this GitHub: #52.
Al gives a nice list of interesting references there. It would be great if you people could also contribute to that issue! :)

> My primary concern here is to make sure that I am allowed to continue using xsd:float to indicate values in OM, since I sometimes need to indicate infinities or NaN as the measurement value returned by an instrument, which is not possible with xsd:decimal.

Since the specification of the range is left open in OM, I would say you are allowed to do that. Hope I'm not overlooking something?

> Beyond that, given OWL's failure to allow comparison between xsd:float and xsd:decimal, my current belief is that many of my uses of measurement will need their comparisons to be handled by some system other than OWL.

I'm afraid so... :/

> But I repeat my point: it looks to me that you are overloading a datatype with information that should be explicitly modeled in the application.

Not sure for precision in the sense of number of digits... That does seem to be related to number formats, doesn't it?

> if one wants to be able to talk about the meaning of experimental measurements, one needs to be able to represent both the meaning and the measurements themselves, and to do that, one must be able to take into account the limitations of the measurements.

That seems reasonable.

> The decision to treat numerics as a side matter seems like a poor choice for a language to be used to talk about the scientific process as it is actually practiced.

I agree with that... :/

I’m not sure what belongs there and what here, so I will press on here for now. I am primarily interested in distinguishing between abstract Measures (using the OM-2 terminology), and the concrete Measures that @jakebeal has been talking about. The former seem to correspond to the notion of “quantity value” in the VIM terminology, and the latter to VIM’s “measurement.” Either could be associated with a “tolerance” (“the length must be 10cm +/- 1cm”), but the former does not have a notion of uncertainty. The former could have a notion of precision, but does not have to, at least for practical purposes. For most scientific purposes, the latter must have a notion of precision but not necessarily one that is expressed explicitly.

A computed estimate could have a notion of uncertainty, but I’m not sure how that would fit into metrology, if at all. Does it count as a measurement or not?

One terminological issue is that VIM does not seem to have a term for the concept of (quantity value - measurement) — i.e., a quantity value that is not a measurement — unless I have missed something.

I’m not sure OWL is helpful in reasoning about precision. We get some notion of precision “for free” using IEEE floats, for devices that use those internally, but beyond that, the waters get murky again.

As a general principle, using xsd:decimal and its xsd:integer subtype for an abstract Measure and something more precise, including possibly floats, for a measurement, makes sense.

What is really a nuisance is that OWL does not seem to be able to express constraints in which a measurement, expressed in a float type, is compared to a reference value expressed in xsd:decimal.

Another issue is that I don’t believe we can even mention something like a transcendental number, except by using an approximate value. On the one hand, that reflects what we can do in first order logic, but on the other hand it needlessly prevents us from using such values in an incomplete, but useful, way. E.g., talking about an angular value that is greater than pi.
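To illustrate what an "incomplete, but useful" treatment might look like, here is a Python sketch that answers "is this angle greater than pi?" using only rational bounds on pi (the classical 223/71 < pi < 22/7). It answers when the bounds suffice and otherwise says it cannot tell. The names and the three-valued result are assumptions for this example, not something OWL offers.

```python
from fractions import Fraction
from typing import Optional

# Classical rational bounds: 223/71 < pi < 22/7.
PI_LOWER = Fraction(223, 71)
PI_UPPER = Fraction(22, 7)


def greater_than_pi(x: Fraction) -> Optional[bool]:
    """Decide x > pi from rational bounds only; None means undetermined."""
    if x >= PI_UPPER:
        return True   # x >= 22/7 > pi
    if x <= PI_LOWER:
        return False  # x <= 223/71 < pi
    return None       # x lies between the bounds: tighter bounds would be needed


print(greater_than_pi(Fraction(4)))           # True
print(greater_than_pi(Fraction(3)))           # False
print(greater_than_pi(Fraction(3142, 1000)))  # None
```

Tighter bounds shrink the undetermined gap but never close it, which is the sense in which such reasoning stays sound yet incomplete.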

> I’m not sure what belongs there and what here, so I will press on here for now. I am primarily interested in distinguishing between abstract Measures (using the OM-2 terminology), and the concrete Measures that @jakebeal has been talking about.

I think these subjects are covered/discussed there. But no problem if you stay here! :)

> The former seem to correspond to the notion of “quantity value” in the VIM terminology, and the latter to VIM’s “measurement.” Either could be associated with a “tolerance” (“the length must be 10cm +/- 1cm”), but the former does not have a notion of uncertainty. The former could have a notion of precision, but does not have to, at least for practical purposes. For most scientific purposes, the latter must have a notion of precision but not necessarily one that is expressed explicitly.

It seems that both could merge to the same concept 'measure', and I think in principle they do. Different aspects such as tolerance or min-max, std, etc, could be indicated if desired.
I do think it all relates or matches with VIM's quantity value.

> A computed estimate could have a notion of uncertainty, but I’m not sure how that would fit into metrology, if at all. Does it count as a measurement or not?

I think it does. Actually, I do not see a reason why it wouldn't. I think distinctions between such concepts would be understandable but artificial. Better not to make them: it would complicate things and bring drawbacks, as with the xsd types.

> One terminological issue is that VIM does not seem to have a term for the concept of (quantity value - measurement), i.e., a quantity value that is not a measurement, unless I have missed something.

VIM, other standards, and unit ontologies aim to only express a combination of numbers and units, I think. The origin of a statement, whether it be a measurement or a calculation or a hypothesis is not part of it. That would be very interesting though from an epistemological point of view. I have tried something in that direction in my thesis. But it's a long way to go. I'm still interested in that topic!

> I’m not sure OWL is helpful in reasoning about precision. We get some notion of precision “for free” using IEEE floats, for devices that use those internally, but beyond that, the waters get murky again.

Agreed.

> As a general principle, using xsd:decimal and its xsd:integer subtype for an abstract Measure and something more precise, including possibly floats, for a measurement, makes sense.

Yes, one is allowed to do that. I cannot restrict the om:hasNumericalValue property to these types of course, because of all the issues discussed above.

> What is really a nuisance is that OWL does not seem to be able to express constraints in which a measurement, expressed in a float type, is compared to a reference value expressed in xsd:decimal.

Agreed.

> Another issue is that I don’t believe we can even mention something like a transcendental number, except by using an approximate value. On the one hand, that reflects what we can do in first order logic, but on the other hand it needlessly prevents us from using such values in an incomplete, but useful, way. E.g., talking about an angular value that is greater than pi.

Do you mean complex numbers? We have an issue about that too: #51.

No, actually I meant the real numbers that are not rational. xsd:decimal can represent only rationals (and only those with a finite decimal expansion). There's owl:real, but the OWL 2 specification says:

> The owl:real datatype does not directly provide any lexical forms.

I'm not sure how one refers to a non-rational real since there are no lexical forms for it, and they aren't named individuals so I can't say :pi a owl:real.

I would think that there would be some discussion of this, but I haven't been able to find it.

Clear, thanx.
Very relevant topic! Surprising there's no discussion about this. We can start an issue on this GitHub?
Or we could broaden the 'complex number' issue?

It is up to you. I think on the one hand there's a relatively simple expedient for handling complex numbers (at least complex numbers over the rationals), by treating them as tuples (although then, of course, one needs a theory for working with this representation if one is to do anything useful).
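As a sketch of the tuple idea, here is how complex numbers over the rationals could be represented in Python, together with the small "theory" (here just addition and multiplication) needed to do anything with them. `RationalComplex`, `add`, and `mul` are hypothetical names for this example and say nothing about how OM would model this.

```python
from fractions import Fraction
from typing import NamedTuple


class RationalComplex(NamedTuple):
    """A complex number a + b*i with rational a and b, stored as a pair."""
    real: Fraction
    imag: Fraction


def add(a: RationalComplex, b: RationalComplex) -> RationalComplex:
    """Component-wise sum."""
    return RationalComplex(a.real + b.real, a.imag + b.imag)


def mul(a: RationalComplex, b: RationalComplex) -> RationalComplex:
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, computed exactly."""
    return RationalComplex(
        a.real * b.real - a.imag * b.imag,
        a.real * b.imag + a.imag * b.real,
    )


i = RationalComplex(Fraction(0), Fraction(1))
half = RationalComplex(Fraction(1, 2), Fraction(0))

print(mul(i, i) == RationalComplex(Fraction(-1), Fraction(0)))  # True
print(add(half, i))
```

The representation itself is trivial; the point is that the arithmetic rules live outside the data, which is the extra theory mentioned above.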

I'm less certain how one deals with non-rational numbers, since the logic shouldn't bake in any assumptions about approximating them. This seems to require an external theory for working with them.

So... I leave it with you to decide whether to fold this in with the other issue or not.

I have thought about it. This all seems to fit under the subject 'lists of numbers' (or 'sets of numbers' - in Dutch we say 'getalverzamelingen'; it's unclear to me how exactly it translates to English). The complex numbers are special, since they are indeed represented by a tuple. But on the other hand - I agree with you - real numbers should in fact also be defined as objects with multiple operands, i.e., as a fraction, with a numerator and a denominator - also a kind of tuple (in a way). Indeed, approximating real numbers does not seem sensible. So perhaps we should broaden the already existing issue 'Complex numbers' on this GitHub to 'Lists of numbers'. Does that seem sensible?

I think it's a general problem of "semantics of numbers." For some reason, the OWL designers chose to represent numbers in a way that is fundamentally syntactic instead of semantic. We can only talk about digital encodings of numbers, not numbers in the abstract, as far as I can tell. Numbers encoded as floats are semantically different from the rationals, and numbers encoded as single precision floats are different from double precision floats. That's accurate, but it doesn't let us talk sensibly about denotations.

I suppose that permits some forms of sound and relatively efficient reasoning, but only at the cost of limiting us to the "Only Somewhat Semantic Web." Something incomplete but more expressive would be nice to have.

I fully agree with you. And it confronts us with a great problem, from an optimistic perspective a challenge, but I think it's huge.
Anyway, hope we can contribute through this GitHub in this quite fundamental issue.
Regarding tuples, Maxim van den Wynckel has just opened a new issue on this GitHub: #63. Maybe this is the issue I suggested above.
To be continued, all!