OBO serializer incorrectly quoting IRI property values
Closed this issue · 6 comments
The OBO serializer in owlapi 4.5.22 generates slightly incorrect property_value
tags, in that the value of the property is always quoted, as in the following example:
Per the specification of the OBO Flat File Format (in its 1.4 version), property values should be either unquoted IDs (fully expanded IRIs or CURIEs) or quoted literal values followed by a type specification:
property_value-Tag Relation-ID ( QuotedString XSD-Type | ID )
So, when the value of a property is an ID, it should not be quoted.
For reference, related ticket on the ODK repository: INCATools/ontology-development-kit#770
AFAICT, the cause for the systematic quoting behaviour is found in two places:
-
The
writePropertyValue
method in org.obolibrary.oboformat.writer.OBOFormatWriter. This method unconditionally writes the property value in double quotes, regardless of the type of the value. -
The
trGenericPropertyValue
method in org.obolibrary.obo2owl.OWLAPIOwl2Obo systematically converts the property value to a Java String (if the value was an IRI, it does that by calling thegetIdentifier
helper method, among other things to convert an IRI into a CURIE).
The result is that by the point we could decide whether the value needs quoting or not (in OBOFormatWriter.writePropertyValue
), the information that the value is an ID is lost – all we have is a java.lang.String
.
I can think of three ways of fixing that:
- We make
writePropertyValue
check what the value looks like (does it look like an IRI/a CURIE) and based on that decide whether to quote or not. - We somehow convey to
writePropertyValue
the information that the value is an ID and therefore should not be quoted (not sure about the best way to convey that bit). - We do the quoting (if necessary) in
trGenericPropertyValue
(where we have all the information we need) so thatwritePropertyValue
just has to write the value “as provided”, without worrying about quoting.
FWIW, I feel the first option would be an hideous hack and the third one would be somehow a violation of the separation of duties between OWLAPIOwl2Obo
and OBOFormatWriter
.
FWIW, rough proof-of-concept of a fix using option 2:
--- a/oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIOwl2Obo.java
+++ b/oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIOwl2Obo.java
@@ -1085,7 +1085,7 @@ public class OWLAPIOwl2Obo {
clause.addValue(propId);
if (annVal instanceof OWLLiteral) {
OWLLiteral owlLiteral = (OWLLiteral) annVal;
- clause.addValue(owlLiteral.getLiteral());
+ clause.addValue(new PropertyValue(owlLiteral.getLiteral()));
OWLDatatype datatype = owlLiteral.getDatatype();
IRI dataTypeIri = datatype.getIRI();
if (!OWL2Datatype.isBuiltIn(dataTypeIri)) {
@@ -1100,7 +1100,7 @@ public class OWLAPIOwl2Obo {
clause.addValue(dataTypeIri.toString());
}
} else if (annVal instanceof IRI) {
- clause.addValue(getIdentifier((IRI) annVal));
+ clause.addValue(new PropertyValue(getIdentifier((IRI) annVal), true));
}
frame.addClause(clause);
}
diff --git a/oboformat/src/main/java/org/obolibrary/obo2owl/PropertyValue.java b/oboformat/src/main/java/org/obolibrary/obo2owl/PropertyValue.java
new file mode 100644
index 000000000..131ac36c9
--- /dev/null
+++ b/oboformat/src/main/java/org/obolibrary/obo2owl/PropertyValue.java
@@ -0,0 +1,25 @@
+package org.obolibrary.obo2owl;
+
+public class PropertyValue {
+
+ private String value;
+ private boolean valueIsId;
+
+ public PropertyValue(String val) {
+ value = val;
+ valueIsId = false;
+ }
+
+ public PropertyValue(String val, boolean isId) {
+ value = val;
+ valueIsId = isId;
+ }
+
+ public String toString() {
+ return value;
+ }
+
+ public boolean isId() {
+ return valueIsId;
+ }
+}
diff --git a/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java b/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java
index 47120badb..bfa043d0c 100644
--- a/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java
+++ b/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java
@@ -25,6 +25,7 @@ import java.util.Set;
import javax.annotation.Nonnull;
import javax.annotation.Nullable;
+import org.obolibrary.obo2owl.PropertyValue;
import org.obolibrary.obo2owl.OWLAPIObo2Owl;
import org.obolibrary.oboformat.model.Clause;
import org.obolibrary.oboformat.model.Frame;
@@ -558,10 +559,15 @@ public class OBOFormatWriter {
if (it.hasNext()) {
// value
sb.append(' ');
- String val = it.next().toString(); // TODO replace toString() method
- sb.append('"');
- sb.append(escapeOboString(val, EscapeMode.quotes));
- sb.append('"');
+ PropertyValue val = (PropertyValue) it.next();
+ if ( val.isId() ) {
+ sb.append(escapeOboString(val.toString(), EscapeMode.simple));
+ }
+ else {
+ sb.append('"');
+ sb.append(escapeOboString(val.toString(), EscapeMode.quotes));
+ sb.append('"');
+ }
}
while (it.hasNext()) {
// optional type; there should be only one value left in the iterator
Thanks I'll take a look later today
PR created: #1085
In relation with the OBO Specification of the property_value
:
property_value
This tag binds a property to a value in this instance. The value of this tag is a relationship type id, a space, and a value specifier. The value specifier may have one of two forms; in the first form, it is just the id of some other instance, relationship type or term. In the second form, the value is given by a quoted string, a space, and datatype identifier.
IRI values are distinguished from the quoted string values using the heuristic approach applied in the OWLAPIObo2Owl
If Clause has 2 values then it is `relationship type id` + `IRI`.
If Clause has 3 values then it is `relationship type id` + `value in quoted strings` + `datatype identifier`
fixed in 4 and 5, to be fixed in 6