owlcs/owlapi

OBO serializer incorrectly quoting IRI property values

Closed this issue · 6 comments

The OBO serializer in owlapi 4.5.22 generates slightly incorrect property_value tags, in that the value of the property is always quoted, as in the following example:

Per the specification of the OBO Flat File Format (in its 1.4 version), property values should be either unquoted IDs (fully expanded IRIs or CURIEs) or quoted literal values followed by a type specification:

property_value-Tag Relation-ID ( QuotedString XSD-Type | ID )

So, when the value of a property is an ID, it should not be quoted.

For reference, related ticket on the ODK repository: INCATools/ontology-development-kit#770

AFAICT, the cause for the systematic quoting behaviour is found in two places:

  • The writePropertyValue method in org.obolibrary.oboformat.writer.OBOFormatWriter. This method unconditionally writes the property value in double quotes, regardless of the type of the value.

  • The trGenericPropertyValue method in org.obolibrary.obo2owl.OWLAPIOwl2Obo systematically converts the property value to a Java String (if the value was an IRI, it does that by calling the getIdentifier helper method, among other things to convert an IRI into a CURIE).

The result is that by the point we could decide whether the value needs quoting or not (in OBOFormatWriter.writePropertyValue), the information that the value is an ID is lost – all we have is a java.lang.String.

I can think of three ways of fixing that:

  • We make writePropertyValue check what the value looks like (does it look like an IRI/a CURIE) and based on that decide whether to quote or not.
  • We somehow convey to writePropertyValue the information that the value is an ID and therefore should not be quoted (not sure about the best way to convey that bit).
  • We do the quoting (if necessary) in trGenericPropertyValue (where we have all the information we need) so that writePropertyValue just has to write the value “as provided”, without worrying about quoting.

FWIW, I feel the first option would be an hideous hack and the third one would be somehow a violation of the separation of duties between OWLAPIOwl2Obo and OBOFormatWriter.

@hkir-dev If you want to have a look.

FWIW, rough proof-of-concept of a fix using option 2:

--- a/oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIOwl2Obo.java
+++ b/oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIOwl2Obo.java
@@ -1085,7 +1085,7 @@ public class OWLAPIOwl2Obo {
             clause.addValue(propId);
             if (annVal instanceof OWLLiteral) {
                 OWLLiteral owlLiteral = (OWLLiteral) annVal;
-                clause.addValue(owlLiteral.getLiteral());
+                clause.addValue(new PropertyValue(owlLiteral.getLiteral()));
                 OWLDatatype datatype = owlLiteral.getDatatype();
                 IRI dataTypeIri = datatype.getIRI();
                 if (!OWL2Datatype.isBuiltIn(dataTypeIri)) {
@@ -1100,7 +1100,7 @@ public class OWLAPIOwl2Obo {
                     clause.addValue(dataTypeIri.toString());
                 }
             } else if (annVal instanceof IRI) {
-                clause.addValue(getIdentifier((IRI) annVal));
+                clause.addValue(new PropertyValue(getIdentifier((IRI) annVal), true));
             }
             frame.addClause(clause);
         }
diff --git a/oboformat/src/main/java/org/obolibrary/obo2owl/PropertyValue.java b/oboformat/src/main/java/org/obolibrary/obo2owl/PropertyValue.java
new file mode 100644
index 000000000..131ac36c9
--- /dev/null
+++ b/oboformat/src/main/java/org/obolibrary/obo2owl/PropertyValue.java
@@ -0,0 +1,25 @@
+package org.obolibrary.obo2owl;
+
+public class PropertyValue {
+    
+    private String value;
+    private boolean valueIsId;
+    
+    public PropertyValue(String val) {
+        value = val;
+        valueIsId = false;
+    }
+    
+    public PropertyValue(String val, boolean isId) {
+        value = val;
+        valueIsId = isId;
+    }
+    
+    public String toString() {
+        return value;
+    }
+    
+    public boolean isId() {
+        return valueIsId;
+    }
+}
diff --git a/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java b/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java
index 47120badb..bfa043d0c 100644
--- a/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java
+++ b/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java
@@ -25,6 +25,7 @@ import java.util.Set;
 import javax.annotation.Nonnull;
 import javax.annotation.Nullable;
 
+import org.obolibrary.obo2owl.PropertyValue;
 import org.obolibrary.obo2owl.OWLAPIObo2Owl;
 import org.obolibrary.oboformat.model.Clause;
 import org.obolibrary.oboformat.model.Frame;
@@ -558,10 +559,15 @@ public class OBOFormatWriter {
         if (it.hasNext()) {
             // value
             sb.append(' ');
-            String val = it.next().toString(); // TODO replace toString() method
-            sb.append('"');
-            sb.append(escapeOboString(val, EscapeMode.quotes));
-            sb.append('"');
+            PropertyValue val = (PropertyValue) it.next();
+            if ( val.isId() ) {
+                sb.append(escapeOboString(val.toString(), EscapeMode.simple));
+            }
+            else {
+                sb.append('"');
+                sb.append(escapeOboString(val.toString(), EscapeMode.quotes));
+                sb.append('"');
+            }
         }
         while (it.hasNext()) {
             // optional type; there should be only one value left in the iterator

Thanks I'll take a look later today

PR created: #1085

In relation with the OBO Specification of the property_value:

property_value
This tag binds a property to a value in this instance. The value of this tag is a relationship type id, a space, and a value specifier. The value specifier may have one of two forms; in the first form, it is just the id of some other instance, relationship type or term. In the second form, the value is given by a quoted string, a space, and datatype identifier.

IRI values are distinguished from the quoted string values using the heuristic approach applied in the OWLAPIObo2Owl

If Clause has 2 values then it is `relationship type id` + `IRI`. 
If Clause has 3 values then it is `relationship type id` + `value in quoted strings` + `datatype identifier`

fixed in 4 and 5, to be fixed in 6