SynBioDex/pySBOL3

Detect / prevent namespaces from ending with slash

Closed this issue · 11 comments

The definition of namespaces in the specification implies that they should not end with a '/', but we currently do not detect or prevent that.

I would recommend applying this both in object construction and in Document.set_namespace. If a terminal '/' is found, remove and warn.
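A minimal sketch of the "remove and warn" behaviour proposed above. The helper name `normalize_namespace` and the example URL are hypothetical and not part of pySBOL3; the sketch assumes every trailing slash should be stripped, not just one.

```python
import warnings


def normalize_namespace(namespace: str) -> str:
    """Hypothetical helper: strip trailing '/' characters and warn if any were found."""
    stripped = namespace.rstrip("/")  # assumption: remove all trailing slashes
    if stripped != namespace:
        warnings.warn(
            f"Namespace {namespace!r} ends with '/'; using {stripped!r} instead",
            stacklevel=2,
        )
    return stripped
```

Something like this could be called from both object construction and Document.set_namespace so that the two code paths behave the same way.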

Related to #328

Can you quote the spec, including page number in the 3.0.1 spec, where it implies that namespaces should not end with a /?

Have you seen any issues/bugs with a namespace that ends with a slash? That would help with unit tests if you have.

In the spec, Section 7.2, line 14 (page 41) gives the recommended structure for compliant URIs. It doesn't explicitly prohibit putting a '/' at the end of a namespace, but that is clearly the intention given the typical conventions of URI construction. A URI can, of course, have all sorts of crazy structure, so it's not prohibited, but I'd like to at least warn people when they are taking aim at their toes.

I also suppose that rather than pre-emptively stopping people, it would be fine to let it be a validation rule too.

I've just finished working through a batch of bugs of this form in SynBioDex/SBOL-utilities#49

Sorry to be dense. I still don't see what the problem is. Does pySBOL3 mishandle namespaces that end with a slash? If so, we should definitely fix that. pySBOL3 should always be using posixpath.join for assembling namespaces with other strings, as when an identity is constructed.
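For illustration, here is how posixpath.join from the standard library treats a namespace with and without a trailing slash. The URLs and the display id are made-up examples, and this only demonstrates the stdlib function, not what pySBOL3 does internally.

```python
import posixpath

# Hypothetical namespace and display id, used only for illustration.
with_slash = posixpath.join("https://example.org/lab/", "part1")
without_slash = posixpath.join("https://example.org/lab", "part1")

# Both spellings of the namespace yield the same identity string.
same_identity = with_slash == without_slash
```

One caveat: posixpath.join discards everything before a component that starts with '/', so a display id or local path with a leading slash would drop the namespace entirely.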

I see that validation rule sbol3-10102 (page 66) states:

A TopLevel URL MUST use the following pattern: [namespace]/[local]/[displayId], where namespace and displayId are required fragments, and the local fragment is an optional relative path.
Reference: Section 5.1 on page 12

If this rule is taken literally, I can see your point that if the namespace ends with a / then the identity would have a double-slash. On the other hand, if we follow the spirit of this rule I feel like a namespace ending in a / is ok. This is a topic we should open up to more people before implementing a validation rule for namespaces ending in a slash.
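To make the literal reading concrete, here is a small sketch that assembles an identity exactly as the pattern [namespace]/[local]/[displayId] is written. The function `literal_identity` and the example values are hypothetical, not pySBOL3 code.

```python
def literal_identity(namespace: str, display_id: str, local: str = "") -> str:
    """Hypothetical: join the fragments with '/' exactly as the rule is written."""
    parts = [namespace]
    if local:  # the local fragment is optional
        parts.append(local)
    parts.append(display_id)
    return "/".join(parts)


clean = literal_identity("https://example.org/lab", "part1")
doubled = literal_identity("https://example.org/lab/", "part1")
```

With the trailing slash, `doubled` contains `lab//part1`, which is the double-slash identity described above.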

I definitely want to pragmatically avoid the question as much as possible via path-style constructions.

It is also the case that, to the best of my understanding, POSIX path construction explicitly treats multiple slashes as equivalent to a single slash, while for URIs that is not the case, so we are operating in a framework that is less permissive than with paths. For paths, Desktop//tmp = Desktop/tmp = Desktop/tmp/, but for URIs, these are not equal.
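A quick way to see the difference, using only the standard library. The example URIs are made up, and the URI comparison here is plain string comparison, which is an assumption about how identities would be compared.

```python
import posixpath

# Path normalization collapses repeated and trailing slashes.
paths_equivalent = (
    posixpath.normpath("Desktop//tmp")
    == posixpath.normpath("Desktop/tmp")
    == posixpath.normpath("Desktop/tmp/")
)

# The same spellings as URIs are distinct identifiers when compared as strings.
uri_a = "https://example.org/lab//part1"
uri_b = "https://example.org/lab/part1"
uris_equal = uri_a == uri_b

# Path normalization is also unsafe on a full URL: it collapses the '//' after the scheme.
mangled = posixpath.normpath("https://example.org/lab/part1")
```

So path-style helpers can paper over the slash when building a string, but they cannot be relied on to decide whether two URIs are the same.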

I think namespaces should end in a delimiter ('/', '#', or ':'). In SBOL2, all namespaces do end in a delimiter from this set. I think it is important to end with a delimiter, since there is a choice of three.

@cjmyers I think you might be talking about the serialization, rather than the data model, since SBOL2 does not have a namespace property.

I do not believe that the SBOL3 specification, as written, currently allows for anything except for '/' as a delimiter separating the namespace from the rest of a URL. For URIs other than URLs, a delimiter does not apply at all, of course.

Yes, I understand this is for the namespace property. I'm not sure why we would disallow # and : as delimiters for namespaces, since they are commonly used.

Can you please point me to an example? I haven't previously seen one in an actual identity URI, as opposed to in a serialization prefix.

In compliant URIs in SBOL2, this was not allowed. So, there are likely not many examples in the wild, except maybe SBOL1 non-compliant URIs. The question is whether we want to allow these delimiters in SBOL3.

They would still be allowed in an SBOL3 URI, just not as the delimiter that separates a namespace from the rest of the URI.

Closing this as resolved by using URI equality rather than string equality.