sanpii/todo-txt

"url:" tags parsing is surprising

Closed this issue · 6 comments

Tags whose values contain a colon confuse the parser. Granted, this is ruled out by the todo.txt spec, but the url: tag is quite useful to me.
Would it be possible to deviate from the standard for this use case?

If the task is:

2018-03-26 test url:http://example.org

then parsing and printing the task out displays:

2018-03-26 test//example.org url:http:

It’s a difficult bug to balance: if the parser accepts the / character in tag values, a URL in the subject will be parsed as a tag. Maybe we can exclude values beginning with a / character. It’s probably not possible to do that with a regex.

The spec states: “Both key and value must consist of non-whitespace characters, which are not colons.” So I guess the behavior for a word that contains two colons is implementation-defined.

As such, I think we should think about what we want to do here. In my mind, a tag must always be enclosed by spaces: there is no reason a:b:c should be parsed as a:b being a tag and :c in the subject line.

Which means a:b:c should (at least in my mind) either be parsed as not being a tag at all (thus being all in the subject line), or as being a tag a:b with value c, or as a tag a with value b:c.

The tag a:b with value c does not make sense imo. Which leaves the other two options.

The first one is likely the closest to the todo.txt specification, but I'd argue use cases like putting URLs in tags trump strict specification compliance, and thus : should be allowed in tag values.
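To make the three readings of a:b:c concrete, here is a minimal Python sketch. It is only an illustration, not code from this crate; the function names `split_first_colon`, `split_last_colon` and `strict` are made up for the example.

```python
def split_first_colon(word):
    """Read 'a:b:c' as key 'a' with value 'b:c' (colons allowed in values)."""
    key, sep, value = word.partition(":")
    # Require a colon plus a non-empty key and value, otherwise it is not a tag.
    return (key, value) if sep and key and value else None


def split_last_colon(word):
    """Read 'a:b:c' as key 'a:b' with value 'c'."""
    key, sep, value = word.rpartition(":")
    return (key, value) if sep and key and value else None


def strict(word):
    """Spec-strict reading: exactly one colon, otherwise the word is not a tag."""
    if word.count(":") != 1:
        return None
    return split_first_colon(word)
```

With the first reading, `split_first_colon("url:http://example.org")` yields the key `url` and the full URL as the value, which is the behavior argued for above.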

Adding a special-case for values beginning with a / character seems pretty fragile to me, as it'll exclude eg. mailto: URLs, etc. Also, it reduces overall consistency -- again, in my opinion only. :)

Nah, the problem is that there is already a workaround to prevent plain URLs in the subject from being parsed as tags. This workaround simply disallows the / char in tags altogether.
From what I understand, in order to allow for url: tags, #6 changes the workaround to instead detect subjects that start with /. As such, the new workaround is closer to the spec than the previous one.
With the new workaround, a:b:c is parsed as a tag with key a and value b:c. The only tricky case was related to / chars.
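The rule described here can be sketched roughly as follows. This is a simplified Python illustration under my own assumptions, not the actual change in #6: the hypothetical `parse` function splits on whitespace, treats the first colon as the key/value separator, and leaves any word whose value starts with / in the subject. Unlike the real parser, it does not handle dates or priorities separately.

```python
def parse(line):
    """Split a task line into (subject, tags) using the rule sketched above."""
    subject_words = []
    tags = {}
    for word in line.split():
        # Split on the first colon only, so 'a:b:c' gives key 'a', value 'b:c'.
        key, sep, value = word.partition(":")
        # A value starting with '/' is assumed to be the tail of a plain URL
        # such as http://example.org, so the word stays in the subject.
        if sep and key and value and not value.startswith("/"):
            tags[key] = value
        else:
            subject_words.append(word)
    return " ".join(subject_words), tags
```

For the task from the original report, `parse("2018-03-26 test url:http://example.org")` keeps `2018-03-26 test` as the subject and stores the URL under the `url` key. A plain `http://example.org` in the subject is left alone, while `mailto:me@example.org` would still be read as a tag.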

Oh, I didn't even think there may already be a workaround for not detecting URLs as tags. Nevermind, then!

> Adding a special-case for values beginning with a / character seems pretty fragile to me, as it'll exclude eg. mailto: URLs, etc.

It’s impossible to say whether mailto: is a URL or a tag. Maybe a blacklist of tag keys could help, but that’s another issue (feel free to open another bug if you are interested in this problem).
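For illustration only, a key blacklist could look something like the Python sketch below. The `SCHEME_KEYS` set and the `is_tag` helper are hypothetical, and the list of schemes is an arbitrary assumption, not something this crate defines.

```python
# Hypothetical deny-list of keys that look like URL schemes rather than tag keys.
SCHEME_KEYS = {"http", "https", "ftp", "mailto"}


def is_tag(word, denied_keys=SCHEME_KEYS):
    """Return True if the word looks like key:value and the key is not denied."""
    key, sep, value = word.partition(":")
    return bool(sep and key and value) and key not in denied_keys
```

With such a list, `mailto:me@example.org` and `https://example.com` stay in the subject, while `url:https://example.com` is still treated as a tag. The obvious cost is that nobody can use `mailto` or `http` as a tag key anymore.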

Here I would like to find a good workaround that leaves URLs (mostly beginning with https://) in the subject and parses url:https://example.com the way one would naively expect.

I think we can deviate from the standard in this case, and propose our reasoning for the v2.

Good point indeed! I guess both options are good enough until the spec can catch up 😃