Tags with underscore are not parsed correctly
jm-g opened this issue · 2 comments
According to the Org user guide, tags are defined as follows:
Tags are normal words containing letters, numbers, ‘_’, and ‘@’.
But the parser seems to handle the _ as a format annotation.
(transform (parse "* Headline :tag_a:\n"))
;; => {:headlines
       [{:headline
         {:level 1,
          :title
          [[:text-normal "Headline :tag"]
           [:text-sub [:text-subsup-word "a"]]
           [:text-normal ":"]],
          :planning [],
          :tags []}}]}
In my opinion, the correct behavior would be:
(transform (parse "* Headline :tag_a:\n"))
;; => {:headlines
       [{:headline
         {:level 1,
          :title [[:text-normal "Headline"]],
          :planning [],
          :tags ["tag_a"]}}]}
This is with org-parser 0.1.27 with Clojure on the JVM.
Thanks for the report.
I just tried this:
org-parser.core=> (read-str "* foo :_:")
{:headlines [{:headline {:level 1, :title [[:text-normal "foo"]], :planning [], :tags ["_"]}}]}
But if "_" is followed by a letter, it doesn't work. I don't yet understand why...
https://github.com/200ok-ch/org-parser/blob/master/src/org_parser/transform.cljc#L66
Oh, I think I got it. The extract-tags function does not receive the raw string but the already parsed headline text. The "_" causes the headline text to be parsed into text followed by a text-subsup-word (subscript text), so the tags are no longer in one piece when extract-tags looks for them.
I don't currently have time to work on this. Do you want to try fixing it?
The reason why we don't parse the tags directly and instead leave that to transform is documented here:
https://github.com/200ok-ch/org-parser/blob/master/resources/org.ebnf#L37
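One possible direction, as a rough sketch only: split the trailing tags off the raw headline string before the rest of the title is parsed for inline markup, so the "_" never reaches the subscript rule. The names tag-suffix-re and split-title-and-tags below are hypothetical and not part of org-parser, and the tag character set follows the definition quoted from the Org guide above (letters, numbers, "_" and "@"). How this would be wired into transform is left open.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of org-parser.
;; Matches a title followed by whitespace and a trailing ":tag1:tag2:" group,
;; where each tag consists of letters, digits, "_" and "@".
(def tag-suffix-re
  #"^(.*?)\s+:((?:[\p{L}\p{N}_@]+:)+)\s*$")

(defn split-title-and-tags
  "Splits a raw headline text (without the leading stars) into a map with
  the title string and a vector of tags. Returns an empty tag vector when
  the text does not end in a tag group."
  [raw]
  (if-let [[_ title tags] (re-matches tag-suffix-re raw)]
    {:title title
     :tags  (str/split tags #":")}
    {:title (str/trim raw)
     :tags  []}))

(split-title-and-tags "Headline :tag_a:")
;; => {:title "Headline", :tags ["tag_a"]}

(split-title-and-tags "foo :_:")
;; => {:title "foo", :tags ["_"]}
```

With something like this, only the :title string would still need to go through the inline text parser, while the tags stay untouched.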