Keywords mode broken for complex keys that happen to contain `/`
bshepherdson opened this issue · 6 comments
This logic https://github.com/clj-commons/clj-yaml/blob/master/src/clojure/clj_yaml/core.clj#L144 uses (or (keyword k) k)
to try to parse map keys when the keywords
setting is true.
Unfortunately clojure.core/keyword
is a bit too forgiving, and parses some things that it probably shouldn't. My map has JSON strings as keys, and some of the parts of those keys have /
in them, which results in this:
(def k "[\"ref\",\"type/BigInteger\"]")
(keyword k) ; => :["ref","type/BigInteger"]
(namespace (keyword k)) ; => "[\"ref\",\"type"
(name (keyword k)) ; => "BigInteger\"]"
Since there are unmatched quotes and so on in there, the resulting EDN is mangled - doesn't (read-string (pr-str x))
properly anymore.
(yaml.core/parse-string
"column_settings: {'[\"ref\",\"type/BigInteger\"]': {column_title: Invoices}}"
:keywords true)
; => {:column_settings #:["ref","type{:BigInteger"] {:column_title "Invoices"}}}
which has unbalanced {}
s because the second {
is quoted but there are three real }}}
at the end.
A stricter condition for what makes a valid keyword is needed. I've worked around this with the regex #"^[0-9a-zA-Z_/\.\-]+$"
for my own purposes; that may not be quite general enough.
Interesting! Thanks for reporting @bshepherdson!
Research
I had a look around at what other libraries do here.
- Clojure's data.json is a bit smarter by leaving things up to the caller. It lets you provide a
:key-fn
- you decide exactly how you want to convert. - Cheshire has both options, effectively a boolean like us and the ability to specify a fn like data.json. It looks like it just uses Clojure's
keyword
like we do when not using a fn.
Here are some initial ideas:
Option 1: Do nothing.
Let the people suffer! 😈
Option 2: Silently do not convert a YAML key to a Clojure keyword when the resulting keyword would be unreadable.
We do this for some cases now, anything that returns nil for keyword
is left as is.
Ex.
user=> (keyword 45)
nil
Option 3: Throw when the conversion from YAML key to Clojure keyword results in an unreadable keyword.
This is a variant of Option 2.
Would this be considered a breaking change?
Is it ever desirable behaviour?
Option 4: Introduce a :keywords-fn
option ala data.json and Cheshire libs.
This would allow the caller to do whatever conversion they like.
Musings
I don't like option 1. Far too cruel.
I don't know that option 3 would be desirable. Maybe it would be good to know that what you asked for could not happen, then you'd switch to Option 4?
Option 4 seems like a clear winner to me, it gives the user full control to do whatever they like.
We could also maybe do some work around option 2 - or alternatively even deprecate the :keywords
option in favour of a new :keywords-fn
option.
Thoughts, alternatives, ideas anyone?
I'm away today but will take a look tomorrow
I prefer option 4. Roundtripping JSON <-> EDN with keywords is a known problem when keys contain spaces, etc. I would not put any efforts in trying to "do the right thing" but leave this right thing to the user. I prefer the name :key-fn
(subjective, I know, but it's the same as clojure.data.json).
Thanks @borkdude, sounds good to me.
I'll follow up with a PR that also updates docs to promote usage of :key-fn
over existing :keywords
.
Note clj-yaml docs state:
By default, keys are converted to clojure keywords. To prevent this, add :keywords false parameters to the parse-string function
I feel this is a bit of an unfortunate inherited design choice. I would have preferred that this behaviour be off by default.
I like :key-fn
but I will drift from clojure.data.json
in function args.
I'll have it accept a map for future extensibility.
Initially, the only recognized key will be :key
.