clj-commons/clj-yaml

Keywords mode broken for complex keys that happen to contain `/`

bshepherdson opened this issue · 6 comments

This logic https://github.com/clj-commons/clj-yaml/blob/master/src/clojure/clj_yaml/core.clj#L144 uses (or (keyword k) k) to try to parse map keys when the keywords setting is true.

Unfortunately clojure.core/keyword is a bit too forgiving, and parses some things that it probably shouldn't. My map has JSON strings as keys, and some of the parts of those keys have / in them, which results in this:

(def k "[\"ref\",\"type/BigInteger\"]")
(keyword k)              ; => :["ref","type/BigInteger"]
(namespace (keyword k))  ; => "[\"ref\",\"type"
(name (keyword k))       ; => "BigInteger\"]"

Since there are unmatched quotes and so on in there, the resulting EDN is mangled - doesn't (read-string (pr-str x)) properly anymore.

(yaml.core/parse-string
  "column_settings: {'[\"ref\",\"type/BigInteger\"]': {column_title: Invoices}}"
  :keywords true)
; => {:column_settings #:["ref","type{:BigInteger"] {:column_title "Invoices"}}}

which has unbalanced {}s because the second { is quoted but there are three real }}} at the end.

A stricter condition for what makes a valid keyword is needed. I've worked around this with the regex #"^[0-9a-zA-Z_/\.\-]+$" for my own purposes; that may not be quite general enough.

lread commented

Interesting! Thanks for reporting @bshepherdson!

lread commented

Research

I had a look around at what other libraries do here.

  • Clojure's data.json is a bit smarter by leaving things up to the caller. It lets you provide a :key-fn - you decide exactly how you want to convert.
  • Cheshire has both options, effectively a boolean like us and the ability to specify a fn like data.json. It looks like it just uses Clojure's keyword like we do when not using a fn.

Here are some initial ideas:

Option 1: Do nothing.

Let the people suffer! 😈

Option 2: Silently do not convert a YAML key to a Clojure keyword when the resulting keyword would be unreadable.

We do this for some cases now, anything that returns nil for keyword is left as is.
Ex.

user=> (keyword 45)
nil

Option 3: Throw when the conversion from YAML key to Clojure keyword results in an unreadable keyword.

This is a variant of Option 2.
Would this be considered a breaking change?
Is it ever desirable behaviour?

Option 4: Introduce a :keywords-fn option ala data.json and Cheshire libs.

This would allow the caller to do whatever conversion they like.

Musings

I don't like option 1. Far too cruel.

I don't know that option 3 would be desirable. Maybe it would be good to know that what you asked for could not happen, then you'd switch to Option 4?

Option 4 seems like a clear winner to me, it gives the user full control to do whatever they like.

We could also maybe do some work around option 2 - or alternatively even deprecate the :keywords option in favour of a new :keywords-fn option.

Thoughts, alternatives, ideas anyone?

I'm away today but will take a look tomorrow

I prefer option 4. Roundtripping JSON <-> EDN with keywords is a known problem when keys contain spaces, etc. I would not put any efforts in trying to "do the right thing" but leave this right thing to the user. I prefer the name :key-fn (subjective, I know, but it's the same as clojure.data.json).

lread commented

Thanks @borkdude, sounds good to me.

I'll follow up with a PR that also updates docs to promote usage of :key-fn over existing :keywords.

Note clj-yaml docs state:

By default, keys are converted to clojure keywords. To prevent this, add :keywords false parameters to the parse-string function

I feel this is a bit of an unfortunate inherited design choice. I would have preferred that this behaviour be off by default.

lread commented

I like :key-fn but I will drift from clojure.data.json in function args.
I'll have it accept a map for future extensibility.
Initially, the only recognized key will be :key.