
Clojure JSON and JSON SMILE (binary json format) encoding/decoding


Cheshire

'Cheshire Puss,' she began, rather timidly, as she did not at all know whether it would like the name: however, it only grinned a little wider. 'Come, it's pleased so far,' thought Alice, and she went on. 'Would you tell me, please, which way I ought to go from here?'

'That depends a good deal on where you want to get to,' said the Cat.

'I don't much care where--' said Alice.

'Then it doesn't matter which way you go,' said the Cat.

'--so long as I get SOMEWHERE,' Alice added as an explanation.

'Oh, you're sure to do that,' said the Cat, 'if you only walk long enough.'

Cheshire is a fast JSON encoding library, based on clj-json and clojure-json, with additional features like Date/UUID/Set/Symbol encoding and SMILE support.

Why?

clojure-json had really nice features (custom encoders), but was slow; clj-json had no features, but was fast. Cheshire encodes JSON fast, with added support for more types and the ability to use custom encoders.

Usage

[cheshire "5.5.0"]

;; Cheshire v5.5.0 uses Jackson 2.5.3

;; In your ns statement:
(ns my.ns
  (:require [cheshire.core :refer :all]))

Encoding

;; generate some json
(generate-string {:foo "bar" :baz 5})

;; write some json to a stream
(generate-stream {:foo "bar" :baz 5} (clojure.java.io/writer "/tmp/foo"))

;; generate some SMILE
(generate-smile {:foo "bar" :baz 5})

;; generate some JSON with Dates
;; the Date will be encoded as a string using
;; the default date format: yyyy-MM-dd'T'HH:mm:ss'Z'
(generate-string {:foo "bar" :baz (java.util.Date. 0)})

;; generate some JSON with Dates with custom Date encoding
(generate-string {:baz (java.util.Date. 0)} {:date-format "yyyy-MM-dd"})

;; generate some JSON with pretty formatting
(generate-string {:foo "bar" :baz {:eggplant [1 2 3]}} {:pretty true})
;; {
;;   "foo" : "bar",
;;   "baz" : {
;;     "eggplant" : [ 1, 2, 3 ]
;;   }
;; }

;; generate JSON, escaping non-ASCII characters as \uXXXX sequences
(generate-string {:foo "It costs £100"} {:escape-non-ascii true})
;; => "{\"foo\":\"It costs \\u00A3100\"}"

;; generate JSON and munge keys with a custom function
(generate-string {:foo "bar"} {:key-fn (fn [k] (.toUpperCase (name k)))})
;; => "{\"FOO\":\"bar\"}"

In the event encoding fails, Cheshire will throw a JsonGenerationException.
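
As a minimal sketch of handling that failure, the exception can be caught like any other Java exception. The helper name `safe-generate-string` is hypothetical and not part of Cheshire; the exception class is assumed to be Jackson's `com.fasterxml.jackson.core.JsonGenerationException`.

```clojure
(require '[cheshire.core :as json])
(import 'com.fasterxml.jackson.core.JsonGenerationException)

;; safe-generate-string is a hypothetical helper, not part of Cheshire.
(defn safe-generate-string
  "Returns the JSON string for x, or nil if Cheshire cannot encode it."
  [x]
  (try
    (json/generate-string x)
    (catch JsonGenerationException _
      nil)))

;; A plain map encodes normally.
(safe-generate-string {:foo "bar"})

;; A bare java.lang.Object has no encoder, so the helper returns nil.
(safe-generate-string (Object.))
```

Returning nil is only one option; rethrowing with extra context about the offending value may suit larger applications better.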

Decoding

;; parse some json
(parse-string "{\"foo\":\"bar\"}")
;; => {"foo" "bar"}

;; parse some json and get keywords back
(parse-string "{\"foo\":\"bar\"}" true)
;; => {:foo "bar"}

;; parse some json and munge keywords with a custom function
(parse-string "{\"foo\":\"bar\"}" (fn [k] (keyword (.toUpperCase k))))
;; => {:FOO "bar"}


;; parse some SMILE (keywords option also supported)
(parse-smile <your-byte-array>)

;; parse a stream (keywords option also supported)
(parse-stream (clojure.java.io/reader "/tmp/foo"))

;; parse a stream lazily (keywords option also supported)
(parsed-seq (clojure.java.io/reader "/tmp/foo"))

;; parse a SMILE stream lazily (keywords option also supported)
(parsed-smile-seq (clojure.java.io/reader "/tmp/foo"))
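
The calls above that read from /tmp/foo or take a byte array can be tried without any files. The following is a small sketch that round-trips SMILE through an in-memory byte array and reads several top-level JSON documents lazily from a `java.io.StringReader`; the names `smile-bytes` and `docs` are illustrative only.

```clojure
(require '[cheshire.core :as json])

;; SMILE round trip: generate-smile returns a byte array that parse-smile accepts.
(def smile-bytes (json/generate-smile {:foo "bar" :baz 5}))
(json/parse-smile smile-bytes true)

;; parsed-seq lazily yields one value per top-level JSON document in the reader.
(def docs
  (json/parsed-seq (java.io.StringReader. "{\"id\":1} {\"id\":2}") true))
(map :id docs)
```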

In 2.0.4 and up, Cheshire allows passing in a function to specify what kind of types to return, like so:

;; In this example a function that checks for a certain key
(decode "{\"myarray\":[2,3,3,2],\"myset\":[1,2,2,1]}" true
        (fn [field-name]
          (if (= field-name "myset")
            #{}
            [])))
;; => {:myarray [2 3 3 2], :myset #{1 2}}

The returned collection must be "transient-able", so use either #{} or [].

Custom Encoders

Custom encoding is supported from 2.0.0 and up. If you encounter a bug, please open a GitHub issue. From 5.0.0 onwards, custom encoding is part of the core namespace, so no namespace change is required.

;; Custom encoders allow you to swap out the api for the fast
;; encoder with one that is slightly slower, but allows custom
;; things to be encoded:
(ns myns
  (:require [cheshire.core :refer :all]
            [cheshire.generate :refer [add-encoder encode-str remove-encoder]]))

;; First, add a custom encoder for a class:
(add-encoder java.awt.Color
             (fn [c jsonGenerator]
               (.writeString jsonGenerator (str c))))

;; There are also helpers for common encoding actions:
(add-encoder java.net.URL encode-str)

;; List of common encoders that can be used: (see generate.clj)
;; encode-nil
;; encode-number
;; encode-seq
;; encode-date
;; encode-bool
;; encode-named
;; encode-map
;; encode-symbol
;; encode-ratio

;; Then you can use encode from the custom namespace as normal
(encode (java.awt.Color. 1 2 3))
;; => "java.awt.Color[r=1,g=2,b=3]"

;; Custom encoders can also be removed:
(remove-encoder java.awt.Color)

;; Decoding remains the same, you are responsible for doing custom decoding.
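
A custom encoder is not limited to writing a single string. As a hedged sketch, the encoder below writes a `java.awt.Color` as a JSON object using standard Jackson `JsonGenerator` methods (`writeStartObject`, `writeNumberField`, `writeEndObject`); the field names "r", "g" and "b" are arbitrary choices for this example.

```clojure
(require '[cheshire.core :as json]
         '[cheshire.generate :refer [add-encoder remove-encoder]])
(import 'com.fasterxml.jackson.core.JsonGenerator)

;; Write the color as an object with one numeric field per channel.
(add-encoder java.awt.Color
             (fn [^java.awt.Color c ^JsonGenerator jg]
               (.writeStartObject jg)
               (.writeNumberField jg "r" (int (.getRed c)))
               (.writeNumberField jg "g" (int (.getGreen c)))
               (.writeNumberField jg "b" (int (.getBlue c)))
               (.writeEndObject jg)))

;; The encoder also applies when the color is nested inside a map.
(def color-json (json/encode {:color (java.awt.Color. 1 2 3)}))

;; Decoding yields plain maps; rebuilding a Color is up to the caller.
(json/decode color-json true)

;; Clean up so later examples see the default behaviour again.
(remove-encoder java.awt.Color)
```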

NOTE: `cheshire.custom` has been deprecated in version 5.0.0

Custom and Core encoding have been combined in Cheshire 5.0.0, so there is no longer any need to require a different namespace depending on what you would like to use.

Aliases

There are also a few aliases for commonly used functions:

encode -> generate-string
encode-stream -> generate-stream
encode-smile -> generate-smile
decode -> parse-string
decode-stream -> parse-stream
decode-smile -> parse-smile
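
A quick way to convince yourself that the aliases behave identically is to compare their results directly. This is only an illustrative check; `data` and `text` are example names.

```clojure
(require '[cheshire.core :as json])

(def data {:foo "bar"})
(def text "{\"foo\":\"bar\"}")

;; encode and generate-string produce the same string.
(= (json/encode data) (json/generate-string data))

;; decode and parse-string produce the same data.
(= (json/decode text true) (json/parse-string text true))
```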

Features

Cheshire supports encoding standard Clojure data structures, with a few additions.

Cheshire encoding supports:

Clojure data structures

  • strings
  • lists
  • vectors
  • sets
  • maps
  • symbols
  • booleans
  • keywords (qualified and unqualified)
  • numbers (Integer, Long, BigInteger, BigInt, Double, Float, Ratio, Short, Byte, primitives)
  • clojure.lang.PersistentQueue

Java classes

  • Date
  • UUID
  • java.sql.Timestamp
  • any java.util.Set
  • any java.util.Map
  • any java.util.List
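
To illustrate a few of the types listed above, the sketch below encodes a map containing a UUID, a set, a symbol and a Date, then decodes it again. JSON has no native representation for these types, so they come back as strings and vectors; `id` and `round-tripped` are names made up for this example.

```clojure
(require '[cheshire.core :as json])

(def id (java.util.UUID/randomUUID))

(def round-tripped
  (json/decode
   (json/encode {:id   id                   ; UUID, written as a string
                 :tags #{"a"}               ; set, written as a JSON array
                 :sym  'my-symbol           ; symbol, written as a string
                 :when (java.util.Date. 0)}) ; Date, written as a formatted string
   true))

(:id round-tripped)   ; the UUID's string form
(:tags round-tripped) ; a vector, since the set became a JSON array
(:sym round-tripped)  ; the symbol's name as a string
(:when round-tripped) ; a string in the default date format
```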

Custom class encoding while still being fast

Also supports

  • Stream encoding/decoding
  • Lazy decoding
  • Pretty-printing JSON generation
  • Unicode escaping
  • Custom keyword coercion
  • Arbitrary precision for decoded values:

Cheshire automatically uses a BigInteger when needed for non-floating-point numbers. For floating-point numbers, Doubles are used unless the *use-bigdecimals?* var is bound to true:

(ns foo.bar
  (:require [cheshire.core :as json]
            [cheshire.parse :as parse]))

(json/decode "111111111111111111111111111111111.111111111111111111111111111111111111")
;; => 1.1111111111111112E32 (a Double)

(binding [parse/*use-bigdecimals?* true]
  (json/decode "111111111111111111111111111111111.111111111111111111111111111111111111"))
;; => 111111111111111111111111111111111.111111111111111111111111111111111111M (a BigDecimal)

Change Log

Change log is available on GitHub.

Speed

Cheshire is about twice as fast as data.json.

Check out the benchmarks in cheshire.test.benchmark; or run lein benchmark. If you have scenarios where Cheshire is not performing as well as expected (compared to a different library), please let me know.

Experimental things

In the cheshire.experimental namespace:

$ echo "Hi. \"THIS\" is a string.\\yep." > /tmp/foo

$ lein repl
user> (use 'cheshire.experimental)
nil
user> (use 'clojure.java.io)
nil
user> (println (slurp (encode-large-field-in-map {:id "10"
                                                  :things [1 2 3]
                                                  :body "I'll be removed"}
                                                 :body
                                                 (input-stream (file "/tmp/foo")))))
{"things":[1,2,3],"id":"10","body":"Hi. \"THIS\" is a string.\\yep.\n"}
nil

encode-large-field-in-map is used for streamy JSON encoding where you want to JSON encode a map, but don't want the map in memory all at once (it returns a stream). Check out the docstring for full usage.

It's experimental, like the name says. Based on Tigris.

Advanced customization for factories

See the Jackson JsonFactory.Feature and JsonParser.Feature documentation for a list of features that can be customized if desired. A custom factory can be used like so:

(ns myns
  (:require [cheshire.core :as json]
            [cheshire.factory :as factory]))

;; Bind a factory that accepts non-numeric tokens such as NaN
(binding [factory/*json-factory* (factory/make-json-factory
                                  {:allow-non-numeric-numbers true})]
  (json/decode "{\"foo\":NaN}" true))

See the default-factory-options map in factory.clj for a full list of configurable options. Smile factories can also be created, and factories work exactly the same with custom encoding.
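
A SMILE factory can be bound in the same way. The sketch below assumes that `make-smile-factory` and the dynamic var `*smile-factory*` in cheshire.factory mirror their JSON counterparts; an empty options map is passed so that only the defaults apply.

```clojure
(require '[cheshire.core :as json]
         '[cheshire.factory :as factory])

;; Assumption: make-smile-factory takes an options map like make-json-factory.
(def smile-result
  (binding [factory/*smile-factory* (factory/make-smile-factory {})]
    ;; Both generation and parsing happen inside the binding, so both
    ;; use the custom factory.
    (json/parse-smile (json/generate-smile {:foo "bar"}) true)))
```

Because the factory is looked up when the call runs, any lazy variant (such as parsed-smile-seq) would need to be fully realized inside the binding form.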

Future Ideas/TODOs

  • move away from using Java entirely, use Protocols for the custom encoder (see custom.clj)
  • allow custom encoders (see custom.clj)
  • figure out a way to encode namespace-qualified keywords
  • look into overriding the default encoding handlers with custom handlers
  • better handling when java numbers overflow ECMAScript's numbers (-2^31 to (2^31 - 1))
  • handle encoding java.sql.Timestamp the same as java.util.Date
  • add benchmarking
  • get criterium benchmarking ignored for 1.2.1 profile
  • look into faster exception handling by pre-allocating an exception object instead of creating one on-the-fly (maybe ask Steve?)
  • make it as fast as possible (ongoing)

License

Released under the MIT license. See LICENSE for the full license.

Thanks

Thanks go to Mark McGranaghan for clj-json and Jim Duey for the name suggestion. :)