
fatherhood


fatherhood is a JSON stream decoding library wrapping the megajson scanner.

It offers a very ugly API in exchange for speed and no code generation.

Performance

Its speed is equivalent to megajson, since it uses the same scanner. All kudos to Ben Johnson, not me.

package        ns/op        MB/s
fatherhood     52'156'933   37.20
megajson       53'557'744   36.23
encoding/json  98'061'899   19.79

Godocs?

Godoc!

Usage

The general idea of fatherhood goes like this:

  • get a decoder.
  • iterate over the values you need.
  • extract them manually.

Getting a decoder:

dec := fatherhood.NewDecoder(r) // r is any io.Reader

Then you should know what's in the stream you are reading:

err := dec.EachMember(&obj, objVisitor) // decodes objects
err := dec.EachValue(&arr, arrVisitor)  // decodes arrays
err := dec.ReadTypeX(&typeX)            // decodes strings, bool, floats, ints, etc

When you decode an object, you must provide a visitor func that will be invoked for each member of the object:

obj := &objType{} // make sure obj is not a nil pointer!
err := dec.EachMember(obj, objVisitor)

func objVisitor(dec *fatherhood.Decoder, o interface{}, member string) error {
  obj := o.(*objType)
  switch member {
  case "key1":
    return dec.ReadInt(&obj.Key1)
    // and so on
  }
  // You MUST discard members that you choose not to decode.
  return dec.Discard()
}

Similarly, when you decode an array, you must provide a visitor func that will be invoked for each element in the array:

arr := make([]arrType, 0) // make sure arr is not nil!
err := dec.EachValue(&arr, arrVisitor)

func arrVisitor(dec *fatherhood.Decoder, a interface{}, t fatherhood.JSONType) error {
  arr := a.(*[]arrType)
  switch t {
  case fatherhood.Object:
    var elem arrType
    // elemVisitor is a member visitor for arrType, like objVisitor above.
    err := dec.EachMember(&elem, elemVisitor)
    *arr = append(*arr, elem)
    return err
  }
  // You MUST discard values that you choose not to decode.
  return dec.Discard()
}

All this pointer stuff is tricky and easy to mess up. Make sure you test your code carefully.

Example

Extracted from the benchmark code. Be cautious, messed up code ahead:

type codeResponse struct {
  Tree     *codeNode `json:"tree"`
  Username string    `json:"username"`
}

type codeNode struct {
  Name     string      `json:"name"`
  Kids     []*codeNode `json:"kids"`
  CLWeight float64     `json:"cl_weight"`
  Touches  int         `json:"touches"`
  MinT     int64       `json:"min_t"`
  MaxT     int64       `json:"max_t"`
  MeanT    int64       `json:"mean_t"`
}

func Unmarshal(data []byte, code *codeResponse) error {

  read := bytes.NewReader(data)

  var (
    decodeNodeArr  func(*Decoder, interface{}, JSONType) error
    decodeNode     func(*Decoder, interface{}, string) error
    decodeResponse func(*Decoder, interface{}, string) error
  )

  decodeResponse = func(dec *Decoder, r interface{}, member string) error {
    resp := r.(*codeResponse)
    switch member {
    case "username":
      return dec.ReadString(&resp.Username)
    case "tree":
      resp.Tree = &codeNode{}
      return dec.EachMember(resp.Tree, decodeNode)
    }
    return fmt.Errorf("unsupported member %s", member)
  }

  decodeNode = func(dec *Decoder, n interface{}, member string) error {
    node := n.(*codeNode)
    switch member {
    case "name":
      return dec.ReadString(&node.Name)
    case "cl_weight":
      return dec.ReadFloat64(&node.CLWeight)
    case "touches":
      return dec.ReadInt(&node.Touches)
    case "min_t":
      return dec.ReadInt64(&node.MinT)
    case "max_t":
      return dec.ReadInt64(&node.MaxT)
    case "mean_t":
      return dec.ReadInt64(&node.MeanT)
    case "kids":
      node.Kids = make([]*codeNode, 0)
      return dec.EachValue(&node.Kids, decodeNodeArr)
    }
    // or dec.Discard() to carry on
    return fmt.Errorf("unsupported member %s", member)

  }

  decodeNodeArr = func(dec *Decoder, a interface{}, t JSONType) error {
    arr := a.(*[]*codeNode)
    switch t {
    case Object:
      node := &codeNode{}
      err := dec.EachMember(node, decodeNode)
      *arr = append(*arr, node)
      return err
    }
    return fmt.Errorf("unsupported type %#v", t)
  }

  return NewDecoder(read).EachMember(code, decodeResponse)
}

Details

Performance values taken from running benchmarks on the following revisions:

  • megajson, on master:
benbjohnson/megajson $ git rev-parse HEAD
533c329f8535e121708a0ee08ea53bda5edfbe79
  • fatherhood, on master:
aybabtme/fatherhood $ git rev-parse HEAD
5cfd87c089e3829a28c9cfcd8993370bf787ffa1
  • encoding/json, on release-branch.go1.2:
encoding/json $ hg summary
parent: 18712:0ddbdc3c7ce2 go1.2.1 release
 go1.2.1
branch: release-branch.go1.2

Why use this...

Instead of megajson?

megajson uses code generation to create decoders/encoders for your types, fatherhood doesn't.

Some combinations of types don't work in megajson but do with fatherhood. For instance, I wrote fatherhood because megajson couldn't decode objects containing []uint64.

Instead of encoding/json?

fatherhood is faster than the standard library decoder.

Why use megajson instead of fatherhood?

megajson offers an encoder, while fatherhood only decodes.

Regarding code generation, megajson gives you drop-in codecs. Meanwhile, the fatherhood API is ugly and painful to use.

Why use encoding/json instead of fatherhood?

The standard library offers a much nicer API. You should always prefer encoding/json to fatherhood unless JSON decoding speed becomes a problem.