linkedin/goavro

Error encoding map[string]float64 while schema is a map

rweics opened this issue · 3 comments

I attempted to encode a field with map[string]float64 type while schema for that field is:
{"type": "map", "values": "double"}
And I got the following error:

encode record (RECORD): cannot encode record (RECORD): cannot encode union (union): datum ought match schema: expected: null, map; received: map[string]float64

It seems that goavro only supports map[string]interface{}, which is not always convenient. Moreover, when the map's value type is float64, every value is already known to fit the schema, so there is no need to check individual map values, which might be slightly more efficient.
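Until the encoder accepts typed maps, one workaround is to copy the map into the map[string]interface{} form that the error message suggests the encoder expects. The following is a minimal sketch using only the standard library; the helper name toInterfaceMap is hypothetical and is not part of goavro.

```go
package main

import "fmt"

// toInterfaceMap is a hypothetical helper (not part of goavro). It copies a
// map[string]float64 into a map[string]interface{}, leaving each value as a
// float64 stored inside an interface{}.
func toInterfaceMap(in map[string]float64) map[string]interface{} {
	out := make(map[string]interface{}, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	m := toInterfaceMap(map[string]float64{"k1": 3.5})
	fmt.Println(m["k1"]) // prints 3.5
}
```

The copy costs one allocation and one pass over the map, which is exactly the overhead this issue asks the library to make unnecessary.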

You are completely correct. It's not terribly convenient, and this library should do better, although I'm not entirely certain how it should be modified to make it a better experience.

A few options that float to mind (pun intended...):

  1. Although slow, consider performing some reflection to ascertain the type of the map value, and use that during the type assertions of the encoder.

  2. Maybe if goavro were a binary that exported Go source code to be compiled into a program, it could emit the proper code that performed type assertion on the proper value type of the map.
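To make the first option above concrete, here is a rough sketch of how reflection could normalize any map with string keys before the usual type assertions run. It is only an illustration built on the standard reflect package, not goavro's actual implementation; the function name stringKeyedMap is an assumption made for this example.

```go
package main

import (
	"fmt"
	"reflect"
)

// stringKeyedMap is a hypothetical helper (not goavro code). It returns datum
// as a map[string]interface{}, using reflection only when datum is some other
// map type whose keys are strings, such as map[string]float64.
func stringKeyedMap(datum interface{}) (map[string]interface{}, error) {
	// Fast path: no reflection needed for the type already supported.
	if m, ok := datum.(map[string]interface{}); ok {
		return m, nil
	}
	v := reflect.ValueOf(datum)
	if v.Kind() != reflect.Map || v.Type().Key().Kind() != reflect.String {
		return nil, fmt.Errorf("expected map with string keys; received: %T", datum)
	}
	out := make(map[string]interface{}, v.Len())
	iter := v.MapRange() // requires Go 1.12 or newer
	for iter.Next() {
		out[iter.Key().String()] = iter.Value().Interface()
	}
	return out, nil
}

func main() {
	m, err := stringKeyedMap(map[string]float64{"k1": 3.5})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m["k1"]) // prints 3.5
}
```

The fast path keeps the existing map[string]interface{} case free of reflection, so only callers who pass typed maps pay the extra cost.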

Ideally I think users generally would want something close to the built-in JSON package
https://golang.org/pkg/encoding/json/

Honestly, the current interface isn't very convenient for people to work with. I'd suggest that you and your team consider supporting the following use cases:

type UUID *string

type NestedStruct struct {
    F    []string                   // schema is array of string
}

type ExampleStruct struct {
    A    int                        // schema is either "int" or "long"
    B    *string                    // schema is "string"
    C    UUID                       // schema is "string"
    D    float32                    // schema is either "float" or "double"
    E    map[string]*NestedStruct   // schema is a map of record
}

var schema string // let's assume a valid schema in JSON format is loaded

encoder, err := goavro.NewEncoder(schema) // assume no error

dataIn := &ExampleStruct{} // assume the struct isn't empty
binary := &bytes.Buffer{}
// Encoder implements Encode(data interface{}, buffer io.Writer) error,
// where data can be JSON string, map[string]interface{} or arbitrary structs
// that json package supports
err = encoder.Encode(dataIn, binary) 

decoder, err := goavro.NewDecoder(schema)

container := &ExampleStruct{}
// Decoder implements Decode(binary io.Reader, container interface{}),
// where container can be either map[string]interface{} or arbitrary struct
err = decoder.Decode(bytes.NewReader(binary.Bytes()), container)

anotherContainer := make(map[string]interface{})
err = decoder.Decode(bytes.NewReader(binary.Bytes()), anotherContainer)

Right now goavro is very strict about data types, and users have to traverse the object tree in parallel with the schema themselves before sending a *goavro.Record to the Encoder, especially when dealing with nested records, nested arrays, and so on. Although not convenient, I would say it's still the best open source Go Avro library I could find so far. I hope it can eventually match the json package, though, which is very flexible and lenient.

@rweics,

The newer version of this library properly handles this case. Please give it a try: https://github.com/karrick/goavro

func ExampleMap() {
	codec, err := goavro.NewCodec(`{
		"name": "r1",
		"type": "record",
		"fields": [{
			"name": "f1",
			"type": {"type":"map","values":"double"}
		}]
	}`)
	if err != nil {
		log.Fatal(err)
	}

	buf, err := codec.TextualFromNative(nil, map[string]interface{}{
		"f1": map[string]float64{
			"k1": 3.5,
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(buf))
	// Output: {"f1":{"k1":3.5}}
}