tarantool/avro-schema

introduce union type determinator

Khatskevich opened this issue · 1 comments

Web clients do not natively handle the JSON that Avro generates for unions:

{int, string} -- schema
{ "int": 5} -- unflattened
{5} -- what web wants

In flattened data, Avro schema uses an extra field that determines the type of the following object.

The proposal is to introduce custom type determinators: point to a field that already exists in the parent object and determines the child object's type.

  1. I think the parent object almost always has such a field, because application logic needs a way to determine the child type too.
  2. Avro can reuse this field for both flatten and unflatten; we just need to tell it how to use the determinator.
    Result:
---- Schema
{
type = record,
name = x,
fields =  [
    { name: child_type, type: enum},
    { name: child, type: { type1, type2 } },
]}
---- Data
{child_type: type1, child: child_data}
----- Data in the current implementation
{child_type: type1, child: {type1: child_data}}
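
The conversion between the two data shapes above can be sketched in a few lines. This is only an illustration of the idea, not avro-schema code: `wrap_child` and `unwrap_child` are hypothetical helpers, and the field names `child_type` and `child` are taken from the example schema.

```python
def wrap_child(record, type_field="child_type", child_field="child"):
    """Proposed form -> current form: wrap the child in {branch: data},
    taking the branch name from the determinator field."""
    out = dict(record)
    out[child_field] = {out[type_field]: out[child_field]}
    return out


def unwrap_child(record, type_field="child_type", child_field="child"):
    """Current form -> proposed form: drop the {branch: data} wrapper.

    Assumes the child is wrapped; raises if the wrapper's branch name
    disagrees with the determinator field.
    """
    out = dict(record)
    branch = out[type_field]
    wrapped = out[child_field]
    if not isinstance(wrapped, dict) or list(wrapped) != [branch]:
        raise ValueError("child is not wrapped as {%r: ...}" % branch)
    out[child_field] = wrapped[branch]
    return out


proposed = {"child_type": "type1", "child": "child_data"}
current = wrap_child(proposed)
```

Because the branch name is read from the parent's own field, the wrapper carries no extra information and can be added or removed mechanically.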

We also need this to be implemented. I'm using OpenAPI to describe the HTTP API of some services. To avoid duplicating definitions across the OpenAPI and Avro formats, I would like to write the schema once and convert it to Avro for use on the backend. But I ran into an issue with the Avro union type in the tarantool avro-schema module:

Here is the example:

Tarantool Enterprise For Mac 2.2.1-13-g8bfcb4422
type 'help' for interactive help
tarantool> avro = require('avro_schema')
---
...

tarantool> ok, h = avro.create({
         >     name = 'test_schema',
         >     type = 'record',
         >     fields = {
         >         { name = 'name', type = { 'null', 'string' } },
         >     }
         > })
---
...

tarantool> avro.validate(h, {name = 'My Name'})
---
- false
- 'name: Not a union: My Name'
...

This makes it hard for me to validate the same object against both the OpenAPI and Avro schemas.

Here is how the avro_validator Python library handles it:

import json
from avro_validator.schema import Schema

schema = json.dumps({
    'name': 'test schema',
    'type': 'record',
    'doc': 'schema for testing avro_validator',
    'fields': [
        {
            'name': 'name',
            'type': ['null', 'string']
        }
    ]
})

Schema(schema).parse().validate({'name': 'My Name'}) # True
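
As a possible stopgap, plain data could be pre-wrapped before it is handed to a validator that expects the wrapped union form (the `{type1: child_data}` shape shown in the issue). The sketch below is an assumption-laden illustration, not part of either library: `wrap_nullable_fields` is a hypothetical helper, and it only handles top-level fields whose type is a union of `null` and exactly one named primitive branch.

```python
def wrap_nullable_fields(schema, record):
    """Wrap non-null values of ['null', X] union fields as {X: value}.

    Only top-level record fields are handled; anything else is left as-is.
    """
    out = dict(record)
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if not isinstance(ftype, list) or name not in out:
            continue  # not a union, or the field is absent from the data
        branches = [t for t in ftype if t != "null"]
        # Unambiguous only when a single, string-named branch remains.
        if len(branches) == 1 and isinstance(branches[0], str):
            if out[name] is not None:
                out[name] = {branches[0]: out[name]}
    return out


schema_dict = {
    "name": "test_schema",
    "type": "record",
    "fields": [{"name": "name", "type": ["null", "string"]}],
}
wrapped = wrap_nullable_fields(schema_dict, {"name": "My Name"})
```

This keeps the OpenAPI-facing data flat and moves the wrapping into one conversion step, but it does not help with unions that have several non-null branches, which is where a determinator field would be needed.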