MaterializeInc/datagen

Bug: Avro schema not processed when a record type is used within fields

YajneshPadiyar opened this issue · 1 comments

Describe the bug

When generating data from an Avro schema that has a field of record type, the record type is ignored and the field's value is generated as a string.

To Reproduce

  1. Consider the schema below, saved at tests/schema.avsc

{
  "type": "record",
  "name": "products",
  "namespace": "exp.products.v1",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "productId", "type": ["null", "string"] },
    { "name": "title", "type": "string" },
    { "name": "price", "type": "int" },
    { "name": "isLimited", "type": "boolean" },
    { "name": "sizes", "type": ["null", "string"], "default": null },
    { "name": "ownerIds", "type": { "type": "array", "items": "string" } },
    {
      "name": "address",
      "type": {
        "type": "record",
        "name": "Address",
        "fields": [
          {
            "name": "ID",
            "type": ["null", "string"],
            "default": null
          },
          {
            "name": "STREET",
            "type": ["null", "string"],
            "default": null
          }
        ]
      }
    }
  ]
}

  2. Run the command datagen -s tests/schema.avsc -n 1 -dr -f avro
  3. The result is as below

{
"id": "'N7"pALPHJ",
"productId": "fi*=2Zp?B*",
"title": "!yqQ>6{3+I",
"price": 87678,
"isLimited": false,
"sizes": "-653[vwS6+",
"ownerIds": [
"%75#fB+%m|",
"N)fj1rBGOp",
23720,
"8>vTA@#)S=",
"H&QmrQ]TdV",
10082,
"YaqG3o"`lx",
"aB/vXGNfcr",
57678,
"s$zC0Rze%X"
],
"address": "8TzdK'H<ak"
}

Expected behavior

The data should be generated as below

{
"id": ")VA;tD#=%",
"productId": "zP=8.!:zHh",
"title": ";qlT+^i{?h",
"price": 92089,
"isLimited": false,
"sizes": ";kKW.\[!,h",
"ownerIds": [
92325,
77425,
10334,
22855,
62197,
62238,
6471,
"<=z&|<zEL2",
"FN%T:':(g
",
"b|7?WQVzTF"
],
"address": {
"ID": "G{EZ7q>UAZ",
"STREET": ".qwz7vv=Uk"
}
}

Screenshots

(Screenshot attached: 2023-04-20 at 6:07:44 pm)

Additional context

I have traced the problem to the convertAvroSchemaToJson function in the parseAvroSchema.ts file. Its condition if ((column.type === 'record')) { is always false for a nested record, because column.type is then an object rather than the string 'record'.
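To illustrate the diagnosis, here is a minimal sketch of why the comparison fails and one way a check could look inside the type descriptor. The types (AvroType, AvroComplex, AvroField) and the helpers (isRecordBuggy, isRecord, toTemplate) are hypothetical names made up for this example; they are not datagen's actual code, and the real fix may differ.

```typescript
// Hypothetical, simplified Avro types for illustration only;
// these are NOT datagen's actual definitions.
type AvroType = string | AvroType[] | AvroComplex;

interface AvroComplex {
  type: string;          // e.g. "record" or "array"
  name?: string;
  fields?: AvroField[];  // present when type is "record"
  items?: AvroType;      // present when type is "array"
}

interface AvroField {
  name: string;
  type: AvroType;
  default?: unknown;
}

// The check described in the report: it compares the whole type
// descriptor with a string, so a nested record (an object) never matches.
function isRecordBuggy(column: AvroField): boolean {
  return column.type === "record";
}

// Assumed alternative: inspect the descriptor when it is a plain object.
function isRecord(column: AvroField): boolean {
  const t = column.type;
  return typeof t === "object" && !Array.isArray(t) && t.type === "record";
}

// Sketch of a recursive walk that keeps nested records as nested objects.
// Every non-record field is reduced to the placeholder "<leaf>" here;
// a real converter would map each Avro type to a value generator instead.
function toTemplate(fields: AvroField[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const field of fields) {
    if (isRecord(field)) {
      const nested = field.type as AvroComplex;
      out[field.name] = toTemplate(nested.fields ?? []);
    } else {
      out[field.name] = "<leaf>";
    }
  }
  return out;
}

// The "address" field from the schema in this report.
const address: AvroField = {
  name: "address",
  type: {
    type: "record",
    name: "Address",
    fields: [
      { name: "ID", type: ["null", "string"], default: null },
      { name: "STREET", type: ["null", "string"], default: null },
    ],
  },
};

console.log(isRecordBuggy(address)); // false
console.log(isRecord(address));      // true
console.log(JSON.stringify(toTemplate([address])));
// {"address":{"ID":"<leaf>","STREET":"<leaf>"}}
```

With a check along these lines, the address field would be expanded into an object with ID and STREET keys, matching the expected output above, instead of collapsing to a single string.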

This bug is fixed.