hangxie/parquet-tools

INTERVAL type import or cat problem


Importing a JSONL file into Parquet with a schema that contains an INTERVAL field, then running cat on the result, causes a panic:

$ parquet-tools cat -f jsonl cmd/testdata/all-types.parquet > /tmp/data.jsonl
$ parquet-tools schema -f json cmd/testdata/all-types.parquet > /tmp/schema.json
$ parquet-tools import -m /tmp/schema.json -f jsonl -s /tmp/data.jsonl /tmp/imported.parquet
$ parquet-tools cat /tmp/imported.parquet
panic: runtime error: index out of range [0] with length 0 [recovered]
	panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/alecthomas/kong.catch(0x140002ffe60)
	github.com/alecthomas/kong@v0.2.16/kong.go:383 +0xb8
panic({0x100d1f3a0, 0x1400090c990})
	runtime/panic.go:838 +0x204
github.com/xitongsys/parquet-go/types.DECIMAL_BYTE_ARRAY_ToString({0x1013b0020?, 0x0?, 0x0?}, 0x100a1236d?, 0x140002fee28?)
	github.com/xitongsys/parquet-go@v1.6.3-0.20220514031026-134bd047b233/types/converter.go:127 +0x1d0
github.com/hangxie/parquet-tools/cmd.reinterpretNestedFields(0x140002ff078, {0x1400008a6d0, 0x0, 0x0}, {0x10?, 0x1?, 0x140002fefc8?, 0x100294334?})
	github.com/hangxie/parquet-tools/cmd/cat.go:229 +0x604
github.com/hangxie/parquet-tools/cmd.reinterpretNestedFields(0x140006781d0, {0x1400008a6d0, 0x1, 0x1}, {0x0?, 0x4?, 0x1400035f100?, 0x14000479110?})
	github.com/hangxie/parquet-tools/cmd/cat.go:216 +0x248
github.com/hangxie/parquet-tools/cmd.(*CatCmd).Run(0x10137ef80, 0x1?)
	github.com/hangxie/parquet-tools/cmd/cat.go:112 +0x978
reflect.Value.call({0x100c3f940?, 0x10137ef80?, 0x1400035fad8?}, {0x1009f271b, 0x4}, {0x140004746f0, 0x1, 0x1002aaeec?})
	reflect/value.go:556 +0x5e4
reflect.Value.Call({0x100c3f940?, 0x10137ef80?, 0x9?}, {0x140004746f0, 0x1, 0x1})
	reflect/value.go:339 +0x98
github.com/alecthomas/kong.callMethod({0x1009f224f, 0x3}, {0x100d247a0?, 0x10137ef80?, 0x3?}, {0x100c3f940?, 0x10137ef80?, 0x0?}, 0x0?)
	github.com/alecthomas/kong@v0.2.16/callbacks.go:71 +0x3a4
github.com/alecthomas/kong.(*Context).RunNode(0x140001ff200, 0x14000412380, {0x1400035ff00, 0x1, 0x1})
	github.com/alecthomas/kong@v0.2.16/context.go:706 +0x468
github.com/alecthomas/kong.(*Context).Run(0x140004121c0?, {0x1400035ff00?, 0x0?, 0x0?})
	github.com/alecthomas/kong@v0.2.16/context.go:723 +0xc0
main.main()
	github.com/hangxie/parquet-tools/main.go:40 +0x2bc

It works fine if this field is removed from the schema:

    {
      "Tag": "name=Interval, type=FIXED_LEN_BYTE_ARRAY, convertedtype=INTERVAL, repetitiontype=REQUIRED"
    },

It turned out that the schema command does not output the length of the INTERVAL field, which should be 12.
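To illustrate why the missing length matters, here is a minimal Go sketch, not the actual parquet-tools or parquet-go code. It assumes the Parquet format's layout for INTERVAL (a 12-byte FIXED_LEN_BYTE_ARRAY holding three little-endian unsigned 32-bit integers: months, days, milliseconds) and that the tag would carry a `length=12` entry. The helpers `tagHasLength` and `decodeInterval` are hypothetical names made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// intervalLength is the byte width the Parquet format specifies for INTERVAL.
const intervalLength = 12

// tagHasLength reports whether a schema tag contains a "length=" entry.
// Hypothetical helper for illustration; not part of parquet-tools.
func tagHasLength(tag string) bool {
	for _, part := range strings.Split(tag, ",") {
		if strings.HasPrefix(strings.TrimSpace(part), "length=") {
			return true
		}
	}
	return false
}

// decodeInterval splits a 12-byte INTERVAL value into months, days and
// milliseconds. It returns an error instead of indexing into a short slice,
// which is the kind of access that panics in the trace above.
func decodeInterval(b []byte) (months, days, millis uint32, err error) {
	if len(b) != intervalLength {
		return 0, 0, 0, fmt.Errorf("INTERVAL needs %d bytes, got %d", intervalLength, len(b))
	}
	months = binary.LittleEndian.Uint32(b[0:4])
	days = binary.LittleEndian.Uint32(b[4:8])
	millis = binary.LittleEndian.Uint32(b[8:12])
	return months, days, millis, nil
}

func main() {
	// The tag as emitted by the schema command in this report: no length.
	tag := "name=Interval, type=FIXED_LEN_BYTE_ARRAY, convertedtype=INTERVAL, repetitiontype=REQUIRED"
	fmt.Println("tag has length:", tagHasLength(tag))

	// A zero-length value, as a file written without the length would yield.
	if _, _, _, err := decodeInterval(nil); err != nil {
		fmt.Println("error:", err)
	}

	// A well-formed 12-byte value: 1 month, 2 days, 3 milliseconds.
	value := []byte{1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0}
	m, d, ms, _ := decodeInterval(value)
	fmt.Println("months, days, millis:", m, d, ms)
}
```

A check like `tagHasLength` could be run on the schema before import so that a FIXED_LEN_BYTE_ARRAY field without a length is rejected early instead of producing a file that later fails in cat.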