AbsaOSS/spark-hats

Add Unstruct functionality to flatten a nested struct

AdrianOlosutean opened this issue · 1 comments

Background

Currently, there is no way to flatten a struct field at a given level of nesting.

Feature

When calling df.nestedMapColumn(), the unstruct function should project the fields of a nested struct onto the same level as the struct's parent.

Example

For a dataset of the following shape:

root
|-- id: long (nullable = true)
|-- my_array: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- a: long (nullable = true)
|    |    |-- b: string (nullable = true)
|    |    |-- c: struct (nullable = true)
|    |    |    |-- nestedField1: string (nullable = true)
|    |    |    |-- nestedField2: long (nullable = true)

Applying df.nestedMapColumn("my_array.c", "my_array", c => unstruct(c)) should result in:

root
|-- id: long (nullable = true)
|-- my_array: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- a: long (nullable = true)
|    |    |-- b: string (nullable = true)
|    |    |-- nestedField1: string (nullable = true)
|    |    |-- nestedField2: long (nullable = true)
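
For comparison, the same target shape can be produced by hand with plain Spark, which shows what the requested function would have to automate. This is only a sketch under assumptions: it needs Spark 3.0+ (for the Scala `transform` function), the local session and sample JSON row are made up for illustration, and it is not how spark-hats does or would implement the feature. Every field of the element struct has to be listed explicitly, which is exactly the boilerplate an unstruct helper would remove.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, transform}

// Hypothetical local session, used only for this illustration.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("unstruct-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up sample row whose inferred schema matches the shape in the issue.
val json =
  """{"id":1,"my_array":[{"a":10,"b":"x","c":{"nestedField1":"n1","nestedField2":100}}]}"""
val df = spark.read.json(Seq(json).toDS())

// Rebuild each array element, lifting the fields of `c` up one level.
val flattened = df.withColumn(
  "my_array",
  transform(col("my_array"), e => struct(
    e.getField("a").as("a"),
    e.getField("b").as("b"),
    e.getField("c").getField("nestedField1").as("nestedField1"),
    e.getField("c").getField("nestedField2").as("nestedField2")
  ))
)

flattened.printSchema()
```

The manual version breaks as soon as a field is added to the element struct or to `c`, so a helper that derives the field list from the schema would be more robust.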

Thanks for such a good issue description. This makes perfect sense.

The syntax will be slightly different though:

df.nestedUnstruct("my_array.c")