spotify/magnolify

Magnolify decodes Optional toString value of optional byte array field instead of field itself

mklevin opened this issue · 2 comments

Given the following type definition:

@BigQueryType.toTable
case class TestType(
   id: String,
   bytes: Option[Array[Byte]]
)

private val bqType = TableRowType[TestType]

Running the following pipeline test:

"test" should "work" in {
    val row = new TableRow().set("id", "test-id").set("bytes", Some(Array(1.toByte, 9.toByte)))
    JobTest[TestJob.type]
        .args("--test-table=test:table.def")
        .input(BigQueryIO(Table.Spec("test:table.def")), Seq(row))
        .run
}

Returns the following stack-trace (trimmed for relevance):

Caused by: java.lang.IllegalArgumentException: com.google.common.io.BaseEncoding$DecodingException: Unrecognized character: {
	at com.google.common.io.BaseEncoding.decode(BaseEncoding.java:219)
	at magnolify.bigquery.TableRowField$.$anonfun$trfByteArray$1(TableRowType.scala:172)
	at magnolify.bigquery.TableRowField$$anon$4.from(TableRowType.scala:160)
	at magnolify.bigquery.TableRowField$$anon$5.from(TableRowType.scala:187)
	at magnolify.bigquery.TableRowField$$anon$5.from(TableRowType.scala:183)
	at magnolify.bigquery.TableRowField.fromAny(TableRowType.scala:71)
	at magnolify.bigquery.TableRowField.fromAny$(TableRowType.scala:71)
	at magnolify.bigquery.TableRowField$$anon$5.fromAny(TableRowType.scala:183)
	at magnolify.bigquery.TableRowField$$anon$2.$anonfun$from$1(TableRowType.scala:110)

The full String that trfByteArray is attempting to decode is {empty=false, defined=true}, rather than the byte array itself. This also happens if None is passed instead of an array, but does not happen if the field is not populated at all.
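
For illustration, the decoding failure can be reproduced in isolation. This is only a sketch: it uses the JDK's java.util.Base64 as a stand-in for the Guava BaseEncoding call in the stack trace, so the exception message differs, and the object and method names are hypothetical.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64
import scala.util.Try

object Base64DecodeSketch {
  // Stand-in for the Base64 decoding step; uses the JDK decoder, not Guava (assumption).
  def decode(s: String): Try[Array[Byte]] = Try(Base64.getDecoder.decode(s))

  def main(args: Array[String]): Unit = {
    // A genuine Base64 payload decodes to the original bytes.
    println(decode("d29ybGQ=").map(new String(_, UTF_8)))
    // The string form of the wrapped value is not Base64: '{' is outside the alphabet.
    println(decode("{empty=false, defined=true}").isFailure)
  }
}
```

Any value that reaches the decoder as something other than a Base64 string fails this way, which is consistent with the error surfacing for both Some(...) and None.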

I can't reproduce this with the following, further minimized snippet, so the problem is possibly somewhere in the JobTest code.

import com.spotify.scio.bigquery.{BigQueryType, TableRow}
import magnolify.bigquery._

object Test {
  def main(args: Array[String]): Unit = {
    val bqt = TableRowType[TestType]

    val r1 = TestType("hello", Some("world".getBytes))
    val tr1: TableRow = TestType.toTableRow(r1)
    println(tr1)
    val r1a = bqt(tr1)
    println((r1a, r1a.bytes.map(new String(_))))

    val r2 = TestType("hello", None)
    val tr2: TableRow = TestType.toTableRow(r2)
    println(tr2)
    val r2a = bqt(tr2)
    println((r2a, r2a.bytes.map(new String(_))))
  }

  @BigQueryType.toTable
  case class TestType(id: String, bytes: Option[Array[Byte]])
}

Output:

GenericData{classInfo=[f], {id=hello, bytes=d29ybGQ=}}
(TestType(hello,Some([B@7d82bbc7)),Some(world))
GenericData{classInfo=[f], {id=hello}}
(TestType(hello,None),None)

Turns out this has nothing to do with either magnolify or scio.

TableRow here is a pure Java type that uses Jackson for ser/de, so it does not treat a Scala Option specially. The bytes field needs to be set to a plain byte array, without the Some() wrapper used in the test:
new TableRow().set("id", "test-id").set("bytes", Some(Array(1.toByte, 9.toByte)))
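
One way to avoid the wrapper when building rows by hand is to unwrap the Option before setting the field and to leave the key unset for None, which matches the report that an unpopulated field works. The sketch below is an assumption-laden illustration: setOpt is a hypothetical helper, and it is written against java.util.Map so that it runs without the BigQuery client on the classpath. TableRow is expected to be usable the same way, since it is a Map of String to Object.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

object OptionalFieldSketch {
  // Hypothetical helper: stores the unwrapped value for Some, leaves the key unset for None.
  def setOpt(row: JMap[String, AnyRef], key: String, value: Option[AnyRef]): Unit =
    value.foreach(v => row.put(key, v))

  def main(args: Array[String]): Unit = {
    // A plain HashMap stands in for TableRow here (assumption).
    val row = new JHashMap[String, AnyRef]()
    row.put("id", "test-id")
    setOpt(row, "bytes", Some(Array(1.toByte, 9.toByte)))
    setOpt(row, "other", None)
    // The stored value is the raw byte array, not an Option.
    println(row.get("bytes").isInstanceOf[Array[Byte]])
    println(row.containsKey("other"))
  }
}
```

For the row in the test, this amounts to passing Array(1.toByte, 9.toByte) directly to set("bytes", ...).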