Magnolify decodes Optional toString value of optional byte array field instead of field itself
mklevin opened this issue · 2 comments
Given the following type definition:
@BigQueryType.toTable
case class TestType(
  id: String,
  bytes: Option[Array[Byte]]
)
private val bqType = TableRowType[TestType]
Running the following pipeline test:
"test" should "work" in {
  val row = new TableRow().set("id", "test-id").set("bytes", Some(Array(1.toByte, 9.toByte)))
  JobTest[TestJob.type]
    .args("--test-table=test:table.def")
    .input(BigQueryIO(Table.Spec("test:table.def")), Seq(row))
    .run
}
Returns the following stack-trace (trimmed for relevance):
Caused by: java.lang.IllegalArgumentException: com.google.common.io.BaseEncoding$DecodingException: Unrecognized character: {
at com.google.common.io.BaseEncoding.decode(BaseEncoding.java:219)
at magnolify.bigquery.TableRowField$.$anonfun$trfByteArray$1(TableRowType.scala:172)
at magnolify.bigquery.TableRowField$$anon$4.from(TableRowType.scala:160)
at magnolify.bigquery.TableRowField$$anon$5.from(TableRowType.scala:187)
at magnolify.bigquery.TableRowField$$anon$5.from(TableRowType.scala:183)
at magnolify.bigquery.TableRowField.fromAny(TableRowType.scala:71)
at magnolify.bigquery.TableRowField.fromAny$(TableRowType.scala:71)
at magnolify.bigquery.TableRowField$$anon$5.fromAny(TableRowType.scala:183)
at magnolify.bigquery.TableRowField$$anon$2.$anonfun$from$1(TableRowType.scala:110)
The full String that trfByteArray is attempting to decode is {empty=false, defined=true}, rather than the byte array itself. This also happens if None is passed instead of an array, but does not happen if the field is not populated at all.
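To illustrate why that string fails, here is a minimal sketch of the decode step. It assumes the cell is turned into text via toString and then base64-decoded, as the stack trace suggests. The JDK's java.util.Base64 is used as a stand-in for Guava's BaseEncoding, and DecodeSketch and decode are hypothetical names, not part of magnolify.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64
import scala.util.Try

object DecodeSketch {
  // Stand-in for the base64 decode in the stack trace.
  // Uses the JDK decoder rather than Guava's BaseEncoding (assumption for illustration).
  def decode(cell: Any): Try[Array[Byte]] =
    Try(Base64.getDecoder.decode(cell.toString))

  def main(args: Array[String]): Unit = {
    // A BYTES cell holding base64 text decodes cleanly.
    println(decode("d29ybGQ=").map(new String(_, UTF_8)))
    // The string reported above starts with '{', which is not a base64 character,
    // so decoding fails with an IllegalArgumentException.
    println(decode("{empty=false, defined=true}").isFailure)
  }
}
```

The reported string looks like a bean-style rendering of a Scala Option (its isEmpty and isDefined properties), which would mean the Option wrapper itself, not its contents, reached the decoder.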
I can't repro with the following further minimized blob. So it's possibly something in the JobTest code.
import com.spotify.scio.bigquery.{BigQueryType, TableRow}
import magnolify.bigquery._
object Test {
  def main(args: Array[String]): Unit = {
    val bqt = TableRowType[TestType]
    val r1 = TestType("hello", Some("world".getBytes))
    val tr1: TableRow = TestType.toTableRow(r1)
    println(tr1)
    val r1a = bqt(tr1)
    println((r1a, r1a.bytes.map(new String(_))))
    val r2 = TestType("hello", None)
    val tr2: TableRow = TestType.toTableRow(r2)
    println(tr2)
    val r2a = bqt(tr2)
    println((r2a, r2a.bytes.map(new String(_))))
  }
  @BigQueryType.toTable
  case class TestType(id: String, bytes: Option[Array[Byte]])
}
GenericData{classInfo=[f], {id=hello, bytes=d29ybGQ=}}
(TestType(hello,Some([B@7d82bbc7)),Some(world))
GenericData{classInfo=[f], {id=hello}}
(TestType(hello,None),None)
Turns out this has nothing to do with either magnolify or scio. TableRow here is a pure Java type that uses Jackson for ser/de, so the bytes field needs to be a plain byte array, not one wrapped in Some() as in the test row:
new TableRow().set("id", "test-id").set("bytes", Some(Array(1.toByte, 9.toByte)))
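A possible way to build the test row along those lines is sketched below. It assumes the Option is unwrapped on the Scala side before it reaches TableRow, and that a missing value is represented by leaving the field unset, which the report above says decodes without error. RowSketch and testRow are hypothetical names introduced for illustration.

```scala
import com.google.api.services.bigquery.model.TableRow

object RowSketch {
  // Hypothetical helper: TableRow only ever receives a plain byte array,
  // never a Scala Option.
  def testRow(id: String, bytes: Option[Array[Byte]]): TableRow = {
    val row = new TableRow().set("id", id)
    // For None the field is simply not populated.
    bytes.foreach(b => row.set("bytes", b))
    row
  }
}
```

In the test from the original report, the row would then be created with RowSketch.testRow("test-id", Some(Array(1.toByte, 9.toByte))) and passed to .input(...) as before.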