Regression in bytes read safety in avro
gafiatulin opened this issue · 1 comments
Changes introduced in #740 made from
side potentially unsafe for bytes
fields in some cases.
For example whenever GenericDatumReader[GenericRecord].read
is called over multiple elements with GenericRecord
value for reuse, underlying ByteBuffer
s are being reused as well, so bytes in the fields are also overwritten.
Short example (click to expand)
import magnolify.avro._
import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
import java.io.ByteArrayOutputStream
import scala.util.Random
case class Test(bytes: Array[Byte])
val avroType = AvroType[Test]
def toBytes(values: List[Test]): Array[Byte] = {
val bytes = new ByteArrayOutputStream()
val datumWriter = new GenericDatumWriter[GenericRecord](avroType.schema)
val binaryEncoder = EncoderFactory.get().binaryEncoder(bytes, null)
values.foreach { value =>
datumWriter.write(avroType.to(value), binaryEncoder)
}
binaryEncoder.flush()
bytes.toByteArray
}
def fromBytes(bytes: Array[Byte], avroType: AvroType[Test]): List[Test] = {
val datumReader = new GenericDatumReader[GenericRecord](avroType.schema)
val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
var reuseRecord: GenericRecord = null
List.unfold(()) { _ =>
if(decoder.isEnd) None
else {
reuseRecord = datumReader.read(reuseRecord, decoder)
Some(avroType.from(reuseRecord), ())
}
}
}
val values = List.fill(2)(Test(Random.nextBytes(5)))
val bytes = toBytes(values)
val roundTrip = fromBytes(bytes, avroType)
assert(values.size == roundTrip.size)
values.zip(roundTrip).foreach{ case (expected, obtained) =>
assert(expected.bytes.sameElements(obtained.bytes), "Round-trip failure")
}
One workaround is to to not reuse record when reading, but sometimes there's no control over that.
Another – provide custom implicit val afByteBuffer: AvroField[ByteBuffer]
or implicit val afBytes: AvroField[Array[Byte]]
when declaring AvroField
/AvroType
.
Previously val afBytes: AvroField[Array[Byte]]
used to copy byte array, which was arguably safer behavior.
Hi @gafiatulin, sorry for coming back to you that late.
Thanks for the report, we'll revert to previous behavior