Protobuf enums deserialisation
Hi, first I'd like to thank you for a fantastic library. I appreciate that the scalapb plugin is still experimental.
When deserialising enums, the value comes out as a map of ASCII codes rather than the enum name. Is there a way to get just the name?
{"timestamp":1700741449422,"loginEvent":{"event_type":{"0":82,"1":69,"2":70,"3":82,"4":69,"5":83,"6":72,"7":95,"8":83,"9":85,"10":67,"11":67,"12":69,"13":83,"14":83,"15":70,"16":85,"17":76}}}
I am using:
"com.github.mjakubowski84" %% "parquet4s-fs2" % "2.14.1",
"com.github.mjakubowski84" %% "parquet4s-scalapb" % "2.13.0",
And my .proto file looks a bit like this:
message LoginEvent {
  enum EventType {
    LOGIN_EVENT_UNSPECIFIED = 0;
    LOGIN_SUCCESSFUL = 1;
  }
}
And it generates this:
case object LOGIN_SUCCESSFUL extends EventType(1) with EventType.Recognized {
  val index = 1
  val name = "LOGIN_SUCCESSFUL" // would like to just use this
  override def isLoginSuccessful: _root_.scala.Boolean = true
}
This looks strange, indeed. It looks like a bug.
Hi @struong
Given proto
syntax = "proto3";
option java_package = "com.github.mjakubowski84";
message LoginEvent {
  enum EventType {
    LOGIN_EVENT_UNSPECIFIED = 0;
    LOGIN_SUCCESSFUL = 1;
  }
  int64 timestamp = 1;
  EventType eventType = 2;
}
and Scala code:
package com.github.mjakubowski84
import com.github.mjakubowski84.parquet4s._
import ScalaPBImplicits._
import java.time.Instant
object PBTest extends App {
  val outFile = InMemoryOutputFile(initBufferSize = 4800)
  ParquetWriter.of[LoginEvent].writeAndClose(
    file = outFile,
    data = Seq(
      LoginEvent(
        timestamp = Instant.now().toEpochMilli,
        eventType = LoginEvent.EventType.LOGIN_SUCCESSFUL
      )
    )
  )
  val inFile = InMemoryInputFile.fromBytes(outFile.take())
  val protoResult = ParquetReader.as[LoginEvent].read(inFile)
  // prints LoginEvent(1701085450638,LOGIN_SUCCESSFUL,UnknownFieldSet(Map()))
  println(protoResult.head)
  val genericResult = ParquetReader.generic.read(inFile)
  // prints Some(LOGIN_SUCCESSFUL)
  println(genericResult.head.get[String]("eventType", ValueCodecConfiguration.Default))
}
The created record is RowParquetRecord(timestamp=LongValue(1701085521641),eventType=BinaryValue(Binary{16 constant bytes, [76, 79, 71, 73, 78, 95, 83, 85, 67, 67, 69, 83, 83, 70, 85, 76]}))
with eventType being a binary that encodes the String value of the enum.
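The bytes in that BinaryValue can be checked by hand. Below is a minimal sketch using only the Java standard library. The names DecodeEnumBytes and decode are made up for illustration and are not part of parquet4s, and the sketch assumes the enum name is stored as UTF-8 text.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of parquet4s.
object DecodeEnumBytes {
  // Bytes copied from the BinaryValue in the record above.
  val eventTypeBytes: Array[Byte] =
    Array[Byte](76, 79, 71, 73, 78, 95, 83, 85, 67, 67, 69, 83, 83, 70, 85, 76)

  // Assumption: the enum name is stored as UTF-8 text.
  def decode(bytes: Array[Byte]): String =
    new String(bytes, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit =
    println(decode(eventTypeBytes)) // prints LOGIN_SUCCESSFUL
}
```

A viewer that shows a map of index to number for such a column is most likely displaying these raw bytes without decoding them as a string.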
Everything looks good to me with enums and deserialisation.
It seems that you have a bug somewhere else.
@mjakubowski84 thank you for your prompt reply!
I believe it's an issue with the VSCode plugin I use to read parquet files. I used an online reader and the rows do indeed look good. Thank you for investigating!