mjakubowski84/parquet4s

Protobuf enums deserialisation


Hi, first I'd like to thank you for a fantastic library, and I appreciate that the ScalaPB plugin is still experimental.

When deserialising enums, the output comes out as ASCII character codes rather than text. Is there a way to get just the enum's name?

    {"timestamp":1700741449422,"loginEvent":{"event_type":{"0":82,"1":69,"2":70,"3":82,"4":69,"5":83,"6":72,"7":95,"8":83,"9":85,"10":67,"11":67,"12":69,"13":83,"14":83,"15":70,"16":85,"17":76}}}
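The numbers inside the "event_type" object are the character codes of a string, keyed by position. The small sketch below (the object and function names are made up for illustration) maps each code back to a character. The result is REFRESH_SUCCESSFUL, which looks like the name of an enum value from the full schema; the proto excerpt in this issue is trimmed to the LOGIN_* values. That suggests the enum's name is stored correctly and the tool producing this JSON is printing its raw bytes.

    // Hypothetical helper: decodes the "event_type" object shown above
    object DecodeEventType {
      // Values of "event_type" from the JSON output, in key order "0" to "17"
      val eventTypeCodes: Seq[Int] =
        Seq(82, 69, 70, 82, 69, 83, 72, 95, 83, 85, 67, 67, 69, 83, 83, 70, 85, 76)

      // Each value is the ASCII code of one character, so converting to Char recovers the text
      def decode(codes: Seq[Int]): String = codes.map(_.toChar).mkString

      def main(args: Array[String]): Unit =
        println(decode(eventTypeCodes)) // REFRESH_SUCCESSFUL
    }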

I am using:

    "com.github.mjakubowski84" %% "parquet4s-fs2" % "2.14.1",
    "com.github.mjakubowski84" %% "parquet4s-scalapb" % "2.13.0",

And my .proto file looks a bit like this:

    message LoginEvent {
        enum EventType {
            LOGIN_EVENT_UNSPECIFIED = 0;
            LOGIN_SUCCESSFUL = 1;
        }
    }

And it generates this:

    case object LOGIN_SUCCESSFUL extends EventType(1) with EventType.Recognized {
      val index = 1
      val name = "LOGIN_SUCCESSFUL"    // would like to just use this
      override def isLoginSuccessful: _root_.scala.Boolean = true
    }

This looks strange, indeed. It looks like a bug.

Hi @struong

Given this proto:

    syntax = "proto3";

    option java_package = "com.github.mjakubowski84";

    message LoginEvent {
      enum EventType {
        LOGIN_EVENT_UNSPECIFIED = 0;
        LOGIN_SUCCESSFUL = 1;
      }

      int64 timestamp = 1;
      EventType eventType = 2;
    }

and this Scala code:

    package com.github.mjakubowski84

    import com.github.mjakubowski84.parquet4s._
    import ScalaPBImplicits._

    import java.time.Instant

    object PBTest extends App {

      val outFile = InMemoryOutputFile(initBufferSize = 4800)
      ParquetWriter.of[LoginEvent].writeAndClose(
        file = outFile,
        data = Seq(
          LoginEvent(
            timestamp = Instant.now().toEpochMilli,
            eventType = LoginEvent.EventType.LOGIN_SUCCESSFUL
          )
        )
      )

      val inFile = InMemoryInputFile.fromBytes(outFile.take())

      val protoResult = ParquetReader.as[LoginEvent].read(inFile)
      // prints LoginEvent(1701085450638,LOGIN_SUCCESSFUL,UnknownFieldSet(Map()))
      println(protoResult.head)

      val genericResult = ParquetReader.generic.read(inFile)
      // prints Some(LOGIN_SUCCESSFUL)
      println(genericResult.head.get[String]("eventType", ValueCodecConfiguration.Default))

    }

The written record is RowParquetRecord(timestamp=LongValue(1701085521641),eventType=BinaryValue(Binary{16 constant bytes, [76, 79, 71, 73, 78, 95, 83, 85, 67, 67, 69, 83, 83, 70, 85, 76]})), so eventType is stored as a binary value that encodes the enum's name as a string.
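To check that claim, the sketch below encodes the name LOGIN_SUCCESSFUL and compares the result with the 16 bytes listed in the record. It assumes the name is stored as UTF-8. For these ASCII-only characters, UTF-8 and ASCII produce the same bytes. The object and method names are hypothetical and not part of parquet4s.

    import java.nio.charset.StandardCharsets

    // Hypothetical helper: compares an enum name with the bytes stored in the BinaryValue
    object EnumNameBytes {
      // Byte values copied from the BinaryValue in the RowParquetRecord above
      val storedBytes: Seq[Int] =
        Seq(76, 79, 71, 73, 78, 95, 83, 85, 67, 67, 69, 83, 83, 70, 85, 76)

      // Encodes a name as UTF-8 and widens each byte to Int (all values are positive for ASCII)
      def nameBytes(name: String): Seq[Int] =
        name.getBytes(StandardCharsets.UTF_8).toSeq.map(_.toInt)

      def main(args: Array[String]): Unit = {
        val expected = nameBytes("LOGIN_SUCCESSFUL")
        println(expected.length)         // 16
        println(expected == storedBytes) // true
      }
    }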

Enum serialisation and deserialisation look correct to me.

It seems that you have a bug somewhere else.

@mjakubowski84 thank you for your prompt reply!

I believe it's an issue with the VS Code plugin I use to read Parquet files. When I opened the file in an online reader, the rows looked correct. Thank you for investigating!