xitongsys/parquet-go

Library may be trying to convert the byte array into a string representation instead of preserving the raw byte data

Opened this issue · 2 comments

I want to store the following data structure in the Parquet file format:

type FlowInfo struct {
	FirstSeen     int64     `parquet:"name=FirstSeen, type=INT64"`
	RemIP         string    `parquet:"name=RemIP, type=BYTE_ARRAY"`
	Proto         int32     `parquet:"name=Proto, type=INT32"`
	DevPort       int32     `parquet:"name=DevPort, type=INT32"`
	RemPort       int32     `parquet:"name=RemPort, type=INT32"`
	TotalFlowSize int64     `parquet:"name=TotalFlowSize, type=INT64"`
	PacketCount   int64     `parquet:"name=PacketCount, type=INT64"`
	Content       [100]byte `parquet:"name=Content, type=BYTE_ARRAY"`
}

Here, I need to store raw bytes in the Content field. However, when I write the file and read it back, the value of Content comes out as b'<[100]uint8 Value>' instead of the actual raw bytes. It appears that the library stores a string representation of the byte array rather than its contents.
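For reference, the placeholder text matches what Go's reflect.Value.String returns for any value whose kind is not a string. The sketch below only demonstrates that standard-library behavior. Whether parquet-go actually reaches this call internally is an assumption, and describe is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"reflect"
)

// describe returns reflect.Value.String() for v. For string kinds this is
// the string itself; for every other kind it is the placeholder "<T Value>".
// (Hypothetical helper, not part of parquet-go.)
func describe(v interface{}) string {
	return reflect.ValueOf(v).String()
}

func main() {
	var content [100]byte
	copy(content[:], "raw payload")

	// An array is not a string kind, so only the placeholder comes back.
	fmt.Println(describe(content)) // <[100]uint8 Value>

	// A real string is returned unchanged.
	fmt.Println(describe("raw payload")) // raw payload
}
```

If the library does take this path, the 100 bytes are never read at all; only the type name ends up in the file.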

How can I solve this issue?
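One possible workaround, sketched under the assumption that parquet-go maps BYTE_ARRAY to a Go string (as its README type table suggests): declare Content as a string and convert the array at the boundary. A Go string can hold arbitrary bytes, so nothing is re-encoded. FlowRecord, packContent, and unpackContent are hypothetical names used for illustration, and the writer and reader calls are left out.

```go
package main

import "fmt"

// FlowRecord is a trimmed-down, hypothetical variant of FlowInfo in which
// Content is a string, so the BYTE_ARRAY tag sits on a string field.
type FlowRecord struct {
	FirstSeen int64  `parquet:"name=FirstSeen, type=INT64"`
	Content   string `parquet:"name=Content, type=BYTE_ARRAY"`
}

// packContent copies the fixed-size array into a string byte for byte.
func packContent(raw [100]byte) string {
	return string(raw[:])
}

// unpackContent restores the array and rejects values of the wrong length.
func unpackContent(s string) ([100]byte, error) {
	var raw [100]byte
	if len(s) != len(raw) {
		return raw, fmt.Errorf("content is %d bytes, want %d", len(s), len(raw))
	}
	copy(raw[:], s)
	return raw, nil
}

func main() {
	var raw [100]byte
	raw[0], raw[1], raw[99] = 0x00, 0xff, 0x7f // arbitrary non-UTF-8 bytes

	rec := FlowRecord{FirstSeen: 1, Content: packContent(raw)}

	back, err := unpackContent(rec.Content)
	fmt.Println(len(rec.Content), back == raw, err) // 100 true <nil>
}
```

The trade-off is one extra copy per record in each direction, in exchange for not depending on array or slice support in the library.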

knl commented

I see that https://github.com/AppliedIntuition/parquet-go has support for byte slices, but I haven't tried it yet.

This has been raised several times over the past few years without any response, so I guess it will not be supported.

#321
#434
#453
#514