Support UTF-8 strings for `read` and `lookup` outputs while using `ProtocolBuffer` encoding
jaeyeol-moloco opened this issue · 2 comments
Is your feature request related to a problem? Please describe.
When I use `ProtocolBuffer` encoding, I'm frustrated that string values are printed as unreadable escaped bytes.
For example, the Korean string "경동나비엔" is printed as "\352\262\275\353\217\231\353\202\230\353\271\204\354\227\224".
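The escaped form is just the UTF-8 bytes of the string written as three-digit octal escapes, three bytes per Hangul syllable. As a minimal illustration, the hypothetical `octalEscape` helper below (not `cbt` code) escapes every byte outside printable ASCII the same way and reproduces the sequence in the example:

```go
package main

import (
	"fmt"
	"strings"
)

// octalEscape is a hypothetical helper for illustration: it copies printable
// ASCII bytes through unchanged and writes every other byte as a three-digit
// octal escape (\ooo). It works byte by byte, so each multi-byte UTF-8
// character becomes several escapes.
func octalEscape(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ { // index bytes, not runes
		c := s[i]
		if c >= 0x20 && c < 0x7f {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, `\%03o`, c)
		}
	}
	return b.String()
}

func main() {
	s := "경동나비엔"
	fmt.Println(len(s), "bytes") // prints "15 bytes": 5 syllables x 3 UTF-8 bytes
	fmt.Println(octalEscape(s))
}
```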
Describe the solution you'd like
I would like `cbt` to support UTF-8 strings in `read` and `lookup` output.
Describe alternatives you've considered
I found that the unreadable byte sequence comes from `message.MarshalTextIndent()` (link). If we use `message.MarshalJSONIndent()` instead, a UTF-8 string is printed correctly, as "경동나비엔". So it would also be good if `cbt` allowed users to choose `prototext` or `protojson` as the output format. `prototext` would then still output bytes in octal, but I could choose `protojson` to see UTF-8 strings.
Additional context
I think the octal output for UTF-8 characters is intended, according to https://protobuf.dev/reference/protobuf/textformat-spec/, so fixing the output of the `ProtocolBuffer` format isn't an option for this issue. In #171, I added another format, `ProtocolBufferJSON`, which marshals a protocol buffer value as JSON and prints UTF-8 strings normally.
I later realized that the text format spec itself supports UTF-8, so I closed #171 and opened a new PR, #172, which switches the formatter package from https://github.com/jhump/protoreflect to https://pkg.go.dev/google.golang.org/protobuf/encoding/prototext.