googleapis/cloud-bigtable-cbt-cli

Support UTF-8 strings for `read` and `lookup` outputs while using `ProtocolBuffer` encoding

jaeyeol-moloco opened this issue · 2 comments


Is your feature request related to a problem? Please describe.
When I use ProtocolBuffer encoding, I'm frustrated that string values are printed as unreadable octal-escaped bytes.
For example, a Korean string "경동나비엔" is printed as "\352\262\275\353\217\231\353\202\230\353\271\204\354\227\224".
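
To illustrate where those octal sequences come from, here is a minimal Go sketch using only the standard library. The helpers `escapeOctal` and `unescapeOctal` are hypothetical names made up for this example and are not part of cbt or any protobuf library. They only mimic the byte-wise escaping seen in the output above, assuming every byte outside printable ASCII becomes a three-digit octal escape.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// escapeOctal is a hypothetical helper (not cbt code). It copies printable
// ASCII bytes as-is and writes every other byte as a three-digit octal
// escape, so each UTF-8 byte of a Korean character turns into "\ooo".
// Quotes and backslashes are deliberately not handled, to keep it short.
func escapeOctal(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= 0x20 && c < 0x7f {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, "\\%03o", c)
		}
	}
	return b.String()
}

// unescapeOctal reverses escapeOctal: each "\ooo" becomes one raw byte.
// It returns an error for a truncated or malformed escape.
func unescapeOctal(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' {
			b.WriteByte(s[i])
			continue
		}
		if i+3 >= len(s) {
			return "", fmt.Errorf("truncated escape at offset %d", i)
		}
		v, err := strconv.ParseUint(s[i+1:i+4], 8, 8)
		if err != nil {
			return "", fmt.Errorf("bad escape at offset %d: %w", i, err)
		}
		b.WriteByte(byte(v))
		i += 3 // skip the three octal digits just consumed
	}
	return b.String(), nil
}

func main() {
	escaped := escapeOctal("경동나비엔")
	fmt.Println(escaped) // \352\262\275\353\217\231\353\202\230\353\271\204\354\227\224

	decoded, err := unescapeOctal(escaped)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded) // 경동나비엔
}
```

Each Korean character here occupies three UTF-8 bytes, which is why the five-character string becomes fifteen octal escapes. A decoder like this could work as a stopgap for reading existing output, but it does not replace printing the string directly.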

Describe the solution you'd like
I would like cbt to support UTF-8 strings in `read` and `lookup` output.

Describe alternatives you've considered
I found that the unreadable byte sequence comes from message.MarshalTextIndent(). If we use message.MarshalJSONIndent() instead, a UTF-8 string is printed correctly, like "경동나비엔". So it would also be good if cbt allowed users to choose prototext or protojson as the output format. Then prototext would still output bytes in octal, but I could choose protojson to see the UTF-8 string.

Additional context

I think the octal output for UTF-8 characters is intended, according to https://protobuf.dev/reference/protobuf/textformat-spec/. So changing the output of the ProtocolBuffer format wouldn't be an option for this issue. In #171, I added one more format, ProtocolBufferJSON, which marshals a protocol buffer value as JSON and prints UTF-8 strings normally.

I realized that the text format spec itself supports UTF-8, so I closed #171 and opened a new PR, #172, which changes the formatter package from https://github.com/jhump/protoreflect to https://pkg.go.dev/google.golang.org/protobuf/encoding/prototext.