eddelbuettel/rprotobuf

Exposing TextFormat to the R API

MichaelChirico opened this issue · 6 comments

Currently, RProtoBuf exposes the DebugString through toString() and as.character() methods, but there is no way to convert to the TextFormat representation.

Until now, this hasn't mattered much as the two formats are basically identical, but we recently started emphasizing the difference between the two as they're expected to diverge more going forward.

Here's the doc string that will be attached to DebugString() once GitHub syncs with our internal sources in message.h:

  // Generates a human-readable form of this message for debugging purposes.
  // Note that the format and content of a debug string is not guaranteed, may
  // change without notice, and should not be depended on. Code that does
  // anything except display a string to assist in debugging should use
  // TextFormat instead.
  std::string DebugString() const;

https://github.com/protocolbuffers/protobuf/blob/eb8976a56d3f596ada8562dcff27724fb3f3625c/src/google/protobuf/message.h#L305-L307

Currently, we have an internal patch creating a new toTextFormat() method to expose TextFormat() which basically wraps the corresponding C++ method

// in wrapper_Message.cpp
/**
 * Return the TextFormat of message
 *
 * @param xp external pointer to the Message
 */
RPB_FUNCTION_1(std::string, METHOD(print_text_format),
               Rcpp::XPtr<GPB::Message> message) {
  std::string message_text;
  GPB::TextFormat::PrintToString(*message, &message_text);
  return message_text;
}

However I am wondering whether it makes sense to switch things around so that toString() returns the TextFormat representation and we add a toDebugString() method to expose that, if necessary.

Happy to file the PR whichever way.

A priori, either sounds good to me, Minimising API / interface changes is always preferable so maybe a 'net addition' may be cleaner than a change?

maybe a 'net addition' may be cleaner than a change

(by which I assume you mean the new toTextFormat() method approach)

Definitely agreed there, and it's why our initial patch does that.

OTOH my intuition is that what most people want from toString() is actually closer to what TextFormat is supposed to be. And if the APIs do start diverging, we might expect some peoples' usage to start breaking later, so better to make the split now while the two representations are basically the same?

WDYT?

Just thinking out loud here: "Both"? Offer the new approach of changing toString if an opt-in toggle is set?

That's the other idea that came up. So far toTextFormat() was chosen because it has the smallest diff -- not really a globally optimal strategy :)

Adding options to toString() is probably the user-friendliest approach -- I guess most R-level users don't really care to know the difference between DebugString and TextFormat. But a well-named option could be good, e.g. x$toString(debug = TRUE).

Still the big question is what the default should be -- any thoughts on whether the breaking* change is the right idea?

*unclear the extent to which anything is actually "broken" as of now

Current defaults? And I was thinking a possible options() (or alike) to set a preference at startup / set a default for the function.

Does that make sense? Basically just being careful about the status while allowing the in-house improvement from your end in?

Ok that makes sense. certainly enough to get a PR started. let's continue from there.