googleapis/google-cloud-cpp-spanner

Consider: Should Value::op<< STRING be wrapped in quotes?

devjgm opened this issue · 6 comments

This issue is being broken out from #1380.

Background

The behavior today is:

std::cout << Value(std::string("foo"));

This would output "foo" (including the surrounding double quotes). The question is should that include the double quotes or not?

A relevant comment from @coryan #1380 (comment)

And from @devbww #1380 (comment)

Streaming T vs Value(T)

In PR #1387, I added a unit test to compare the output of streaming an object of type T vs Value(T).

// Ensures that the following expressions produce the same output.
//
// `os << t`
// `os << Value(t)`
//
template <typename T>
void StreamMatchesValueStream(T t) {
std::ostringstream ss1;
ss1 << t;
std::ostringstream ss2;
ss2 << Value(std::move(t));
EXPECT_EQ(ss1.str(), ss2.str());
}
TEST(Value, OutputStreamMatchesT) {
// bool
StreamMatchesValueStream(false);
StreamMatchesValueStream(true);
// std::int64_t
StreamMatchesValueStream(-1);
StreamMatchesValueStream(0);
StreamMatchesValueStream(1);
// double
StreamMatchesValueStream(0.0);
StreamMatchesValueStream(3.14);
StreamMatchesValueStream(std::nan("NaN"));
StreamMatchesValueStream(std::numeric_limits<double>::infinity());
StreamMatchesValueStream(-std::numeric_limits<double>::infinity());
// std::string
// Note: `os << std::string{}` != `os << Value(std::string{})`
// because the latter includes surrounding double quotes.
// StreamMatchesValueStream("foo");
// Bytes
StreamMatchesValueStream(Bytes());
StreamMatchesValueStream(Bytes("foo"));
// Date
StreamMatchesValueStream(Date(1, 1, 1));
StreamMatchesValueStream(Date());
StreamMatchesValueStream(Date(9999, 12, 31));
// Timestamp
StreamMatchesValueStream(Timestamp());
StreamMatchesValueStream(MakeTimestamp(MakeTimePoint(1, 1)).value());
// std::vector<T>
// Not included, because a raw vector cannot be streamed.
// std::tuple<...>
// Not included, because a raw tuple cannot be streamed.
}

From this we can see that for all non-aggregate (vector or tuple) types, the following lines produce the same output:

T t = ...;
std::cout << t << "\n";
std::cout << Value(t) << "\n";

Except for std::string. When T is std::string, Value(t) wraps the output in double quotes.

Question

Do we want std::cout << Value(std::string("foo")); to output

  • "foo", or ...
  • foo

Proposal from @devjgm

I'd prefer if Value(std::string) did not wrap the output in double quotes normally. That is, I'd prefer if std::string behaved exactly like all the other non-aggregate values and printed the same whether or not it was wrapped in a Value. I think this would be the behavior expected by most users.

However, this could cause problems with delimiters when printing a std::string that's part of an aggregate, such as Value(std::vector<std::string>{...}). So only when a std::string is printed as part of a vector or tuple, I think in that case it should be wrapped in quotes. In this model, the wrapping of the string in quotes is conceptually part of the formatting of the aggregate, not the string itself.

So to be clear, I propose the following behavior:

std::string s = "foo";
std::cout << s << "\n";
std::cout << Value(s) << "\n";
std::cout << Value(std::vector<std::string>{"foo", "bar", "baz"}) << "\n";

OUTPUT:

foo
foo
["foo", "bar", "baz"]

Thoughts? Other proposals?

Thoughts? Other proposals?

So, I believe I've made my thoughts/other-proposals clear, and I'm sure I'll be outvoted (which is fine), but I can reiterate if you like.

The existence of StreamMatchesValueStream and the addition of new T output operators, has painted us into a corner where your proposal may be the best answer to the question you ask. But I don't think that "StreamMatchesValueStream" need be an axiom.

Instead, Bytes, Timestamp, and Date would not have output operators, just like vector and tuple do not. Rather, the provided accessors would be used to render each type as you see fit. Timestamp and Date streaming may well be something that should be locale specific anyway. And if Bytes streaming exists at all, it should simply stream the bytes, just as string streaming streams the chars.

If we need spanner::Value streaming for "debugging and human consumption" (and I'm happy to say we do), then my take has been to stream a value just like you'd write the corresponding literal in Spanner.

This seems to be just as consistent a world as "StreamMatchesValueStream", but doesn't require addressing any additional questions about quoting (including making it context dependent) or escaping.

If we need spanner::Value streaming for "debugging and human consumption" (and I'm happy to say we do), then my take has been to stream a value just like you'd write the corresponding literal in Spanner.

I think this is the main issue that would push us one way or the other. Once this is decided, I think the other questions are pretty clear.

There are 9 "Spanner types" and they all have a corresponding C++ type:

* Spanner Type | C++ Type `T`
* ------------ | ------------
* BOOL | `bool`
* INT64 | `std::int64_t`
* FLOAT64 | `double`
* STRING | `std::string`
* BYTES | `google::cloud::spanner::Bytes`
* TIMESTAMP | `google::cloud::spanner::Timestamp`
* DATE | `google::cloud::spanner::Date`
* ARRAY | `std::vector<T>` // [1]
* STRUCT | `std::tuple<Ts...>`

An instance of Value is like a bridge between the Spanner and C++ worlds. So it's reasonable to wonder what we're formatting for when streaming a Value. Do we format for the Spanner word, or the C++ world?

I now see that you're pushing for us to favor and format for the Spanner world using Spanner's literal syntax. And I have been pushing the other direction, for us to favor the C++ world, and print things as much as possible as a C++ programmer might expect the C++ type to be formatted.

Here's why I think we should ignore the Spanner formatting/syntax and instead stream Value objects as much as possible like their C++ types.

  1. Our C++ Value class only operates in terms of C++ types. You create it with C++ types. You .get<T>() C++ types. And I think users will naturally expect it to print as much as possible, like the corresponding C++ type.
  2. Clearly, operator<<() will be called in C++ code in a C++ context, so I again, I think producing output that looks like normal streamed C++ output is good.
  3. Users should not use attempt to use operator<<() with Value to construct SQL query strings to be used in Spanner. They should instead use bound params with their queries, which is more secure and doesn't require any formatting. I believe it's a non-goal to make it easy for users to paste together strings to build up an SQL statement.
  4. As our documentation for op<< says, the output is intended for debugging and human consumption. Outputting Spanner's SQL literal syntax is likely not the most useful format (and extra verbose) for this purpose.

If you (@devbww) still favor streaming the value just like one would write the corresponding literal in Spanner, can you give the reasons why that would be better for our users? I'm open to being convinced :-)

@devbww I don't want to rush your reply, because I realize this is likely not the most important issue that you're working on. So please don't feel urgency or pressure on this.

That said, the current behavior of Value streaming is inconsistent, and our string quoting does not escape internal quotes. So I'm going to send a PR to implement the proposal I suggested above, because that PR will at least get us to some consistent behavior for the GA launch.

All the output stream operators are documented with a @warning that the output format is for debugging/human consumption and it "may change without notice". So we are not painting ourselves into a corner -- you're still welcome to convince us that formatting using Spanner literal syntax is preferable, and we can change to that later.

Thanks to @devbww 's and @coryan 's discussion about this.

If you (@devbww) still favor streaming the value just like one would write the corresponding literal in Spanner, can you give the reasons why that would be better for our users? I'm open to being convinced :-)

@devbww I don't want to rush your reply, ...

It seemed to me that the first request was conditional ("if"), so given that I don't believe I can give you convincing reasons, I have nothing to add.

OK. Thanks.