Horribly inconsistent behavior between const/non-const reference in operator [] ()
kawing-chiu opened this issue · 29 comments
Things like
std::string tag;
try {
tag = json["tag"];
} catch (const std::domain_error &e) {
// handle it
}
works nicely and as expected. However, if json is a const reference, the above code instead fails with an assertion. Why is it designed like this? Wouldn't it be better and more consistent to at least throw a domain_error instead of failing with an assertion? It took me more than an hour to debug this issue when I passed the above json by const reference to another function to extract data...
Neither version of operator[](key) checks whether the given key exists; both only throw std::domain_error when called on a JSON value that is not of type object. Therefore, your code will - in both cases - only catch exceptions indicating that json is not a JSON object. The behavior differs when key is not stored in the object:
- The non-const version (1) silently converts a null value to an empty object, and (2) creates a null value at key if it does not exist, before returning a reference to the value at key. This reference is always valid because of step (2): there will always be a value at key.
- The const version returns a const reference to the value at key. If there is no value at key, we can neither alter the object nor return a "null reference". The behavior is undefined in this situation (basically, the code is equivalent to return *object.find(key);). An assertion checks this.
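A minimal sketch of this asymmetry (the include path is an assumption; adjust it to how you consume the single-header library):

#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main()
{
    json j;        // null value

    // Non-const access: null is silently converted to an object and a
    // null value is inserted at "tag", so the returned reference is valid.
    j["tag"];      // j is now {"tag": null}

    const json cj = json::object();

    // Const access to a missing key: the object cannot be altered, so the
    // behavior is undefined; in debug builds an assertion fires.
    // cj["tag"];  // would trigger the assertion
}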
I thought about the semantics for quite a while, as there is no similar operator[] for std::map that returns a const reference. However, there is at for checked access, so operator[] is for unchecked access. Checking whether key exists in the const version would introduce overhead, which would contradict the "don't pay for what you don't use" principle.
I would be happy to hear ideas on how to improve this situation.
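To make the intended split concrete, a sketch (at() throwing std::out_of_range here reflects the library's documented checked access; the include path is an assumption):

#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main()
{
    const json j = {{"tag", "demo"}};

    // Checked access: at() throws for a missing key.
    try {
        std::string s = j.at("missing");
    } catch (const std::out_of_range& e) {
        std::cout << e.what() << '\n';
    }

    // Unchecked access: operator[] does no lookup check, mirroring
    // std::vector::operator[]; a missing key here is the caller's bug.
    std::string tag = j["tag"];
}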
Hello!
I think I have a solution for how to at least throw a domain_error.
In this method (line 3616):
template<typename T>
const_reference operator[](T* key) const;
We should simply change this:
assert(m_value.object != nullptr);
assert(m_value.object->find(key) != m_value.object->end());
return m_value.object->find(key)->second;
to something like this:
assert(m_value.object != nullptr);
if(m_value.object->find(key) == m_value.object->end()) {
throw std::domain_error("something");
}
return m_value.object->find(key)->second;
It works pretty nicely for me :-)
That's exactly what nlohmann said he doesn't want to do:
Checking whether key exists in the const version would result in overhead which would contradict the "not pay what you don't use" principle.
Well, I think this problem is not too hard, but it might require some API change. To see what the appropriate behavior here is, let's learn from how others do it.
Since JSON originates from and is heavily used by high-level scripting languages, consider how Python and Javascript handle JSON (these two languages even have JSON functionality built in, and it works intimately with their built-in data structures).
[Javascript]
Here is the behavior of a Javascript object, which maps to a JSON object:
- Reading. Reading a nonexistent key does not throw an exception but returns undefined, and nothing is inserted into the object by the read:
>> o = {}
Object {}
>> o.x
undefined
>> o
Object {}
- Writing. Writing to a nonexistent key works as expected:
>> o.x = {123: '321'}
Object {123: "321"}
>> o.y = null
null
>> o
Object {x: Object, y: null}
- Note that Javascript has two distinct built-in types: undefined and null. undefined means the variable does not exist; null maps to JSON null.
[Python]
Here is the behavior of a Python dict, which maps to a JSON object:
- Reading. Reading a nonexistent key throws a KeyError exception, and the dict itself is of course not touched:
In [1]: d = {}
In [2]: d['x']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-58142e8c3daa> in <module>()
----> 1 d['x']
KeyError: 'x'
In [3]: d
Out[3]: {}
- Writing. Writing to a nonexistent key works as expected:
In [3]: d
Out[3]: {}
In [4]: d['x'] = {'10': 100}
In [5]: d['y'] = None
In [6]: d
Out[6]: {'x': {'10': 100}, 'y': None}
- Python has no equivalent of Javascript's undefined. Python's None type maps to JSON null.
Note that on reading, neither language touches the object at all.
Now return to the C++ world. I think these two principles are worth following:
- Const and non-const behavior should definitely match.
- A production-quality library should really get rid of assertions. See the C++ standard library for an example: grep for assert in the source of clang's libcxx and you will find that assertions are used only where exceptions are not available, or very sparingly for internal debugging purposes.
So from the above discussion, IMHO the best solution is to mimic Python's behavior, which is well tested, widely used, and already familiar to many programmers. There's really no need to invent a new behavior. We can just define some exception like key_error. (In Python's case, KeyError is a built-in standard exception, but there's no reason we must stick to built-in exceptions. For example, <future> defines a bunch of exceptions for use by std::future.) The reasons for using an exception when reading a nonexistent key include:
- Reading a nonexistent key is indeed an exceptional case, which maps well to exceptions conceptually.
- Using an exception avoids the overhead of doing unnecessary tests. JSON is a highly dynamic data format. A common use case of a JSON library is on a server that processes high-throughput JSON requests from the outside world. In this case, 99.99% of the requests are simply valid: every field exists as expected. The remaining 0.01% are malformed JSONs, maybe just a handful out of tens of thousands a day. So we certainly don't want to test every field for existence and type conformance on the 99.99% of valid requests. This is the exact reason why I abandoned rapid_json and came to nlohmann/json.
- Note that in the current state of the library, where an assertion failure or UB may occur, this amounts to forcing the user of the library to write all of those tests on all of the JSON fields themselves. So we come to the point again: assertions should really be eliminated in favor of exceptions in any production-ready library.
So, to sum up, IMHO we really should follow the behavior of Javascript or Python; I would choose Python's. Or we could implement both and provide a global config option to let the user choose which flavor to use.
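A hypothetical sketch of that idea; key_error and checked_get are illustrative names, not part of the library:

#include <stdexcept>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Hypothetical exception type, analogous to Python's KeyError.
struct key_error : std::logic_error
{
    explicit key_error(const std::string& key)
        : std::logic_error("key '" + key + "' not found") {}
};

// Hypothetical checked read: throw instead of asserting on a missing key.
const json& checked_get(const json& j, const std::string& key)
{
    auto it = j.find(key);
    if (it == j.end())
    {
        throw key_error(key);
    }
    return *it;  // iterator dereference yields the mapped value
}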
Using an exception avoids the overhead of doing unnecessary tests. JSON is a highly dynamic data format. A common use case of a JSON library is on a server that processes high-throughput JSON requests from the outside world. In this case, 99.99% of the requests are simply valid: every field exists as expected. The remaining 0.01% are malformed JSONs, maybe just a handful out of tens of thousands a day. So we certainly don't want to test every field for existence and type conformance on the 99.99% of valid requests. This is the exact reason why I abandoned rapid_json and came to nlohmann/json.
This is exactly why the test isn't there.
Note that in the current state of the library, where an assertion failure or UB may occur, this amounts to forcing the user of the library to write all of those tests on all of the JSON fields themselves. So we come to the point again: assertions should really be eliminated in favor of exceptions in any production-ready library.
The assertions are removed when you build with NDEBUG, so there are no assertions when building for production. They're debugging aids. Therefore, ignore them.
If the user of the library needs to know whether or not the desired field is there because it's reading from a const object constructed from an unknown source, then it is up to the user of the library to do the test. Adding this test makes the 99.99% slower so that the 0.01% can be exactly the same speed, but with the test in the library instead of outside the library.
Const and non-const behavior should definitely match.
That is simply not possible. It would mean that you could not do this:
json j;
j["foo"] = "bar";
@gregmarr This is not what I mean. I mean the const part of the behavior should match, of course. In our specific case, that means the behavior on reading.
Removing the assertion via such a debug option is pointless, since what you get instead is undefined behavior; it's essentially the same. Failing with undefined behavior is even worse, IMHO. Have you read through my post? I did not advocate adding tests at all. Where did you get that impression from?
I confused your post with dka8's post.
What exactly are you proposing then?
@gregmarr I propose to leave everything as is:
- at for checked access
- operator[] for unchecked access
@kawing-chiu I am aware of how other languages cope with JSON, and the motivation for this project was "What if JSON were part of Modern C++?". Where possible, I tried to make JSON feel like part of the STL and basically combined the interfaces of std::vector and std::map. Both offer checked access to their elements with at. std::vector also has operator[] for both const and non-const access - in both cases without bounds checking. As std::map has no const operator[], I basically had three choices:
- not to offer a const operator[]
- copy the behavior of at and provide checked access
- mimic the behavior of vector and provide unchecked access
Option 1 makes no sense. There are so many Stack Overflow threads about people being confused that they cannot use operator[] on a const std::map that I did not follow this.
Option 2 makes more sense, but it would always check for an element, even when such a check is not needed. This, in my opinion, is not very C++-like.
Option 3 is documented as such. Just like dereferencing the past-the-end iterator or using an invalid vector index, the behavior is undefined. One could argue that calling find, checking whether it returns end(), and returning the reference if not does not really add overhead, but in the end it is a check that no one asked for when choosing operator[] rather than at.
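For reference, the caller-side check this design leaves to the user can be written with the public find()/end() interface (a sketch):

#include <nlohmann/json.hpp>

using json = nlohmann::json;

void process(const json& j)
{
    // The existence check lives only where the caller needs it, so code
    // that already knows the key is present pays nothing extra.
    auto it = j.find("tag");
    if (it != j.end())
    {
        const json& tag = *it;  // safe: the key exists
        // ... use tag ...
    }
}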
About assertions: I shall think about switching them off by default and only switching them on with a special preprocessor symbol. However, the current code only uses assertions in cases of otherwise undefined behavior.
There is no need for a special preprocessor symbol; assert comes with one as part of the standard, as I mentioned.
I agree with leaving things as is.
@gregmarr I agree. I think the main issue for @kawing-chiu is that they did not expect an assertion in this case. I do mention assertions in the README - I am not sure what I can do to avoid such confusion.
@nlohmann OK, now I see why. Actually, I think Option 1 is the best of the three, and it makes some sense. If we're to follow the STL's lead, then just follow it as closely as possible...
Another possibility I can think of is to define a subclass of std::logic_error and throw it, letting the user know what happened, instead of the assertion failure.
Either way, I think there should be a section dedicated to this in the README.
I think providing a function in the flavor of std::vector and offering unchecked access for objects as well is more consistent than not allowing const access through operator[] at all. Anyway, I think the larger issue is the assertions. I don't think throwing an exception is a solution, because this again would introduce a check...
There is currently a notes section in the README, reading:
- The code contains numerous debug assertions which can be switched off by defining the preprocessor macro NDEBUG, see the documentation of assert.
What would you like to add?
@nlohmann Exceptions are not like checks. They are zero-cost on the 'normal path'. And unlike assertions, they can be caught and won't make your server crash.
The problem is not at all whether we can turn assertions into undefined behavior with NDEBUG.
As an example: std::promise::get_future() uses various kinds of future_error exceptions to indicate something that should not be done by the user (e.g. the future associated with the promise can only be retrieved once). std::promise::get_future() will not just irresponsibly assert and die when the user retrieves the future twice.
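That standard-library precedent is easy to demonstrate:

#include <future>
#include <iostream>

int main()
{
    std::promise<int> p;
    std::future<int> f = p.get_future();

    try {
        // Retrieving the future a second time is a usage error that the
        // standard library reports via an exception, not an assertion.
        std::future<int> f2 = p.get_future();
    } catch (const std::future_error& e) {
        std::cout << e.code().message() << '\n';
    }
}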
With
I don't think throwing an exception is a solution, because this again would introduce a check...
I meant that one would still need code like
if (m_object->find(key) == m_object->end())
    throw std::logic_error(std::string(key) + " not found");
Or did you mean to have a different assertion macro that throws, so that the check would not be executed in production?
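That is, something along these lines (JSON_THROWING_ASSERT is a hypothetical name, just to make the question concrete):

#include <stdexcept>

// Hypothetical: throws in debug builds, compiles away under NDEBUG,
// so the production fast path stays unchecked.
#ifndef NDEBUG
#define JSON_THROWING_ASSERT(cond, msg)      \
    do {                                     \
        if (!(cond))                         \
            throw std::logic_error(msg);     \
    } while (0)
#else
#define JSON_THROWING_ASSERT(cond, msg) (void)0
#endif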
I don't understand the point. You already have that check in your assert():
assert(m_value.object->find(key) != m_value.object->end());
So isn't it just the same in terms of overhead?
OK, so you meant this:
Or did you mean to have a different assertion macro that throws, so that check would not be executed in production?
:)
Ok, so you propose to replace the vanilla assert with a function that throws some exception instead, and this function would be a no-op in production, right?
Well, not exactly. I mean we should use the exception for both development and production, like std::promise::get_future() does.
But since there's at()... What really needs to be done, I think, is to add a section to the README to clarify things and warn the user. At least say explicitly that [] may assert.
The code contains numerous debug assertions which can be switched off by defining the preprocessor macro NDEBUG, see the documentation of assert.
is not quite enough. 'There is a minefield.' But where are the mines?
Option 1 makes no sense. There a so many Stack Overflow threads about people being confused that they cannot use operator[] for const std::map that I did not follow this.
C++ developers are still likely to work with const std::map, and I don't think that using a different approach in this library makes them any less confused. I found a few caveats in my code where operator[] was used with const objects, and I'd much rather have compiler errors in such cases (and consistency with std::map).
std::vector also has operator[] for both const and nonconst access - in both cases without bounds checking.
At least the behavior of both operators is consistent, as neither of them can be used for inserting new elements.
I agree with @maksis. I think that is the exact reason why std::map does not have a const operator[].
The assertions are now mentioned in the README and the documentation of the function. Furthermore, the release notes for 2.0.3 will also mention the existence of assertions in the project and how to disable them.
I know I'm late to the discussion here, but I wanted to add some thoughts about this. I understand the idea: since const operator[] is unchecked for arrays, it should be fine to make it unchecked for maps too. But I don't think that's true. There are many reasons why operator[] is unchecked for arrays, and those reasons do not apply to maps.
First of all, indexing into an array is fundamentally a very cheap operation, possibly a single processor instruction. Adding a bounds check makes the operation many times slower. By contrast, looking up an entry in a string map is already an expensive operation involving O(log n) string comparisons; doing an extra it != container.end() check makes it only negligibly slower.
Second of all, the most common operation with input arrays is to iterate through them and access each element. With this usage pattern, it's easy to ensure that there is no out-of-bounds access. By contrast, with input maps, random access by key is more common than iteration. And with random access it's a lot more likely that out-of-bounds access is a real possibility that you have to deal with.
Third of all, everyone is already very used to the fact that indexing into arrays is unchecked, ever since the days of C. There is no corresponding expectation for maps. I think that when people write code using JSON objects in C++, they expect one of two things:
- JSON objects are like std::map, i.e. there is no const operator[]
- JSON objects are like dicts in Python etc., i.e. operator[] throws for keys that aren't found
I really don't think that people's first thought would be that operator[] for JSON objects behaves the same as it does for arrays, i.e. undefined behavior for out-of-bounds access. That, to me, is a fundamentally surprising thing about the interface of this library. Of course you can try to remedy a surprising interface with documentation, but it's still a problem.
I don't agree with the concern that it violates "pay only for what you use". Like I mentioned earlier, the cost of the check is negligible compared to the existing cost of the indexing operation. It's a small price to pay for eliminating a source of undefined behavior which, in my opinion, an inexperienced user is very likely to run into. Someone said that this check would make the 99.99% case slower for the sake of the 0.01% case. I strongly disagree with that analysis. How can it be that in 99.99% of cases, people are so sure about the source of their input that they're willing to invoke undefined behavior if they're wrong? In 99.99% of cases, the json that you're reading came from some external source, whether a config file or some other process. You really want to give that external source the power to cause a segfault in your application? I think in 99.99% of cases, people are going to want checked access. Or, if they're an inexperienced developer, they won't bother to do any checking thinking they don't need it, until the day that they're wrong and now they have a segfault to deal with.
In conclusion, what I want is for the const version of operator[] to have the same behavior as at, because operator[] has the nicer, more obvious syntax, so it should be the safe version. Adding this check would even be a backwards-compatible change, because it takes something that is undefined behavior today and makes it defined behavior.
While we're here, another possible behavior for const operator[] would be to keep an empty instance of basic_json as a static variable in the class and have const operator[] return a const reference to this instance if the lookup fails. This would be very useful for reading nested configs where all the fields are optional, e.g.:
void loadConfig(const json& config) {
const json& child = config["foo"];
a_ = child.value("a", 0);
b_ = child.value("b", true);
c_ = child.value("c", "bar");
}
In this case, since all the fields inside "foo" are optional, we can make "foo" itself optional as well.
But probably the less surprising behavior for const operator[] is to throw. The above behavior of returning a const reference to a dummy instance is useful in its own right, so it could be added as a new method, say child(key).
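A sketch of that idea as a free function (child is only the suggested name; nothing like this exists in the library):

#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Suggested helper: return a const reference to the value at key, or to
// a shared null instance when the key is missing, so optional subtrees
// like the "foo" above can be read without existence checks.
const json& child(const json& j, const std::string& key)
{
    static const json empty;  // null instance, lives for the whole program
    auto it = j.find(key);
    return it != j.end() ? *it : empty;
}

With this, loadConfig above would keep working even when "foo" is absent: child(config, "foo") simply yields the shared null value, and the value() calls fall back to their defaults.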
Adding a bounds check to it makes the operation many times slower
Actually, even for arrays it is only marginally slower, even for cache-friendly sequential access on modern architectures (pipelining and all that), and it is essentially free for non-cache-friendly random access.
I have to agree that undefined behavior should not be introduced without really good reasons, and if it can be eliminated at extremely negligible cost, it should be.
On that note, I'm a believer that asserts should be reserved for testing preconditions that are internal to the library, rather than for checking the validity of external inputs we have no control over - that's what domain_errors are for!
assert(does_not_pose_sql_injection_threat(user_input)) ;)
@eferreira The value() function seems similar to your proposed child function, right?
The value function returns a copy, right? The child function would return a const reference. I'd like a function that returns a reference to a child if it's there, otherwise it returns a reference to some dummy instance. Is it possible to achieve that with the value function?
Right, it returns a copy.