microsoft/GSL

2fc94db made gsl::not_null constructor explicit, why?

kivadiu opened this issue ยท 31 comments

Since 2fc94db, this program does not work anymore without explicit cast to not_null (here tested with clang 5.0.2 on linux with -std=c++14):

#include <gsl/pointers>
#include <iostream>

void f(gsl::not_null<const int *> i) {
  std::cout << *i << '\n';
}

int main() {
  int i = 42;
  f(&i); // no matching function for call to 'f'
         // note: candidate function not viable: no known conversion from 'int *' to 'gsl::not_null<const int *>' for 1st argument
  return 0;
}

If this is really wanted, could you explain why?

Kind regards

I guess you already checked the discussions pointed by the commit you gave: #659 and #395
It is true that the specific example you give here does not gain on security nor readability with this change. Maybe a constructor of not_null taking reference as parameter would help ?

I have tried the following program with/without explicit constructor of gsl::not_null. I get exactly the same behaviour at runtime. The only difference is at compile time where I must add a lot of text if the constructor is explicit: gsl::not_null<const int*>(...). It is so painful that it discourages to use gsl::not_null. Could you elaborate on the real usefulness of this explicit constructor? It does not seem to me that the check is done earlier because the construction of the gsl::not_null is always done before the function is called, at the point of the call.

#include <gsl/pointers>
#include <iostream>

void f(gsl::not_null<const int *> i) {
  std::cerr << static_cast<const void *>(i.get());
  std::cerr << ": " << *i << '\n';
}

int *never_null_int() {
  return new int{42};
}

int *sometimes_null_int(int choice) {
  return choice == 0 ? nullptr : new int{43};
}

int main() {
  f(gsl::not_null<const int *>(never_null_int()));
  f(gsl::not_null<const int *>(sometimes_null_int(1)));
  f(gsl::not_null<const int *>(sometimes_null_int(0)));
  return 0;
}

I am also surprised by this change - the whole point of not_null to me was an easy drop-in contract check encoded in function signatures that informed the caller that:

  • hey, I expect you play nice and give me decent pointer, otherwise I will bail-out

Thanks to this you didn't have to put boilerplate if'checks or asserts in the function body just to check parameter validity.

But from the caller perspective this was transparent so it could be also used to improve already existing code, now suddenly everything breaks because it requires explicit casts that have no real benefit to them.

So unfortunately, for me, this was an instant revert.

I can understand that this change will cause existing users some irritation as they must update callsites. However, the point of this change - as per the guideline that recommends it - is to make it more explicit at callsites that you are passing a pointer to a function that will enforce it to be not-null.

@kivadiu: this is not about changing runtime behavior, it is about making the actions that the compiler must enforce clearer to readers of the code. "Converting" constructors (such as the non-explicit form) are difficult for maintainers to reason about when reading through code as they are hidden. They can also participate in conversion sequences in ways that surprise users.

Making these constructor calls explicit, in my opinion, also lines up with the "pay only for what you use" ethos the Guidelines encourage. This way callers to functions that take not_null parameters can see very clearly the constructor calls (and hence nullness checks) that they have in their codebase. When you rely on implicit construction/conversion, you are hiding requests for work to be done.

I am still not convinced. The guidelines about that topic are the following:

All examples are about function arguments, never about function call point.

Even if I am in favour of coming back to the non explicit constructor for gsl::not_null, I think that if explicit must remain, the minimum would be to provide a function like the one below so that it is not necessary to provide the type (f(gsl::not_null<T*>(p)) would become f(gsl::make_not_null(p))). For T* it is quite the same but for const int * and others, it saves typing and the need to find the exact type.

template <class T>
auto make_not_null(T&&t) {
    return gsl::not_null<std::remove_cv_t<std::remove_reference_t<T>>>{t};
}

@kivadiu You also need to consider the guidelines that strongly suggest avoiding implicit, single-argument constructors: C.164: Avoid conversion operators.

But this is not a design created purely by looking at guidelines in some abstract vacuum. The types in the GSL were developed based on the experience of working with large, real-world codebases.

Codebases that are forced to use an explicit constructor are also forced to rethink just how many unnecessary, previously implicit constructions they had (or would otherwise have), and as a result, they can tighten the contracts on their function interfaces. The work seems mildly annoying at first, sure, but results in code that is better-specified and easier to maintain.

The function you provided seems a simple and helpful thing. I have a suspicion (that I haven't tested, so feel free to shoot me down) that template deduction guides in C++17 and beyond will allow you to skip the type while using the explicit constructor anyway, so a call could look like: f(gsl::not_null{p}).

template deduction guides in C++17 and beyond will allow you to skip the type while using the explicit constructor anyway, so a call could look like: f(gsl::not_null{p}).

Yes I have seen that somewhere so I think you are be right but I am still in C++14...

I have also seen C.164 and I apply it in my code but I still do not know if this should really apply to gsl::not_null. I think it would be nice to discuss this on the CppCoreGuidelines github.

C.164 says:

By "serious need" we mean a reason that is fundamental in the application domain (such as an integer to complex number conversion) and frequently needed. Do not introduce implicit conversions (through conversion operators or non-explicit constructors) just to gain a minor convenience.

I wonder if it is not fundamental to have the implicit conversion here and if we do not touch here the "frequently needed" case.

But having the explicit means that not_null will never be applied "automatically", without thinking first. So you may be right.

@kivadiu My 2 cents, as I am currently experiencing the change myself:

  • We need a constructor from a reference (would love to see a PR, btw!)
  • c++17 is better than c++14 . As Neil mentioned, f(gsl::not_null{p}) and gsl::notnull p{&i} work.
  • Depending on how much we want to support only the use of latest c++, we might consider having make_not_null(p)
  • it is a very annoying change indeed, but after spending some time fixing the compilation breaks you realise how many times you had a useless conversion from raw pointer to not_null and back in you callchain, and makes you propagate notnulls up the call chain, remove asserts, make code more readable and intentions clear.
  • for folks who can not accept that, we might consider a macro... or maybe they can fork?

I'll just add some more feedback.

I'm strongly against this change.

  1. This is very verbose and adds boilerplate (even in C++17) for no reason in two very common and valid cases:
  • providing an address of an existing object to a function call, f(&a) -> f(not_null{ &a })
  • passing this to a function call, f(this) -> f(not_null{ this })
  1. Some other cases are fine now, but will require a lot of those not_null{ .. } or make_not_null(..), like having a std::unique_ptr object in some place (as a class member, in a container, etc) and you want to pass it down the stack as a not_null. Now you just call .get like always. And even the ability to make not_null<std::unique_ptr> won't help you, because you can't get not_null<raw> from it in any way.

  2. Adoption of the library and starting of not_null usage gets way worse. Now you can't just slowly add not_null-s in the code and remove your checks inside functions to make them cleaner. You must also go through all places your function is called and inject a conversion. This can be really problematic, but all you wanted to do was to move checks from inside a function to its interface and its contract.

  3. Such huge breaking changes will definitely slow down any mass adoption in existing large projects that GSL can have.

I did start using GSL and specifically not_null a lot in my project Telegram Desktop and it helped me a lot. But now if I want to update the GSL to the latest version I need to fix about 2.5K errors throughout the project. So for me the only way to stay onboard right now is to stick with v1.0.0 version and after that perhaps to fork out my own not_null that I'll use instead of gsl::not_null. That's not a big problem, but I don't think it is good.

@annagrin

I think implicit conversion int& -> not_null<int*> will be strange in terms of readability, I can't imagine any other example of such conversion from a reference to a pointer. So I'm afraid the fork is the way here.

What if we add GSL_NULL_POINTER_RELAX define to switch between strong gsl::not_null and old behaviour ?

@vitlav: I'm not sure what that suggestion would achieve. It seems to fork the library at the source code level, which is rather undesirable.

I'll point out to those who keep identifying the "boilerplate" code generated by this change, that having that code enables people to see where they are injecting restrictions on pointer values in their code and reason about those costs. With an implicit conversion at those sites, the conversions (and resulting checks) are invisible.

C++20 contracts will offer a facility where you can inject requirements like "not-nullness" into a function's contract without requiring source code changes at callsites. They will also let you choose how much enforcement you'd like. That is a more general and flexible facility than anything like not_null could ever approach. If the explicit constructor on not_null is frustrating, perhaps contracts are a better choice for your codebase.

@vitlav @john-preston @kivadiu @twadrianlis @ericLemanissier and everyone who uses not_null.

@neilmacintosh had good suggestions on how to address the adoption of explicit not_null - I would like to gather opinions on how useful those would be to you:

  • using a transition class for incremental adoption of the new not_null version - a class, say, sloppy_not_null that allows implicit transitions to and from not_null. Then developers can migrate incrementally using following steps:

    1. replace all not_null by sloppy_not_null - this should compile
    2. replace some sloppy_not_null occurences to not_null, fix compilation errors and design, test
    3. repeat
  • using tools to automatically update the code (no redesign here, just insert an explicit constructor call where needed, and fix design later in incremental steps)

    1. fixups using clang (need help there)
    2. fixups using msvc (I can be on point to create one)

@annagrin If we are able to write an automatical tool to change the code, maybe the need for sloppy_not_null vanishes?

@kivadiu sloppy_not_null is easy to write and might be useful for various reasons:

  • it gives a way to incrementally update before the tools are written
  • it can be a better option if someone wants to redesign their API - compiler will be the tool to tell you where, after replacing a few sloppy_not_nulls to not_nulls

So I don't see a reason no to do both (sorry for a double negative:)

I am being worn down by this problem for large code bases that's been using not_null for a while before the change to the constructor, as the change proves to be very hard to make in those, and prevents them from moving to a new version. One could argue that the language got it backwards - all pointers should be not_null by default an special optional type should exist for "possibly null" pointers, so we will never win the perfectionist game anyway:)

So I am considering reversing the approach - make the not_null constructor implicit, and for those who want the benefits of better performance and design, create a strict_not_null class (replacing sloppy_not_null) with explicit strict_not_null constructor.

Thoughts?

@annagrin That seems like a reasonable and pragmatic approach to me. It at least provides a path for people to get to a more efficient and correct place (strict_not_null) if they want to do so.

as the change proves to be very hard to make in those, and prevents them from moving to a new Version

I don't really get that point. Don't existing code bases with not_null just need global "replace in files" all occurences of "not_null" by "sloppy_not_null"? That does not look very hard to me.

Don't existing code bases with not_null just need global "replace in files" all occurences of "not_null" by "sloppy_not_null"?

It's more involving that that, the following does not work anymore:

void f(not_null<X*> x) {
}
int main() {
    X x;
    f(&x);
}

You need to change f(&x) by f(make_not_null(&x)).

@beinhaerter my thinking points:

  • Strict not_null and sloppy_not_null is an opt-out (and hopefully transition) strategy that requires quite a lot of changes in existing source code before GSL version can be updated in the code, and a header containing the type to be included everywhere. In large code bases, this needs to be done incrementally and creates a lot of merge conflicts. Meanwhile, the new code cannot use strict not_null and acquires all the problems of the old code. For large code bases that are being actively changed, the cost quickly becomes prohibitive.

  • Strict not_null, which is a goal of the above opt-out strategy, has annoying problems - such as the need to call the constructor on known non-null pointers such as addresses of static variables and locals. This means that either we are missing some language capabilities, or types are not the best approach in this case.

  • A combined effect of the above in practice leads to code bases never switching to the new version, creating forks with removed explicit constructor, or using sloppy_not_null but never changing to the strict type, which only add inconvenience but completely miss the transition as a goal.

  • Just the fact that non_null is so wide-spread in the code give an idea that language got it backwards, another point towards types not being the best approach here.

  • Removing explicit constructor from not_null and strict_not_null provide an opt-in strategy that does not require changing old code but still gives the opportunity for the new code to use stricter types. For the code bases that wish to have their not_null to be strict, it is "as easy" as before to replace not_null by strict_not_null and fix the build breaks and update interfaces incrementally.

  • It is still not clear what is the best approach to describe API that take non-null arguments - contracts look promising but are still converging in the standard. Especially, we need to make sure null checks do not get removed if they cannot be proven unnecessary by a compiler, even in release bits.

In short, since we are providing a sloppy and a strict type either way, we would like at least to minimize user's pain with version changes. And continue looking into contracts as a possibly better alternative.

@kivadiu Your code compiles with GSL version 1 where not_null ctor was not explicit. It does not compile with version 2 where not_null ctor is explicit.
As I said: existing code bases that use GSL version 1 not_null could use gsl_helpers::sloppy_not_null as provided in branch dev/annagrin/sloppy_not_null. I just tested it, it works. This is as expected, because as I understand it sloppy_not_null is just a copy of the GSL version 1 not_null implementation (ctor not explicit).

@annagrin I appreciate that there is a new strict_not_null, and I will switch to that.

I found this very regrettable: opting for the "unsafe"/suboptimal choice by default isn't encouraging to use the best API moving forward.
Of course there is some cost to pay for existing "legacy" codebase, but it isn't clear to me why optimizing for legacy should "win" (especially since there is an easy path forward for these: mass replace not_null to sloppy_not_null) while it is acknowledged that the alternative encourages better APIs.

@joker-eph I agree that it's sub-optimal. But the argument I heard/read was that Microsoft already adopted the non-explicit not_null and the cost of moving them to the explicit version turns out to be prohibitive. @annagrin detailed specific problems with that process earlier in this thread - it is not as easy as just search-and-replace.

In that case, I can see that taking this path allows them to keep up with GSL improvements and gradually adopt strict_not_null. This GSL implementation is Microsoft's effort, so that seems a reasonable position to me.

Yes, it would be nice if we got everything right with the "nice name" first time, but we didn't. We want to take as much code as possible to a better place. Even non-explicit not_null is better than no not_null at all.

@neilmacintosh I read this thread multiple times but I didn't find why " it is not as easy as just search-and-replace"?
I understand that transitioning to the strict version is difficult for large codebase, but that's a different problem than just upgrading to v2 and have the code build.
@annagrin do you mind explaining why a global search/replace does not solve this for these codebase?
It also seems like something that a clang fixup could implement automatically.

(Actually what I could find more compelling as an angle would be the argument that the type system in C++ does not help by not allowing to bind expressions like &a implicitly to not_null but not raw pointers, and so there is a tradeoff here. However PR #743 motivates only for legacy support and mentions "using explicit constructor is very beneficial for new code, since it encourages better API design and make null checks intentional")

@joker-eph I suggest you re-read the first couple of bullet points of @annagrin's comment as they explain why global search-replace didn't work.

@neilmacintosh note that there are 4 comments from @annagrin in this issue that contain bullet lists. I read all off them and still can't figure out, sorry.
So for being very specific and since you didn't point to a particular comment, I will reproduce the first couple of bullet points for all of @annagrin's comments here inline (I don't want you to believe that I was just lazy and didn't read the comments...):

We need a constructor from a reference (would love to see a PR, btw!)

Unrelated?

c++17 is better than c++14 . As Neil mentioned, f(gsl::not_null{p}) and gsl::notnull p{&i} work.

Unrelated?

using a transition class for incremental adoption of the new not_null version - a class, say, sloppy_not_null that allows implicit transitions to and from not_null. Then developers can migrate incrementally using following steps:

  • replace all not_null by sloppy_not_null - this should compile
    -replace some sloppy_not_null occurences to not_null, fix compilation errors and design, test
    repeat

This is related, but it explains that replacing works, not the opposite.

using tools to automatically update the code (no redesign here, just insert an explicit constructor call where needed, and fix design later in incremental steps)

This is related, but it explains how to mass update to the strict version, not how to use the sloppy-one. I.e. it helps to move from sloppy to non-sloppy.

it gives a way to incrementally update before the tools are written

Argument in favor of sloppy.

it can be a better option if someone wants to redesign their API - compiler will be the tool to tell you where, after replacing a few sloppy_not_nulls to not_nulls

Argument in favor of sloppy.

I am being worn down by this problem for large code bases that's been using not_null for a while before the change to the constructor, as the change proves to be very hard to make in those, and prevents them from moving to a new version.

This is not a bullet point, but the first mention of a problem with the update. However it does not elaborate why a mass replacement wouldn't work.

Strict not_null and sloppy_not_null is an opt-out (and hopefully transition) strategy that requires quite a lot of changes in existing source code before GSL version can be updated in the code, and a header containing the type to be included everywhere. In large code bases, this needs to be done incrementally and creates a lot of merge conflicts. Meanwhile, the new code cannot use strict not_null and acquires all the problems of the old code. For large code bases that are being actively changed, the cost quickly becomes prohibitive.

This is saying it is problematic, but does not say a global search/replace wouldn't work (just that it would be annoying with merge conflicts?).

Strict not_null, which is a goal of the above opt-out strategy, has annoying problems - such as the need to call the constructor on known non-null pointers such as addresses of static variables and locals. This means that either we are missing some language capabilities, or types are not the best approach in this case.

This does not address an issue with global search/replace as far as I can tell.

@joker-eph In my opinion the impossibility of implicit constructor of strict_not_null from &a and from this should be the main argument for not_null that allows implicit construction being the default one.

@john-preston I think this is a valid argument, and as I mentioned previously it is a tradeoff that merits discussion. However it does not seem to be the motivation for the current code change.
I personally tend to prefer to pay the price of being annoyed with an explicit ctor when not needed (&a, this) if it makes my code safer overall by catching other unsafe places, but I know many programmers who like implicit in general (had this debate on other kind of construct) so I can see it isn't totally clear cut.

@joker-eph Apologies for missing your questions, will answer them here in bulk:

I found this very regrettable: opting for the "unsafe"/suboptimal choice by default isn't encouraging to use the best API moving forward.

The safety of not_null has not changed by removing explicit constructor, but I agree, it is indeed suboptimal. As I mentioned before, we are optimizing for existing large codebases because they need safety and cannot accept the explicit not_null constructor at once. This will help many users take in new GSL versions without major codebase overhaul. For code that would like to be optimal, there is an easy opt-in via strict_not_null.

@annagrin do you mind explaining why a global search/replace does not solve this for these codebase?

Places where constructor calls need to be inserted are not simply searchable, you would need symbols and types to detect those, for example:

void foo(not_null<int*>);
...
int* p = &i;
foo(p); // needs foo's symbol and type, as well as p's type to detect that constructor needs to be inserted

It also seems like something that a clang fixup could implement automatically.

Absolutely. The problems are that there are none yet and not all code compiles with clang. Changing abruptly and requiring modifications in all code without any tooling available is painful, opting into strict_not_null is much less painful, and also can be tooled at the user's schedule without disrupting intakes of new GSL versions.

(Actually what I could find more compelling as an angle would be the argument that the type system in C++ does not help by not allowing to bind expressions like &a implicitly to not_null but not raw pointers, and so there is a tradeoff here. However PR #743 motivates only for legacy support and mentions "using explicit constructor is very beneficial for new code, since it encourages better API design and make null checks intentional")

The problem with creation of not_null from known non-null expressions, and other annoyances (you cannot reuse the same local for raw pointer and not_null) are very compelling - to me they signify that not_null could be not the right solution for the problem altogether. Therefore I care a bit less of getting it very strict by default. Alternatives range from making pointers non-null by default in the language to contracts, but yet to be found and agreed on. So I would treat not_null as a useful type in the meanwhile, and therefore I would like it to actually be practically useful.

Thank for elaborating @annagrin !

I just thought of another solution to this explicit/non explicit constructor: a macro could be used to say if the constructor of not_null is explicit or not (so we would remove strict_not_null and keep only not_null): we could have the explicit constructor by default and the non explicit constructor if macro GSL_NON_EXPLICIT_NOT_NULL_CTOR is defined.
This would require that a codebase would use only one type of not_null (explicit or non explicit constructor) but this looks good to me:

  • 1st you add not_null to all your function prototypes with GSL_NON_EXPLICIT_NOT_NULL_CTOR defined,
  • 2nd you do not define the macro to see how many call locations are to be changed
  • 3rd if it looks doable you change the calls without the need to change all functions prototype to replace not_null by strict_not_null.

That might be more a Visual Studio problem than a GSL issue, so I just link it here for the records. My latest tests with VS2019 show that the compiler emits a lot of lifetime.1 and lifetime.3 warnings with gsl::strict_not_null which it does not emit when I use gsl::not_null.

https://developercommunity.visualstudio.com/content/problem/530662/c26486-and-c26489-despite-using-gslstrict-not-null.html