haskell/core-libraries-committee

`ConstPtr`'s name is misleading

lemonlambda opened this issue · 29 comments

GHC Issue: https://gitlab.haskell.org/ghc/ghc/-/issues/24300

Why?

ConstPtr's name is misleading and I have had to double check if it's really const T *.

Solution

I believe that ConstPtr should either be renamed or have it's function changed to be T * const. If ConstPtr is renamed it should be most likely be called CConst (see Const). If function is changed a new type should be added most likely be named CConst.

How does this help?

See this. Along with this pointer's with a const qualifier do have importance regarding UB.

I will reiterate the reply I gave on Gitlab:

Indeed that is a bit unfortunate. At the Haskell Cryptography Group we are interested in correct ways to declare FFI bindings to such APIs, especially to libsodium. CApiFFI is stricter when it comes to the const modifier of function declarations.

And I suspect @ldillinger will want to chime in regarding the necessary correctness of FFI bindings.

Naming is hard and might be unfortunate, but I doubt it is worth a breaking change and associated migration. One can add a type synonym type CConst = ConstPtr though, would it be any better?

@Bodigrim If I may be so bold, but perhaps we can play on the fact that its introduction was very, very recent, and that its propagation to the ecosystem has been minimal? Looking at https://hackage-search.serokell.io/?q=ConstPtr it really seems minimal.

(And perhaps the next CLC proposals that are accepted can be released as "experimental" for a release or two, to get the feeling of the ecosystem usage? but that's a parallel discussion)

and that its propagation to the ecosystem has been minimal

I have two problems with this line of thinking in general:

  1. we're still using hackage/stackage as a metric and ignore whatever humongous codebase a company might have come up with (IOGs cardano is open source, btw.)
  2. even if adoption is low, it might suddenly skyrocket, because a given base version exposes said identifiers and it doesn't throw a warning, nor does the documentation of the exact base version indicate that "oh, we made a mistake"

I understand, although humongous codebases usually also have a hard time migrating to newer GHC versions, which is why I have a certain level of confidence about this.
But I recognise it's a bet, in the end.

If we really think something should have a different name then I suggest that we completely reintroduce the thing under the new name. We should keep the old thing around to avoid breaking existing code, but we can leave copious documentation in the Haddock that people should use the new thing (and even deprecate the old thing).

IMO const T * (pointer to a constant) and T * const (constant pointer) are both valid interpretations of ConstPtr a, I don't find it more intuitive for it to mean T * const. Would it work to add a CConst a modifier, so that we can just have CConst (Ptr a) and Ptr (CConst a)? That would make it obvious, what is what.

and that its propagation to the ecosystem has been minimal

Where do you draw the line in terms of stability, then?
I think it's a viable approach is to mark ConstPtr deprecated and refer to CConst.

Yup, if "minimal propagation" is an argument for breaking changes, then it's an even stronger argument for rapid deprecation, without actually breaking anything.

Where do you draw the line in terms of stability, then?

See #223 for a one relevant proposal. Not used by anyone: should be deprecated for at least one GHC major release before removal.

There certainly does not need to be a construct that maps to T * const. In context of FFI and ABIs, you always want T const * and never T * const. In FFI you always work with rvalues and never lvalues, as such a const only makes sense to the left of *.

To me, this unambiguously signals that something called "ConstPtr" stands for a T const *, because who would ever need a T * const on a function signature?

By the same token, my opinion is that there does not need to be a separate CConst qualifier, because its only meaningful uses would be expressible in terms of ConstPtr.

To me, this unambiguously signals that something called "ConstPtr" stands for a T const *, because who would ever need a T * const on a function signature?

Indeed ConstPtr was introduced for CApiFFI purposes, see #118.

IMO const T * (pointer to a constant) and T * const (constant pointer) are both valid interpretations of ConstPtr a, I don't find it more intuitive for it to mean T * const.

+ 1. Admittedly I'm illiterate in C, but at the moment the proposal does not motivate why interpreting ConstPtr as * const is superior to const *.

I know literally nothing in the area of FFI, I'd love the proposal to educate and elucidate me to understand its benefits.

Admittedly I'm illiterate in C,

TL;DR, if you use CApiFFI, you'll know when you want/need to use ConstPtr, otherwise you don't need it.


C syntax doesn't make it easy to remember what is "constant address to mutable contents" or "(mutable) address to immutable contents"

int strlen(const char *buf); // or
int strlen(char const *buf);

is the latter.

int probablywrong(char * const buf);

doesn't really make any sense. const char * const buf (or char const * const buf) kind of (constant address to immutable contents), but it doesn't add any value for the caller.

Where ConstPtr need comes in are nested uses: e.g. https://www.postgresql.org/docs/current/libpq-exec.html#LIBPQ-PQEXECPARAMS

PGresult *PQexecParams(PGconn *conn,
                       const char *command,
                       int nParams,
                       const Oid *paramTypes,
                       const char * const *paramValues,
                       const int *paramLengths,
                       const int *paramFormats,
                       int resultFormat);

const char * const *paramValues, is an address to immutable array of pointers to immutable char arrays.

The problem is that CApiFFI creates wrappers, and while char * can used as argument to const char* taking function:

int wrapper(char *buf) {
	return strlen(buf); // ok
}

in nested array case it's not ok:

int strlen_many(const char * const * strings) {
	int len = 0;
	while (*strings != NULL) { len += strlen(*strings); strings++; }
	return len;
}

int wrapper2(char **strings) {
	return strlen_many(strings);
}
foo.c: In function ‘wrapper2’:
foo.c:32:21: warning: passing argument 1 of ‘strlen_many’ from incompatible pointer type [-Wincompatible-pointer-types]
  return strlen_many(strings);
                     ^~~~~~~
foo.c:25:5: note: expected ‘const char * const*’ but argument is of type ‘char **’
 int strlen_many(const char * const * strings) {

And that is tricky. https://stackoverflow.com/questions/5055655/double-pointer-const-correctness-warnings-in-c is a good starting point.

from the linked FAQ: https://c-faq.com/ansi/constmismatch.html

C++ would still not allow assigning a char ** to a const char **, but it would let you get away with assigning a char ** to a const char * const *.

but C doesn't, so we really need ConstPtr for CApiFFI wrappers.

I'm still reading through this all and forming my thoughts, but I definitely a ran into this exact thing in developing the botan-bindings library, that ConstPtr is used for any const variables and is not just pointers. I did not notice it at first, but now I find the inaccuracy of the ConstPtr name to be a source of confusion, and would readily spend the effort to update from ConstPtr to CConst (or possibly one of the other solutions, as I am still reading).

I'd rather this change occur now, while CApiFFI is still relatively new and under-used - for comparison, we already need to shim ConstPtr in for backwards compatibility with older base anyway.

that ConstPtr is used for any const variables and is not just pointers.

@ldillinger Can you elaborate? Everyone in the thread above seems to agree that in FFI there's no use for const other than to the immediate left of a pointer.

@mniip

Ah! I have been meaning to reply to this for some time - I did a write-up, but lost the thread, and I'm very glad to have found it again!

To sum it up, while the use of const in C is usually associated with pointers, this is not strictly true and it can be used in a variety of situations. This is going to be a bit of C minutia, so bear with me.

NOTE: const applies to either the left or the right, depending on context... we'll ignore that for now

Given a pointer to a variable int * pv, and we might use const to make it into a pointer to a constant const int * pc (or int const * pc which is the same and I'm going to assume that we don't mean 'literally to the right of const'). This results in a pointer to a value, where the value cannot be changed by the pointer. (We can still however point the pointer somewhere else, because the pointer itself is not constant - more on that in a bit).

This is our current use case, where const int * pc ~ ConstPtr Int. Alrighty, nomenclature kinda makes sense (though PtrConst would be more accurate since its a pointer to a constant and not a constant pointer).

NOTE: Something else could still change the value that that pointer points to - const here only means that the pointer can't change it, because const doesn't mean 'immutable', it really means 'read-only within this context'

Alternatively, we might instead define a constant pointer to a variable int * const cpv. Now, the pointer is fixed, but the variable is not - how does ConstPtr represent this ? (Also, now ConstPtr makes sense - it is a constant pointer to a variable)

And what about constant pointers to constants const int * const cpc? There's two const there! Now neither the pointer nor the data can be changed! What does ConstPtr do here? (ConstPtrConst???)

It gets worse. The thing 'to the right' of const may not even be a pointer - it could instead be an array const int arr[3] = {1,2,3}. A lot of C abuses the fact that the address of an array is shared by the pointer to its first element, but they are in fact distinct.

It could also be just a variable / value on the stack const int i = 4 - the value of this variable cannot be changed, but it is not a pointer! (I'm not sure this one matters to the FFI though, but still, consistency)

We could keep going - C++ makes const even more nuanced than in C because C++ also has references, which (like arrays) are distinct from pointers, and so you can have const references too.

Anyway, so const isn't as simple as it looks grimacing.


A somewhat-compatible solution might look something like:

newtype CConst a = MkCConst { unCConst :: a }
type ConstPtr a = CConst (Ptr a)
type PtrConst a = Ptr (CConst a)
type ConstPtrConst a = CConst (Ptr (CConst a))

@ldillinger in your examples cpv, cpc etc are still lvalues. They are variables you declare in the middle of a function. In FFI we never deal with that -- as I said, we only deal with rvalues.

Even if we're talking about global variables, to use them in a mutable fashion, you have to foreign import pointers to them:

-- int global_var;
foreign import ccall "&global_var" globalVar :: Ptr CInt;
-- int const global_immutable_var;
foreign import ccall "&global_immutable_var" globalVar :: ConstPtr CInt;

As for arrays, you actually cannot declare a "const array of T" rather than "array of const T" in C. Even if you try to do it using a typedef, the const will simply commute with the array and apply to the T instead:

typedef int TwoInt[2];
TwoInt const *ptr; // this declares a pointer to array of 2 const int
int const (*foo)[2] = ptr; // interconvertible, no warning
TwoInt const *bar = foo; // interconvertible, no warning

Source: C11 standard section 6.7.3.9.

Haskell's FFI implementation currently targets interoperation with C, and doesn't aim to support the C++ ABI, and C++ references are part of that, not part of the C ABI.

I will point out a caveat that I found whilst researching this: it is possible to mark struct members as const, and they have some funky consequences like making it impossible to write through an otherwise mutable pointer to a struct if it has any const members. However once again this doesn't affect us because we don't have struct definitions as an FFI language feature. You have to roll your own struct member access using Storable and by offsetting pointers.

I stand corrected / clarified then - I'm glad I followed up, and can stop worrying about this.

@LemonjamesD @Kleidukos is there still an interest and motivation to carry on with the proposal?

FWIW if past behaviour is any predictor of future one, I would not expect a proposal to change meaning of ConstPtr silently to pass. We can introduce a new wrapper, in which case I'd expect a proposer to describe clearly its semantics and required support on GHC side for {-# LANGUAGE CApiFFI #-} to recognise it.

@LemonjamesD @Kleidukos is there still an interest and motivation to carry on with the proposal?

FWIW if past behaviour is any predictor of future one, I would not expect a proposal to change meaning of ConstPtr silently to pass. We can introduce a new wrapper, in which case I'd expect a proposer to describe clearly its semantics and required support on GHC side for {-# LANGUAGE CApiFFI #-} to recognise it.

Yes I would like to continue this. I would like to rename ConstPtr to PtrToConst or PtrConst

@LemonjamesD please prepare a GHC MR so that we can discuss a specific proposal.

It's unlikely that a simple renaming will be approved. CLC members are likely to ask for a deprecation period, allowing gradual migration.

@LemonjamesD if there is no progress by the end of March, I'll close as abandoned.

@LemonjamesD please prepare a GHC MR so that we can discuss a specific proposal.

It's unlikely that a simple renaming will be approved. CLC members are likely to ask for a deprecation period, allowing gradual migration.

I would like to do a 3 release deprecation policy.
With this in mind I would like to follow this:

  1. ConstPtr is soft marked as deprecated and PtrToConst is introduced.
  2. ConstPtr produces a warning that it's deprecated.
  3. ConstPtr is removed and now only PtrToConst exists.
    This will provide adequate time to migrate from ConstPtr to PtrToConst.

@LemonjamesD please prepare a GHC MR so that we can discuss a specific proposal.

It's unlikely that a simple renaming will be approved. CLC members are likely to ask for a deprecation period, allowing gradual migration.

I plan on opening the GHC MR within the next few days or so.

@LemonjamesD thanks for moving it forwards. If I may suggest, the proposal would benefit from a clear motivational section. I personally do not feel PtrToConst that much superior to ConstPtr to justify all the churn.

Given that only pointers to const data are relevant for CApiFFI, maybe we can just add PtrToConst as a type (and pattern) synonym of ConstPtr? No breakage, no migration needed, everyone's happy.

Given that only pointers to const data are relevant for CApiFFI, maybe we can just add PtrToConst as a type (and pattern) synonym of ConstPtr? No breakage, no migration needed, everyone's happy.

I would say this would create even more confusion as we would have two things that do the same thing with different names.

@LemonjamesD where are we now with the proposal? The minimal requirement is to provide an MR, but I do feel that the proposal can greatly benefit from an articulated motivation for the change.

@LemonjamesD unless we make a substantial progress, I'll close this in two weeks.

Closing as abandoned. Feel free to reopen once there are resources to make progress.