Spec out customization options for high-level bindings
The generation of the high-level API is much more open to interpretation (of the C header) than the low-level API. Things to think about here are
- Which Haskell type should we use? Some examples to consider:
  - Does `char*` correspond to `String`, `ByteString`, `Text`, something else?
  - Does `int` correspond to `Int`, `Natural`, or perhaps even `Bool`?
  - Does `int[]` correspond to a vector, a mutable vector, a list, ...?
  - Should we represent `typedef struct { int16_t ai_i; int16_t ai_q; } acme_iq;` as `Complex Int16`?
- Representation: How are they represented C side:
  - `NULL` terminated
  - separate size argument
  - something else?
- Direction: is `int*`
  - input
  - output
  - a mutable input?
- Ownership: who is responsible for allocating/deallocating memory? Should we use finalizers?
- Size: is `int*` meant to be a pointer to a single `int`, or to an array?
- Sharing: when a C function fills some memory with the contents of a struct, are those values shared somewhere else or not? (In other words, might they change unexpectedly when you call another C function, or possibly even without calling anything at all?)
and there are undoubtedly more.
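To make the type and representation questions concrete, here is a minimal sketch using only `base` and `bytestring`. The function names are made up for illustration, and no real C library is involved: the buffers are allocated from Haskell and read straight back, just to show how the same memory can be interpreted in different ways.

```haskell
import qualified Data.ByteString.Char8 as BC
import Foreign.C.String (peekCString, withCString)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Array (peekArray, peekArray0, withArray0, withArrayLen)

-- The same NUL-terminated `char*`, read back as two different Haskell types.
asString :: String -> IO String
asString s = withCString s peekCString

asByteString :: String -> IO BC.ByteString
asByteString s = withCString s BC.packCString

-- An `int*` whose length is implied by a terminating sentinel (here 0).
sentinelTerminated :: [CInt] -> IO [CInt]
sentinelTerminated xs = withArray0 0 xs (peekArray0 0)

-- An `int*` whose length travels in a separate size argument.
explicitSize :: [CInt] -> IO [CInt]
explicitSize xs = withArrayLen xs peekArray
```

Note that `sentinelTerminated` stops at the first 0 it encounters, so an array that legitimately contains a 0 is silently truncated, whereas `explicitSize` returns every element. The header alone cannot tell the tool which of the two conventions a given `int*` follows.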
There are also some Haskell-specific things to think about (some of these also need to be considered for the low-level bindings):
- Calling convention
- Safety
- Purity
as well as `ghc`-specific options, such as
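As an illustration of the calling convention, safety, and purity items in the list above (not of the `ghc`-specific options), the following sketch imports two functions from the C standard library. The choice of `sin` and `rand` is mine, purely for illustration; the point is that all three decisions are visible in the import declaration and none of them can be read off the C header.

```haskell
import Foreign.C.Types (CDouble (..), CInt (..))

-- Calling convention: ccall.
-- Safety: unsafe, since `sin` is short and never calls back into Haskell.
-- Purity: no IO in the type, since `sin` has no observable side effects.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Safety: safe (also the default when nothing is written).
-- Purity: IO, since `rand` reads and updates hidden global state.
foreign import ccall safe "stdlib.h rand"
  c_rand :: IO CInt
```

Getting purity wrong in the other direction (importing `rand` without `IO`) would still compile, which is part of why the tool needs either conservative defaults or user input here.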
It would also be good to take a look at exactly what Rust `bindgen` offers here, and see what's relevant for us:
- It can mark some types as opaque, "detecting some but not all cases" where this is necessary. It's unclear to me at present when this is relevant for us; cases that come to mind are unions and incomplete structs (though these are more relevant for the low-level API).
- Replacing some types with a handwritten one. This one is almost certainly relevant; where in the low-level API it's okay to generate types for all C definitions, in the high-level bindings we probably want to be able to reuse existing types.
- Prevent derivation of certain classes (`Copy` and `Clone`, `Debug`, `Default`). Probably also relevant.
- Marking fields as private.
- Code formatting options (semi relevant for us; see #23)
It also supports marking types as `#[must_use]`; we're tracking this as its own issue at "Haskell equivalent of Rust's must-use?".
Not only do we need to think about all the choices `hs-bindgen` needs to make, but also about how we can give users the ability to influence (customize) those choices. Options we could consider are
- Some kind of DSL (perhaps through configuration files, perhaps through command line options, etc.).
- Annotations in the C headers themselves (though this might not be an option for many users), perhaps as Doxygen comments (#113); Rust bindgen does this in a limited way (https://crates.io/crates/bindgen/0.23.1#annotations).
- `hs-bindgen` as a library, with customizations in normal Haskell code
The downside of the first two approaches is that we might end up with users having to learn bespoke syntax again (which we consider to be a disadvantage of tools such as `c2hs`), making the third option quite appealing. It does however mean users might need to compile their own custom version of the tool, but for power users who need to generate a lot of bindings this might be worth it.
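A rough sketch of what the third option could look like is given below. Everything in it is hypothetical: `HsType`, `CParam`, `Override`, and `resolve` are names invented for this illustration and are not part of any existing `hs-bindgen` API. The idea is only that a customization is an ordinary Haskell function that either overrides a choice or defers to the next rule.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- Hypothetical: a tiny vocabulary of Haskell-side types a binding could pick.
data HsType = HsString | HsByteString | HsText | HsInt | HsNatural | HsBool
  deriving (Eq, Show)

-- Hypothetical: a C parameter as the tool might describe it to user code.
data CParam = CParam
  { paramFunction :: String  -- enclosing C function
  , paramName     :: String  -- parameter name in the header
  , paramCType    :: String  -- C type, kept as text for brevity
  }

-- A customization: return Just to override, Nothing to defer.
type Override = CParam -> Maybe HsType

-- Hypothetical built-in defaults, used when no override applies.
defaults :: CParam -> HsType
defaults p = case paramCType p of
  "char*" -> HsString
  _       -> HsInt

-- User-written rules, in plain Haskell rather than a bespoke DSL.
textForStrings :: Override
textForStrings p
  | paramCType p == "char*" = Just HsText
  | otherwise               = Nothing

flagsAreBool :: Override
flagsAreBool p
  | paramCType p == "int" && paramName p == "enabled" = Just HsBool
  | otherwise                                         = Nothing

-- Try each override in order, then fall back to the defaults.
resolve :: [Override] -> CParam -> HsType
resolve overrides p =
  fromMaybe (defaults p) (foldr (\o acc -> o p <|> acc) Nothing overrides)
```

With this shape, `resolve [textForStrings, flagsAreBool]` maps a `char*` parameter to `HsText` and an `int` parameter named `enabled` to `HsBool`, while anything else keeps its default. Users compose rules with ordinary functions and get type checking for free, at the cost of having to build their own driver program.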
As an example of the kind of high-level binding we might want to generate, consider
    void resample(
      int32_T *res_m_num_valid_samples,
      cint16_T res_m_iq_int[30720000],
      int64_T res_m_old_rate,
      int64_T res_m_new_rate,
      cint16_T res_m_iq_resampled_int[30720000]
    );
for which we might want to generate
    resample ::
         Vector (Complex Int16)
      -> Int64
      -> Int64
      -> IO (Int, Vector (Complex Int16))
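To show the kind of glue such a high-level binding implies, here is a minimal sketch of a wrapper around a low-level function with the same pointer-based shape. Several things are assumptions made so the example runs without the C library: `lowLevelResample` is a Haskell stand-in with toy behaviour (it keeps every n-th sample), `bufferLen` is 8 rather than 30720000, lists replace `Vector` so that only `base` is needed, and the output element type is taken to be `Complex Int16` because the C arrays are `cint16_T`.

```haskell
import Data.Complex (Complex (..))
import Data.Int (Int16, Int32, Int64)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Hypothetical fixed buffer size; the real header uses 30720000.
bufferLen :: Int
bufferLen = 8

-- Stand-in for the generated low-level binding (NOT the real C function):
-- keeps every (oldRate / newRate)-th sample and reports how many it wrote.
lowLevelResample
  :: Ptr Int32 -> Ptr (Complex Int16) -> Int64 -> Int64
  -> Ptr (Complex Int16) -> IO ()
lowLevelResample nPtr inPtr oldRate newRate outPtr = do
  let step = fromIntegral (max 1 (oldRate `div` max 1 newRate)) :: Int
  samples <- peekArray bufferLen inPtr
  let kept = [s | (i, s) <- zip [0 :: Int ..] samples, i `mod` step == 0]
  pokeArray outPtr kept
  poke nPtr (fromIntegral (length kept))

-- High-level wrapper: inputs become arguments, outputs become results,
-- and all buffers are allocated and released by the wrapper itself.
resample :: [Complex Int16] -> Int64 -> Int64 -> IO (Int, [Complex Int16])
resample input oldRate newRate =
  allocaArray bufferLen $ \inPtr ->
  allocaArray bufferLen $ \outPtr ->
  alloca $ \nPtr -> do
    -- Pad or truncate to the fixed size the C signature demands.
    pokeArray inPtr (take bufferLen (input ++ repeat (0 :+ 0)))
    lowLevelResample nPtr inPtr oldRate newRate outPtr
    n <- fromIntegral <$> peek nPtr
    out <- peekArray n outPtr
    pure (n, out)
```

The wrapper has to know that the first and last parameters are outputs, that the second is an input, and that the valid length of the output is reported through `res_m_num_valid_samples`. None of that is expressed in the C prototype, which is exactly the information the customization mechanism would need to supply.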
Perhaps another source of inspiration for specifying expected types is to look at interface description languages, such as https://learn.microsoft.com/en-us/windows/win32/rpc/the-idl-file .
One thing we should probably think about is that when we implement these customization options for high-level bindings (and indeed also the standard set of defaults, see #32), we probably need to make it possible for the mapping from the low-level binding to the chosen high-level binding to depend on the target architecture.
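A small example of why the target matters: C `long` is 64 bits wide on 64-bit Linux and macOS but 32 bits wide on 64-bit Windows, so a high-level binding that wants a fixed-width Haskell type cannot pick one unconditionally. The sketch below uses a made-up `LongRepr` type, and it inspects the platform the program is compiled for, which only coincides with the target of the generated bindings when not cross-compiling.

```haskell
import Foreign.C.Types (CLong)
import Foreign.Storable (sizeOf)

-- Hypothetical: the fixed-width type a high-level binding might choose
-- to represent C `long`.
data LongRepr = LongIsInt32 | LongIsInt64
  deriving (Eq, Show)

-- Decide based on the width of `long` on the platform being compiled for.
longRepr :: Maybe LongRepr
longRepr = case sizeOf (0 :: CLong) of
  4 -> Just LongIsInt32  -- e.g. 64-bit Windows and most 32-bit targets
  8 -> Just LongIsInt64  -- e.g. 64-bit Linux and macOS
  _ -> Nothing           -- unusual width: leave the choice to the user
```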