well-typed/hs-bindgen

Spec out customization options for high-level bindings


The generation of the high-level API is much more open to interpretation (of the C header) than the low-level API. Things to think about here include:

  • Which Haskell type should we use? Some examples to consider:
    • Does char* correspond to String, ByteString, Text, something else?
    • Does int correspond to Int, Natural, or perhaps even Bool?
    • Does int[] correspond to a vector, a mutable vector, a list, ...?
    • Should we represent
      typedef struct
      {
          int16_t ai_i;
          int16_t ai_q;
      } acme_iq;
      as Complex Int16?
  • Representation: how are such values represented on the C side? For example:
    • NULL-terminated
    • with a separate size argument
    • something else?
  • Direction: is int*
    • input
    • output
    • a mutable input?
  • Ownership: who is responsible for allocating/deallocating memory? Should we use finalizers?
  • Size: is int* meant to be a pointer to a single int, or to an array?
  • Sharing: when a C function fills some memory with the contents of a struct, are those values shared somewhere else or not? (In other words, might they change unexpectedly when you call another C function, or possibly even without calling anything at all?)

and there are undoubtedly more.
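As an illustration of the type and layout questions, representing acme_iq as Complex Int16 only works if the Haskell side agrees with the C struct layout. The sketch below assumes ai_i is the real part and ai_q the imaginary part, and that the struct has size 4 and alignment 2 with no padding (what common C ABIs produce, but still an assumption). AcmeIQ and roundTrip are hypothetical names, not hs-bindgen output.

```haskell
import Data.Complex (Complex (..))
import Data.Int (Int16)
import Foreign.Marshal.Utils (with)
import Foreign.Storable (Storable (..))

-- Hypothetical high-level view of the C struct acme_iq:
--   typedef struct { int16_t ai_i; int16_t ai_q; } acme_iq;
-- Assumption: ai_i is the real part and ai_q the imaginary part.
newtype AcmeIQ = AcmeIQ (Complex Int16)
  deriving (Eq, Show)

-- Assumed layout: two consecutive int16_t fields with no padding
-- (size 4, alignment 2).
instance Storable AcmeIQ where
  sizeOf _ = 2 * sizeOf (undefined :: Int16)
  alignment _ = alignment (undefined :: Int16)
  peek p = do
    i <- peekByteOff p 0 -- ai_i
    q <- peekByteOff p 2 -- ai_q
    return (AcmeIQ (i :+ q))
  poke p (AcmeIQ (i :+ q)) = do
    pokeByteOff p 0 i
    pokeByteOff p 2 q

-- Write a value to C memory and read it back; handy for checking the layout.
roundTrip :: AcmeIQ -> IO AcmeIQ
roundTrip x = with x peek
```

base's Data.Complex already provides a Storable instance for Complex a that stores the real part first, so a generator could in principle use Complex Int16 directly; the explicit instance above just makes the layout assumption visible.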

There are also some Haskell-specific things to think about (some of these need to be considered for the low-level bindings as well):

  • Calling convention
  • Safety
  • Purity
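In GHC's FFI these three choices show up directly in the foreign import declaration. A minimal sketch, using libc's abs and labs purely as stand-ins for a bound C function (the Haskell names c_abs_pure, c_abs_io and c_labs are illustrative):

```haskell
{-# LANGUAGE CApiFFI #-}
import Foreign.C.Types (CInt (..), CLong (..))

-- Calling convention "ccall", safety "unsafe", pure result type.
-- Unsafe calls are cheaper, but the C code must not call back into Haskell,
-- and a long-running unsafe call can hold up other Haskell threads
-- (for example, garbage collection has to wait for it to return).
-- Omitting IO asserts that abs has no observable side effects.
foreign import ccall unsafe "stdlib.h abs"
  c_abs_pure :: CInt -> CInt

-- Same function, "safe" and in IO: more overhead per call, but callbacks
-- into Haskell are allowed and the call is sequenced with other IO actions.
foreign import ccall safe "stdlib.h abs"
  c_abs_io :: CInt -> IO CInt

-- Calling convention "capi": GHC calls the function through a generated C
-- stub that includes the named header, so the C compiler checks the call
-- against the real prototype (this also makes it possible to bind macros).
foreign import capi unsafe "stdlib.h labs"
  c_labs :: CLong -> CLong
```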

as well as GHC-specific options, such as

It would also be a good idea to take a look at exactly what Rust bindgen offers here, and see what's relevant for us:

It also supports marking types as #[must_use]; we're tracking this in a separate issue, "Haskell equivalent of Rust's must-use?".

Not only do we need to think about all the choices hs-bindgen needs to make, but also about how we can give users the ability to influence (customize) those choices. Options we could consider are:

  • Some kind of DSL (perhaps through configuration files, perhaps through command line options, etc.).
  • Annotations in the C headers themselves (though this might not be an option for many users), perhaps as Doxygen comments (#113); Rust bindgen does this in a limited way (https://crates.io/crates/bindgen/0.23.1#annotations).
  • hs-bindgen as a library, with customizations in normal Haskell code

The downside of the first two approaches is that users might end up having to learn bespoke syntax again (which we consider to be a disadvantage of tools such as c2hs), which makes the third option quite appealing. It does, however, mean that users might need to compile their own custom version of the tool, but for power users who need to generate a lot of bindings this might well be worth it.
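To make the third option a bit more concrete, here is a minimal sketch of how the pointer-related questions above (direction, size, ownership) might be expressed as ordinary Haskell values. Every type and function in it (Direction, Extent, Ownership, PointerSpec, defaultPointer, isResult, ...) is hypothetical and not part of hs-bindgen.

```haskell
-- Hypothetical sketch: none of these types exist in hs-bindgen.

-- Is a pointer argument read, written, or both by the C function?
data Direction = In | Out | InOut
  deriving (Eq, Show)

-- How many elements does a pointer argument refer to?
data Extent
  = Single            -- pointer to exactly one element
  | Fixed Int         -- array with a statically known length
  | LengthFrom String -- length passed in another (named) argument
  | NulTerminated     -- sentinel-terminated, like C strings
  deriving (Eq, Show)

-- Who releases the memory?
data Ownership
  = CallerOwns
  | CalleeOwns
  | FreedBy String    -- C deallocator a generator could attach as a finalizer
  deriving (Eq, Show)

data PointerSpec = PointerSpec
  { direction :: Direction
  , extent    :: Extent
  , ownership :: Ownership
  } deriving (Eq, Show)

-- Defaults and overrides are plain Haskell values.
defaultPointer :: PointerSpec
defaultPointer = PointerSpec In Single CallerOwns

-- An int* that the C function writes a single result into.
outCount :: PointerSpec
outCount = defaultPointer { direction = Out }

-- A buffer whose length is passed in an argument called "len".
inBuffer :: PointerSpec
inBuffer = defaultPointer { extent = LengthFrom "len" }

-- An output buffer that the caller must release with free().
ownedResult :: PointerSpec
ownedResult = defaultPointer { direction = Out, ownership = FreedBy "free" }

-- Derived logic is just a function: should this argument become part
-- of the Haskell result rather than a Haskell parameter?
isResult :: PointerSpec -> Bool
isResult s = direction s /= In
```

The appeal is that defaults and per-function overrides are type checked and can be abstracted over with the usual Haskell tools, rather than written in a bespoke syntax.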

As an example of the kind of high-level binding we might want to generate, consider

void resample(
  int32_T *res_m_num_valid_samples,
  cint16_T res_m_iq_int[30720000],
  int64_T res_m_old_rate,
  int64_T res_m_new_rate,
  cint16_T res_m_iq_resampled_int[30720000]
);

for which we might want to generate

resample ::
     Vector (Complex Int16)
  -> Int64
  -> Int64
  -> IO (Int, Vector (Complex Int16))
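One way such a wrapper could be implemented on top of a low-level binding is sketched below. It is hedged in several ways: it assumes cint16_T is a pair of 16-bit fields (real part first), so that base's Storable instance for Complex Int16 matches it, and that int32_T and int64_T are the usual fixed-width integer typedefs; it uses lists instead of Vector to stay within base; and it takes the low-level function as an argument (LowLevelResample and resampleWith are hypothetical names) so it can be exercised without the C library.

```haskell
import Data.Complex (Complex (..))
import Data.Int (Int16, Int32, Int64)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical shape of the low-level binding for resample. In generated
-- code this would be a foreign import along the lines of
--   foreign import ccall "resample" c_resample :: LowLevelResample
type LowLevelResample =
     Ptr Int32           -- res_m_num_valid_samples (output)
  -> Ptr (Complex Int16) -- res_m_iq_int (input array)
  -> Int64               -- res_m_old_rate
  -> Int64               -- res_m_new_rate
  -> Ptr (Complex Int16) -- res_m_iq_resampled_int (output array)
  -> IO ()

-- Array length declared in the C prototype.
resampleCapacity :: Int
resampleCapacity = 30720000

-- High-level wrapper: allocates both arrays at the given capacity, copies the
-- input in (zero-padded, since C may read the whole array), calls the
-- low-level function, then reads back only the valid output samples.
resampleWith
  :: LowLevelResample
  -> Int -- array capacity (resampleCapacity for the real function)
  -> [Complex Int16]
  -> Int64
  -> Int64
  -> IO (Int, [Complex Int16])
resampleWith cResample cap input oldRate newRate
  | length input > cap = ioError (userError "resample: input too long")
  | otherwise =
      alloca $ \numValidPtr ->
        allocaArray cap $ \inPtr ->
          allocaArray cap $ \outPtr -> do
            pokeArray inPtr (take cap (input ++ repeat (0 :+ 0)))
            cResample numValidPtr inPtr oldRate newRate outPtr
            numValid <- fromIntegral <$> peek numValidPtr
            samples <- peekArray (max 0 (min numValid cap)) outPtr
            return (numValid, samples)
```

Note that every question from the list above shows up here: which parameters are outputs, how large each array is, and who allocates the buffers.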

Another possible source of inspiration for specifying expected types is interface description languages, such as https://learn.microsoft.com/en-us/windows/win32/rpc/the-idl-file .

One thing we should probably keep in mind is that when we implement these customization options for high-level bindings (and indeed also the standard set of defaults, see #32), we will probably need to make it possible for the mapping from the low-level binding to the chosen high-level binding to depend on the target architecture.
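As a concrete case, suppose a user chooses Int64 as the high-level type for a C long: that is correct on 64-bit Linux or macOS (LP64), but not on 64-bit Windows (LLP64), where long is 32 bits. A minimal, hypothetical sketch of a target-dependent mapping follows; Target, DataModel and haskellTypeForLong are illustrative names, and classifying every non-Windows 64-bit target as LP64 is a simplification.

```haskell
-- Hypothetical sketch: none of these names come from hs-bindgen.

-- C data models: sizes of int, long and pointers.
data DataModel = ILP32 | LP64 | LLP64
  deriving (Eq, Show)

data Target = Target
  { targetOS       :: String -- e.g. "linux", "windows"
  , targetWordBits :: Int    -- pointer width in bits
  } deriving (Eq, Show)

-- Simplification: all 32-bit targets are ILP32, 64-bit Windows is LLP64,
-- and every other 64-bit target is LP64.
dataModel :: Target -> DataModel
dataModel t
  | targetWordBits t == 32   = ILP32
  | targetOS t == "windows"  = LLP64
  | otherwise                = LP64

-- Name of the fixed-width Haskell type to use for C `long` on a target.
haskellTypeForLong :: Target -> String
haskellTypeForLong t = case dataModel t of
  ILP32 -> "Int32"
  LLP64 -> "Int32" -- 64-bit Windows keeps long at 32 bits
  LP64  -> "Int64" -- 64-bit Unix-like systems widen long to 64 bits
```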

Apparently greenfield also does some of this; looking into that is tracked separately in #62.