hs-bindgen: automatically create Haskell bindings from C header files

Warning

This project is in early stages of development. Do not use.

Project goals

The most important existing tools for the generation of Haskell bindings from C headers, hsc2hs and c2hs, require a lot of user input (see Alternative generators for a full review): they assist in writing bindings by filling in details about the C code when requested, but the process is still driven by the programmer. The goal of hs-bindgen, inspired by the Rust bindgen tool, is to have the entire process be driven by the C header(s) themselves.

Execution mode

It should be possible to run the tool either as a preprocessor or in Template Haskell mode, the latter offering a convenient workflow:

module MyModule where

generateBindingsFor "path/to/foo.h"

Cross-compilation

We should support cross-compilation, ideally in both execution modes, but definitely in preprocessor mode.

We need to do this reliably, which means that we need to use existing infrastructure (for example, to find out the offsets of all fields inside a struct). We will therefore bind to libclang.

No (or very limited) bespoke syntax

One of the downsides of working with c2hs is that users need to learn a new (frankly rather arcane) syntax. We want to limit any new syntax that users might have to learn, working primarily with just regular Haskell. The low-level / high-level split we propose (see below) is partly motivated by this requirement: even if we do not use the tool to generate high-level Haskell bindings, users can write their own, by writing regular Haskell code that happens to work with the (generated) low-level bindings. This should also improve integration with tooling such as HLS.

For the high-level bindings this is more challenging, as users will need to be provided with ways to customize decisions made by the tool. A good option for power-users here might be to offer hs-bindgen as a library, so that customization can be done again with regular Haskell code.

Roadmap

The project is split up into three major milestones, each of which is useful in its own right and can be released as version 0.1, 0.2 and 0.3 respectively.

The objective here is to be able to generate Haskell types with Storable instances for "all" struct, enum and union definitions found in the C header.

We should support most field types, including bitfields, fixed size arrays, flexible array members, etc.

This will require a mechanism to select which instances are of interest, perhaps similar to those supported by Rust bindgen, or through some kind of "program slicing", starting with a set of functions the user is interested in. This is especially important because headers can #include other headers.

The explicit goal of this milestone and the next one is to generate low-level bindings that mirror the C definitions exactly. So, for example, if a struct contains a field of type char*, the corresponding field in the Haskell type will have type Ptr CChar. Constructing higher-level bindings (where we might use String, for example) will not be considered until Milestone 3: High-level API. As such, it should be possible to generate these bindings with minimal user input or customization, ideally none (apart from selection).
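As an illustration of what such a low-level binding could look like, here is a hand-written sketch for a hypothetical header declaring struct person { char *name; int age; }. The type and field names are invented for this example, and the hard-coded size and offsets assume a typical 64-bit (LP64) ABI; the tool itself would obtain them from libclang rather than hard-coding them.

```haskell
import Foreign
import Foreign.C.Types (CChar, CInt)

-- Hypothetical C definition being mirrored:
--   struct person { char *name; int age; };
data Person = Person
  { personName :: Ptr CChar  -- char *name
  , personAge  :: CInt       -- int age
  }
  deriving (Eq, Show)

-- Size, alignment and offsets assume an LP64 target
-- (8-byte pointers, 4-byte int, 4 bytes of trailing padding).
instance Storable Person where
  sizeOf    _ = 16
  alignment _ = 8
  peek p = Person <$> peekByteOff p 0 <*> peekByteOff p 8
  poke p (Person name age) = do
    pokeByteOff p 0 name
    pokeByteOff p 8 age
```

Note that the name field stays a raw pointer: nothing here decides who owns the string or how it is encoded.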

We should also generate a test-suite to check that the Storable instances we generate are correct.
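One simple form such a generated test could take is a round-trip check: write a value to memory with poke, read it back with peek, and compare. The sketch below uses a hypothetical helper name, roundTrips, and only the base library. A real test suite would presumably also compare sizeOf, alignment and field offsets against values reported by the C compiler, which this sketch does not attempt.

```haskell
import Foreign (Storable (..), alloca)
import Foreign.C.Types (CDouble, CInt)

-- | Poke a value into freshly allocated memory, peek it back,
-- and report whether the result equals the original.
roundTrips :: (Storable a, Eq a) => a -> IO Bool
roundTrips x = alloca $ \ptr -> do
  poke ptr x
  y <- peek ptr
  pure (x == y)

-- | Example checks, run here against Storable instances from base
-- in place of generated ones.
checkBaseInstances :: IO Bool
checkBaseInstances = do
  okInt    <- roundTrips (42 :: CInt)
  okDouble <- roundTrips (1.5 :: CDouble)
  pure (okInt && okDouble)
```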

The goal of this milestone is to generate low-level foreign import declarations for all functions declared in the header file. Like in milestone 1, the goal here is to avoid needing user input as much as possible, though some decisions do need to be made (for example, should calls be safe or unsafe?).
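For illustration, low-level imports of this kind look as follows when written by hand for two functions from the C standard library (chosen only because they are available everywhere; the Haskell names c_strlen, c_abs and cLength are made up for this sketch). The safe/unsafe annotation is exactly the kind of decision the header cannot make for us: unsafe calls are cheaper, but must not call back into Haskell and should not block for long.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..), CSize (..))

-- size_t strlen(const char *s);
-- Marked unsafe: short-running and never calls back into Haskell.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- int abs(int j);
-- Marked safe (the default) purely to show the alternative.
foreign import ccall safe "stdlib.h abs"
  c_abs :: CInt -> CInt

-- | Small usage example: the length of a Haskell string as seen by C.
cLength :: String -> IO CSize
cLength s = withCString s c_strlen
```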

Whenever possible, if the C header contains documentation, we should also include that documentation as Haddocks in the generated bindings.

We should support functions that accept or return structs by value, by generating appropriate wrappers for them.

We should also generate bindings for constants and global variables.

We should also support some additional C types in this milestone (types which don't involve Storable instances), such as typedefs and incomplete structs.
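A common way to represent such types by hand, which generated bindings might resemble, is a newtype for a typedef and a data type without constructors for an incomplete struct (only ever used behind a Ptr). The C declarations in the comments and all Haskell names below are hypothetical.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Foreign (Ptr, Storable (..), nullPtr)
import Foreign.C.Types (CUInt)

-- typedef unsigned int user_id;
-- The newtype keeps user ids distinct from other unsigned ints
-- while reusing the Storable instance of the underlying type.
newtype UserId = UserId CUInt
  deriving (Eq, Ord, Show, Storable)

-- struct db_connection;   /* incomplete: layout unknown to callers */
-- No constructors and no Storable instance: values only exist behind a Ptr.
data DbConnection

-- | Pointers to the opaque type can still be passed around and compared.
isNullConnection :: Ptr DbConnection -> Bool
isNullConnection = (== nullPtr)
```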

While for some users these low-level bindings might be usable as-is, the primary objective here is to make it easier for users to manually write high-level bindings; this is now regular Haskell coding, and should be well supported by tooling such as HLS.

We might want to release this together with milestone 2.5 (see below).

This milestone sits in between milestones 2 and 3 because it is useful for both. When hand-writing high-level bindings, there are undoubtedly a lot of patterns that emerge. We should capture these as Haskell functions or type classes and release this as a separate library hs-bindgen-patterns.
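As an example of the kind of pattern meant here: many C functions return a status code and deliver their actual result through an out-parameter. A helper along the following lines could capture that once. The name withOutParam and the convention that 0 means success are assumptions made for this sketch, not part of any existing hs-bindgen-patterns API.

```haskell
import Foreign (Ptr, Storable (..), alloca)
import Foreign.C.Types (CInt)

-- | Run a C-style action that returns a status code and writes its
-- result through an out-parameter. A zero status is assumed to mean
-- success; any other status is returned as Left.
withOutParam :: Storable a => (Ptr a -> IO CInt) -> IO (Either CInt a)
withOutParam action = alloca $ \out -> do
  status <- action out
  if status == 0
    then Right <$> peek out
    else pure (Left status)

-- | Stand-in for an imported C function such as
--     int get_answer(int *out);
-- written in Haskell so that the sketch runs without any C code.
fakeGetAnswer :: Ptr CInt -> IO CInt
fakeGetAnswer out = poke out 42 >> pure 0
```

With such a helper, a hand-written high-level wrapper reduces to something like withOutParam fakeGetAnswer, with the allocation and status check no longer repeated at every call site.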

Even in the ideal case that all patterns that are used in the construction of the high-level bindings can be expressed using the patterns provided by the hs-bindgen-patterns library from milestone 2.5, it might still be cumbersome to have to write them all out, and so some generation might still be useful.

This is all the more important for data type declarations (as opposed to function definitions); we'll want to try and generate high-level equivalents for structs, enums, and (tagged) unions.
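To make the distinction concrete, the sketch below pairs a low-level record, as it might be produced for a hypothetical struct person { char *name; int age; }, with a hand-written high-level counterpart and a conversion between them. All names are illustrative. Deciding that char * means a NUL-terminated string that should become a String is precisely the kind of choice the header alone cannot justify.

```haskell
import Foreign (Ptr)
import Foreign.C.String (peekCString, withCString)
import Foreign.C.Types (CChar, CInt)

-- Low-level view: mirrors the C struct field by field.
data CPerson = CPerson
  { cPersonName :: Ptr CChar
  , cPersonAge  :: CInt
  }

-- High-level view: idiomatic Haskell types.
data Person = Person
  { personName :: String
  , personAge  :: Int
  }
  deriving (Eq, Show)

-- | Convert by copying the C string out of foreign memory.
-- Assumes the name pointer is non-null and NUL-terminated.
toPerson :: CPerson -> IO Person
toPerson (CPerson name age) = do
  name' <- peekCString name
  pure (Person name' (fromIntegral age))

-- | Usage example: build a temporary low-level value and convert it.
example :: IO Person
example = withCString "Ada" $ \name -> toPerson (CPerson name 36)
```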

However, there is a trade-off here. There are a lot of decisions that need to be made for the high-level bindings: the C header file does not provide sufficient information by itself. This means that the tool must be customizable, for example through a DSL, through annotations in the C header files themselves, or through using hs-bindgen as a library with customizations as regular Haskell code. It is conceivable that in cases that would require extensive customization, the most direct way to do that customization is not to use generation at all, but simply to write bindings manually, provided that the hs-bindgen-patterns library provides sufficient support.

Nonetheless, there will probably be scenarios where a set of defaults and heuristics can do a good job at generating high-level bindings, without much -- or any -- input from the user.

To make tweaking of the output easier, the tool should include comments in the generated code that explain tool decisions. In other words, the generated code should provide sufficient information to the user to allow them to change the way that the code is generated.

This milestone is currently just a collection of additional features that we might consider, such as