dart-lang/sdk

[ffi] Support passing and returning composites (structs) by value.

mraleph opened this issue · 11 comments

I wanted to writing bindings for libclang (so that I can generate bindings from C/C++ header files automatically), but I discovered that libclang API uses structs passed and returned by value.

typedef struct {
  unsigned int_data[4];
  void *ptr_data;
} CXToken;

typedef struct {
  const void *data;
  unsigned private_flags;
} CXString;

CINDEX_LINKAGE CXString clang_getTokenSpelling(CXTranslationUnit, CXToken);

So I can't really bind to this API. There are some interesting design problems to resolve: e.g. when a structure is returned by value who manages the memory of that struct and how do we prevent people from making a mistake of passing address of that struct to some other method.

/cc @sjindel-google @dcharkes I think this should probably be part of MVP - but I could not find an issue for this.

You could bind to the API btw, with some very simple C wrappers.

Why did we move this back to Flutter MVP?

We'll do the api change in MVP #37229, but not the by-value-passing.

I noticed something strange
I created a header like this

struct Data
{
    int i;
};

void printData(struct Data);
struct Data createData(int);

But If I write bindings like this (using Pointer instead of passing struct by value)

typedef printData_func = ffi.Void Function(ffi.Pointer<Data>);
typedef PrintData = void Function(ffi.Pointer<Data>);

typedef create_struct_func = ffi.Pointer<Data> Function(ffi.Int32);
typedef CreateStruct = ffi.Pointer<Data> Function(int);
// we can even simply use ffi.Pointer instead of ffi.Pointer<Data>

The bindings are working perfectly (tested on ubuntu 19)
We just can't do ptr.ref to access the Struct.

Complete code can be found here

Can someone shed some light into this? (is it safe to do so), and
If there are any potential memory leak issues involved?

The Data struct with a single int field is treated as a single int field in the Linux ABI.

You're storing the value of the int field as the address of the Pointer, most likely ptr.address will give you the value of the int field.

This will not work for structs that are larger than the architecture size (larger than 8 bytes on 64 bit architectures), and this will not work for structs containing floats.

Is there any documentation how this can be working now?

I'm having a Go library that returns a tuple (double, double) like so:

//export provideCurrentPosition
func provideCurrentPosition() (float64, float64) {
…
}

The go build -buildmode c-shared creates this header file for it:

/* Return type for provideCurrentPosition */
struct provideCurrentPosition_return {
        GoFloat64 r0;
        GoFloat64 r1;
};

extern struct provideCurrentPosition_return provideCurrentPosition();

I cannot modify this behavior (in fact I could since it is FOSS…) and I'm not very interested in cluttering the Go library in favor for the C interop of dart.

@marcofeltmann this is currently only available on the master branch (which you'll have to build manually) or on the dev channel (starting from 2.12.0-150.0.dev). Edit: we have released 2.12.0-162.0.dev https://dart.dev/tools/sdk/archive.

Next beta and/or stable releases will include this feature.

Hi, I've been waiting for this feature to wrap some libraries that I've been interested in. It's looking good so far. However, one issue that I'm running into is that the API I'm wrapping returns structs by value, but always takes in those same structs by reference. Is there an easy way to create a pointer and then copy the bytes from a NativeType to the ref of the pointer? Or can I get a pointer to the NativeType itself?

Pointer<T> copyToPointer<T extends NativeType>(T input){
  final p = allocate<T>();
  p.ref = input;
// ^ error here, setter ref is not defined
  return p;
}

I know I can move the fields to the pointer one by one, but this is a huge library, and I don't want to have to do it for every type individually. (I'm using ffigen to generate the bindings, and it is currently over 32000 lines, with more headers that I'll probably eventually need to generate bindings for as well). If this isn't something that can/should be done in the SDK, I can try to submit a pull request to the ffigen repo with some setting to automatically generate functions to create a pointer and copy the fields for each of the structs that are generated.

Edit:
Looking at the utf8 conversion, it would be easier to be able to make a generic function like this if a NativeType could be cast to a Uint8List.

Pointer<T> copyToPointer<T extends NativeType>(T input){
  final p = allocate<T>();
  final pref = p.cast<Int8>().asTypedList(sizeOf<T>());
  pref.setAll(0, input.asInt8List());
  //                   ^ An object of type T has no methods on it (can't cast to anything useful)
  return p;
}

Edit 2:
I might have been trying to use the API wrong, turns out I might not need to do this for all API calls. But I think there is still a chance I might need to do something like this for some of the API.

However, one issue that I'm running into is that the API I'm wrapping returns structs by value, but always takes in those same structs by reference.

@TimWhiting It would be quite curious if the C API would not be consistent in returning and passing structs by value. If that API returns structs by value, but expects them to be passed in by pointer, it puts the burden of resource management on the user of that API. Is the API public? If it turns out to be a common pattern we might want to try to make the use case simpler. If this is a problem, please open a new issue.

Yes the API is public:
Here is the C client library and docs
And the C++ API docs

As an example this c++ file
contains the following

// Creating a pointer
node_options_.reset(new rcl_node_options_t);
// Assigning a return by value result to the underlying memory of the pointer
*node_options_ = rcl_node_get_default_options();

Also this file:

InitOptions::InitOptions(const rcl_init_options_t & init_options)
: init_options_(new rcl_init_options_t)
{
// assigning a return by value to a pointer
  *init_options_ = rcl_get_zero_initialized_init_options();
  rcl_ret_t ret = rcl_init_options_copy(&init_options, init_options_.get());
  if (RCL_RET_OK != ret) {
    rclcpp::exceptions::throw_from_rcl_error(ret, "failed to copy rcl init options");
  }
}

I'm not sure how many structs I would need to do this with, but it's common enough for me to wonder how prevalent.
It's a C library that is designed with the intent that other languages should be using it. I don't know why they made the design decision they did, but I assume they knew that most languages would require a c wrapper that gives the language a pointer to the value rather than the actual value and thus parameters tend to be pointers but not return values? For the most part these are long-lived objects.

I can submit a new issue for this, and I'll try to quantify better how common this pattern is at least in this library.