nim-lang/Nim

feature-request pointer size pair, (openArray as value)

krux02 opened this issue · 13 comments

This idea is analogue to the c++ span type:
open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0122r1.pdf

The origin of this idea is, that I have written several C language wrappers and a repeating pattern to store a variable length array in a struct, is with the pointer/size pair. I use the following struct to solve the problem locally:

type
  UncheckedArray {.unchecked.} [t] = array[0,t]

  DataView*[T] = object
    data: ptr UncheckedArray[T]
    size: int

This is mostly for interfacing with C libraries. For example there is the C library that publishes the following api:

struct MyChildStructA {
	char someData[8];
};

struct MyChildStructB {
	char someData[16];
};

struct MyMotherStruct {
	char headerData[64];
	long numA;
	struct MyChildStructA* childrenA;
	long numB;
	struct MyChildStructB* childrenB;
};

I would provide a wrapper like the following:

type
  UncheckedArray {.unchecked.} [t] = array[0,t]

  DataView*[T] = object
    data: ptr UncheckedArray[T]
    size: int

proc dataView[T](data: ptr T; size: int): DataView[T] =
  result.data = cast[ptr UncheckedArray[T]](data)
  result.size = size

type
  MyChildStructA* = object
    someData*: array[8, char]

  MyChildStructB* = object
    someData*: array[16, char]

  MyMotherStruct* = object
    headerData*: array[64, char]
    numA: clong
    childrenAptr: ptr MyChildStructA
    numB: clong
    childrenBptr: ptr MyChildStructBcontiguous 

proc childrenA*(arg: MyMotherStruct): DataView[MyChildStructA] =
  dataView(arg.childrenAptr, arg.numA)

proc childrenB*(arg: MyMotherStruct): DataView[MyChildStructB] =
  dataView(arg.childrenBptr, arg.numB)

With proper Nim integration, this struct could provide iterators, index operators, conversion to openArray, so that it feels just like a Nim data structure.

The reasone why a seq does not do the job, is because the seq has quite a different memory layout. The seq has two integers, len and reserved, then the data follows directly without another pointer indirection. This mains, that I can't fill the seq header with just the right values, so that the seq sees the exact same data as the C library does (apart from that, I also don't know how well that would work together with garbage collection). And I also can't just copy the data. Copying the data would destroy the idea of the thin wrapper, and sometimes it is necessary to have write access the the members of the DataView. Even though I am not entirely happy with the name, I still like the View part of that name, because it implies, that one has only a view access to data. One is not responsible for freeing that memory, or able to keeping it alive over the owners lifetime. Other possible Names might be DataSlice, DataWindow or just openArray because it is semantically the same, just extended to be also be usable as values, not just function parameters.

Further possibilities:

It could be used for slicing:

proc dataView[T](arg: seq[T]; elements: Slice[int]): DataView[T] =
  result.data = cast[ptr UncheckedArray[T]](arg[elements.a])
  result.size = elements.b + 1 - elements.a

var s = @[7,5,6,4,8,1,2,9,0]

echo sorted(s[2..6], cmp)

var s1 = s.dataView(2..6)

#s1.sort(cmp)  ## fails, because I can't provide a converter to openArray
#echo s
##
## should print @[7,5,1,2,4,6,8,9,0]
##                   |<sorted >|

When DataView either becomes convertable to openArray, or becomes openArray, Then all algorithms that provide an interface to operate on openArray then automatically also become able to operate on any sub array of arrays and seqs.

When openArray would become it's own struct, then it would also fix the problem that the following code does not copile in the C backend.

proc foo(arg: openarray[int]; argLen0: int): void =
  echo arg.low
  echo arg.high
  echo arg.len

Currently it generates the following C code.

N_NIMCALL(void, foo_OqSXkfoGaLLhgMRcwTdoHg)(NI* arg, NI argLen0, NI argLen0);
Araq commented

A struct on its own makes C interop worse, not better, array,len pairs are common in C APIs, they usually don't have this wrapped as a struct.

Yes it's true, in C a pointer, length pair is usually not wrapped in a struct. I think it is mostly for the reason, that such a struct would introduce more overhead to a C programming than it would solve. eg arg[i] would not work anymore, it would be required to be arg.data[i] and such a struct would be required to be defiend for each pointer type. I have seen the following patterns to hide the members of MyMotherStruct, when the children should still be accessable:

// version A

struct MyChildStructA* MotherGetAPtr(struct MyMotherStruct* arg) {
  return &arg->childrenA;
}
struct MyChildStructB* MotherGetBPtr(struct MyMotherStruct* arg) {
  return &arg->childrenB;
}
long MotherGetACount(struct MyMotherStruct* this) {
  return arg->numA;
}
long MotherGetBCount(struct MyMotherStruct* this) {
  return arg->numB;
}

// version B

void MotherGetChildrenA(struct MyMotherStruct* this, struct MyChildStructA** ptr, long* length) {
  *ptr = &this->childrenA;
  *length = &this->numA;
}
void MotherGetChildrenB(struct MyMotherStruct* this, struct MyChildStructB** ptr, long* length) {
  *ptr = &this->childrenB;
  *length = &this->numB;
}

But in Nim pointers don't have pointer arithmetic (and that is a good thing), so this C API would not map into a good Nim API, unless a pointer/length pair is available.

(This is a message to @krux02, since I don't think araq will be interested in a big change to the standard library.)

I don't know about C interop, but for pure Nim a "SeqView" is invaluable. For example, what if you want to sort a sub-sequence? sort[] takes and openarray, but it could take a SeqView. Maybe there could be a "views" module for SeqView, ArrayView, and StringView. I'd argue that most standard library functions that take an openarray or string should offer an overload to take some type of "View". And you cannot simply delegate to the standard library version, so maybe someday, after we have something written, araq would agree to add these and to let the current standard library versions delegate to these.

For memory-management, the View could hold a reference, but then maybe it's not as flexible as openarray. SeqView->ref seq, ArrayView[N]->ref Array, StringView->ref string. For that reason, I'm not sure of the best way to write such a library.

@pb-cdunn I don't think I understand what you are saying, I rather think that my explanation for this feature request was not good enough to easily understand it. That is why I recently added the link the the c++ proposal, because it tries to explain the same thing just with different words.

the View could hold a reference, but then maybe it's not as flexible as openarray

The whole point of SeqView is to have something that is not as limiting as openarray. OpenArray is implemented, when you look at the generated C code, as a pointer and a size. The problem is, OpenArray is not available as a value type, and you can't set the pointer to point to some element of a seq, that is not the beginning, nor could you set the length to any value that is not the length of the seq you construct the openarray from.

Maybe it is better to call the type from now on span. Not that I in particular think that is a great name, but that is how this type will be called in the c++ world, and it does not conflict with anything in the standard library, except htmlgen.span. But that shouldn't be a problem because htmlgen already starts wit the warning "Do yourself a favor and import the module as from htmlgen import nil and then fully qualify the macros".

Reposting here from #7337

In the wild:

  • C++ call that span,
  • Pascal/Oberon/Modula Open Array
  • Rust call that slice Slices are a view into a block of memory represented as a pointer and a length.

this has been implemented as toOpenArray by now. I can't say I like the interface, but the big problem is solved.

@krux02 could we re-open this issue?

I don't see how toOpenArray answers this issue which was: "feature-request pointer size pair, (openArray as value)"

None of these toOpenArray overload take a (pointer, size) pair:

/Users/timothee/git_clone//nim/Nim/lib/system.nim:4229: proc toOpenArray*[T](x: seq[T]; first, last: int): openarray[T] {.
/Users/timothee/git_clone//nim/Nim/lib/system.nim:4231: proc toOpenArray*[T](x: openarray[T]; first, last: int): openarray[T] {.
/Users/timothee/git_clone//nim/Nim/lib/system.nim:4233: proc toOpenArray*[I, T](x: array[I, T]; first, last: I): openarray[T] {.
/Users/timothee/git_clone//nim/Nim/lib/system.nim:4235: proc toOpenArray*(x: string; first, last: int): openarray[char] {.

eg if the pointer comes from, say, a C function, how do I use toOpenArray ?

Also, none of these toOpenArray address what was mentioned in #5437 (comment)

C++ call that span,
Rust call that slice Slices are a view into a block of memory represented as a pointer and a length.

eg, span (https://stackoverflow.com/questions/45723819/what-is-a-span-and-when-should-i-use-one) allows for pointer + length.

Likewise for D's arrays:

int* ptr = c_func_returning_pointer();
int[] my_array = ptr[0..some_length];
// now we can pass around my_array and it works with all standard library etc

I don't think we should reopen this issue. Most use cases are actually solved. But you are right. An unsafe toOpenArray from a pointer and a size has it's use cases and should be there. I think a new issue with a new name should be fine.

Converting ptr+len to openarray:

template ptrLenToOpenArray*[T](p: ptr T, s: int): auto =
    toOpenArray(cast[ptr array[10000000, T]](p)[], 0, s - 1)

# test:
var a = @[1, 2, 3, 4]
var p = addr a[0]
proc foo(a: openarray[int]) =
    echo @a
foo(ptrLenToOpenarray(p, 4))

@yglukhov don't use auto, use untyped. And isn't that array[<very big number>, T] something that should have been replaced entirely with UncheckedArray? But yes that could work. Thanks for this information.

@yglukhov sorry that's not a good workaround, the compiled time size is either too large or too small (too large will increase chance of stack overflow, too small will prevent that to work). As mentioned, most other languages have a clean way of doing that
I'm working on fleshing out full proposal here: #8256

@timotheecour, sorry, but you could have checked the C codegen before writing your assumptions, but let me do this for you. Hint: compile time size is ignored.

foo_I17oqmCE2MjmJ4Niaf1OnA((((NI*) (p_5oR9c2M9cr3Mo0CosBPfoglQ)))+(((NI) 0)), (((NI) 3))-(((NI) 0))+1);