Pointer/array syntaxes

Question

Opened this issue 7 years ago · 7 comments

jdpage commented 7 years ago

decide on syntax
add pointer syntaxes to the spec
add array indexing syntax to the spec
get array syntax working in parser/simplifier (again; previously in #20, now in #34)

Jonathoughts (tm):

At the machine level, pointers are just numbers. So we could say that we have no pointer type, and allow operations on u16 values directly. I think that this might be too low-level, though we definitely need a way to convert between numbers and pointers to support normal C64 programming.
I'm sort of inclined to suggest that we should keep arrays separate from pointers; I like the idea of having the size of the array be a part of the type. This makes it easy to support looping over the array using a for statement as well.
The C-alike "read type declarations clockwise" or however it goes is a headache and not easy. I will be very happy if we do not do that.
Programmers should be able to know which addressing mode they're using without getting bogged down in details.
Slices are kind of cool.

Hardware considerations

The fastest (from both a cycle and register pressure perspective) addressing modes are Absolute and Absolute Indexed, which require the pointer to be a compile-time constant. See issue #5 for ideas on how to expose this desirable addressing mode to dynamic and mutable variables.
For dynamic and mutable pointers and arrays, our bread-and-butter will be Indirect Indexed.
For compile-time constant addresses, a dereference will be more efficient than indexing by zero.
For dynamic addresses, there's no plain (non-indexed) indirect addressing mode for loads and stores, so indexing by zero is about as efficient as indexing by a dynamic value.

Note that the Indexed addressing modes are indexed by a u8. Indexing with something bigger is probably a standard library concern.
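To make the difference between the two modes concrete, here is a rough Python model of how the 6502 forms the effective address in each case. The function names and the choice of $FB/$FC as the zero-page pointer location are illustrative assumptions, not part of any proposed spec.

```python
# Sketch only: models effective-address calculation, not cycle timing.

def absolute_indexed(base: int, x: int) -> int:
    """Like LDA base,X: base is a 16-bit constant encoded in the instruction."""
    if not 0 <= x <= 0xFF:
        raise ValueError("index registers are 8 bits wide")
    return (base + x) & 0xFFFF

def indirect_indexed(memory: bytearray, zp: int, y: int) -> int:
    """Like LDA (zp),Y: the 16-bit pointer is read from two zero-page bytes."""
    if not (0 <= zp <= 0xFF and 0 <= y <= 0xFF):
        raise ValueError("zero-page address and index must each fit in a u8")
    lo = memory[zp]
    hi = memory[(zp + 1) & 0xFF]          # the pointer fetch wraps within zero page
    return (((hi << 8) | lo) + y) & 0xFFFF

memory = bytearray(0x10000)               # 64 KiB address space
memory[0xFB], memory[0xFC] = 0x00, 0x04   # pointer $0400, stored low byte first
memory[0x0403] = 42

value_const = memory[absolute_indexed(0x0400, 3)]       # compile-time constant base
value_dyn = memory[indirect_indexed(memory, 0xFB, 3)]   # base held in a variable
deref = memory[indirect_indexed(memory, 0xFB, 0)]       # dereference = index by zero
```

In this model a dynamic dereference is just the indexed access with Y set to zero, which is why it costs about the same as indexing by a dynamic value.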

Syntax ideas

Not in love with all of these. In particular, I don't know if I like @ better than *, though I do like it not being the same character as used for multiplication. I'm exercising it below to see if it grows on me at all. & is probably fine for address-of; using it in the type instead of @/* provides symmetry with [] and means that it can always be read as "address of" and @/* can be read as "target of", i.e. &u8 reads as "address of u8" and &foo reads as "address of foo". I stole this from Rust.

We should probably distinguish between mutable pointers and read-only pointers. So C's const int* becomes &i16 and C's int* becomes &mut i16. I have no desire to steal Rust's borrow checker because that is an undertaking and even Rust is having trouble getting it right. I'm fine with providing just basic support for making it clear what a function mutates and what it doesn't.

I actually really like the idea of providing compiler intrinsics using function syntax, which can later be promoted to "real" syntax if it proves useful. Rust uses name!(args) syntax for this (and macros), but I'm not really in love with that.

use mem    

let foo: u8 = 7
let bar: &u8 = &foo     --[[ places the address of foo into bar ]]
let baz: u8 = @bar      --[[ dereferences bar, placing the result into baz ]]

--[[ mem.as-address is a compiler intrinsic that allows
--   a pointer to be interpreted as an address. 
--   mem.as-pointer is also a compiler intrinsic which is
--   goofily generic on its output type. I don't like it. ]]
let qux: u16 = mem.as-address(bar)
let quux: &u8 = mem.as-pointer(qux)

--[[ uninitialized arrays seem like a bad thing to allow by default
--   maybe provide another goofy mem intrinsic though? ]]
--[[ language arrays are limited to 256 bytes in length; bigarrays sounds
--   like a nice library feature. ]]
let mut spam: [u8 of 0 to 10] = [ 0 ]      --[[ declares a 10-element zeroed array ]]
let mut eggs: [u8 of 32 to 57] = [ 7 ]     --[[ declares a 25-element array filled with sevens ]]

let wibble: u8 = spam[foo]         --[[ note that array indices are u8 ]]

--[[ another goofy-generic mem intrinsic which evaluates to a compile-time constant ]]
let wobble: &u8 = mem.well-known(0x0400)

Note that pointer types don't have arithmetic defined on them--the mem.as-address and mem.as-pointer functions have to be used to convert them. Pointer arithmetic on the C64 is generally pretty slow compared to direct indexing, so I'm inclined to steer people away from it.
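One way to picture the "no arithmetic on pointers" rule is an opaque wrapper around a u16 that defines no operators, so any arithmetic has to go through the conversion functions. The sketch below is in Python purely for illustration; Pointer, as_address and as_pointer are stand-ins for the proposed intrinsics, and the target tag is a made-up placeholder for the pointee type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pointer:
    """Opaque 16-bit pointer: deliberately defines no + or - operators."""
    _address: int
    target: str = "u8"    # hypothetical tag for the pointee type

def as_address(p: Pointer) -> int:
    """Stand-in for mem.as-address: reinterpret a pointer as a u16."""
    return p._address

def as_pointer(address: int, target: str = "u8") -> Pointer:
    """Stand-in for mem.as-pointer: reinterpret a u16 as a pointer."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in a u16")
    return Pointer(address, target)

bar = as_pointer(0x0400)
# Arithmetic is only possible on the u16, never on the pointer itself.
next_cell = as_pointer(as_address(bar) + 1)
```

Writing bar + 1 directly fails with a TypeError here, which mirrors the intent of making pointer arithmetic an explicit, visible step.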

Answer 1 · 2017-12-07T00:08:19.000Z

I'm tempted to suggest handling the weird intrinsics using an unnamed type which contains a 16-bit value and can be assigned to and from any pointer type. This would be useful for a variety of intrinsics (for example, a mem.cast-pointer for turning pointers into other pointers).

Answer 2 · 2017-12-10T20:02:44.000Z

summarizing a conversation we had earlier regarding * versus @ for the dereference operator:

you suggested that maybe it was a mistake to name the nodes according to their semantic meaning rather than their text representation. i.e., you suggested that maybe instead of OperatorMultiplyNode, we should have called it AsteriskNode. the reasoning being that if we use * as the dereference operator, the node really shouldn't be called multiply since dereferencing and multiplying are very different things.

i see that instead as a yellow flag that maybe we shouldn't use * for dereference. i actually really like @. you also pointed out that it could be read as "the data at " which i think is also cool.

we're so light on punctuation that it's not like it would be shooting ourselves in the foot by using a distinct operator for dereference, and i think not overloading operators is a good way to make the language more easily learnable.

as for arrays... i don't understand the advantage of:

let eggs: [mut u8 of 32 to 57] = [ 7 ]

over:

let eggs: mut u8[25] = [ 7 ]

what do the starting and end indexes mean?

Answer 3 · 2017-12-10T20:13:40.000Z

i do see value in wrapping the brackets around the type, to make it clear whether you're dealing with an array of pointers or a pointer to an array of integers... but i don't see the value of the of x to y thing, so please elaborate on what that is supposed to provide that a simple size wouldn't.

i'm tempted to suggest:

let eggs: [mut u8: 25] = [ 7 ]

except i don't like overloading the colon.

side note: this thing you're suggesting where you can do = [ 7 ] to set all the items in the array to 7 is a thing that C only gives you for 0: int a[25] = { 7 } sets the first element to 7 and zero-fills the rest, so { 0 } is the only initializer that fills the whole array with one value. i'm not saying we should only allow it for 0, because i think that's a bullshit limitation from the user point of view, but we should figure out why they chose to do that. they might have a non-obvious good reason.

Answer 4 · 2017-12-11T05:34:48.000Z

Okay, so let's assume we're gonna go with @ for dereferencing pointers.

As for the range thing (ignoring mut positioning for now), Rust does let eggs: [u8; 25] for array types.

Basically, the point of something like

let eggs: [u8; 32 to 57] = ...

... would be to allow you to declare an array with a custom range (useful for some algorithms?) where the type is laid out optimally for that. E.g. if you were to declare

let spam: [u8; 0 to 3] = [0]    --[[ three bytes ]]
let eggs: [u8; 1 to 4] = [2]    --[[ also three bytes ]]

Then the assembly emitted would be like:

spam:
    .byte 0, 0             ; first two bytes of spam
eggs:
    .byte 0                ; last byte of spam
    .byte 2, 2, 2          ; bytes of eggs

... thereby allowing eggs to be accessed without doing any pointer arithmetic. Basically, it's a memory layout optimization for working with arrays that you don't want to begin at 0 for some reason.
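The layout trick can be sketched as follows: each array's label is placed at "start of data minus lower bound", so element i always lives at label + i. This is a Python model of the idea only; the layout function and its tuple format are invented for illustration, and the ranges are assumed half-open as in the example above.

```python
def layout(arrays):
    """Pack arrays back to back and bias each label by its lower bound.

    arrays: list of (name, lo, hi, fill) with a half-open index range lo..hi.
    Returns (memory, labels) where labels[name] + i is the address of name[i].
    A label may come out negative if the very first array has lo > 0; a real
    assembler would have to handle that case separately.
    """
    memory = bytearray()
    labels = {}
    for name, lo, hi, fill in arrays:
        start = len(memory)            # where this array's data really begins
        labels[name] = start - lo      # biased label: no subtraction at runtime
        memory.extend([fill] * (hi - lo))
    return memory, labels

memory, labels = layout([("spam", 0, 3, 0), ("eggs", 1, 4, 2)])
first_egg = memory[labels["eggs"] + 1]   # eggs[1], the first valid element
```

With these inputs the eggs label lands on the last byte of spam, matching the assembly listing above.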

Possibly related, I was entertaining the idea of proposing an Ada/Pascal-style ability to create types like

type foo = 1 to 25

... in which case foo would be restricted to that range.
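As a rough picture of what "restricted to that range" could mean, here is a Python sketch that checks values at runtime. It assumes the range is half-open, like the array ranges earlier in the thread; whether 25 itself would be allowed is not settled here, and make_range_type is an invented helper, not proposed syntax.

```python
def make_range_type(lo: int, hi: int):
    """Return a checker for values in lo..hi (assumed half-open)."""
    def check(value: int) -> int:
        if not lo <= value < hi:
            raise ValueError(f"{value} is outside {lo} to {hi}")
        return value
    return check

foo = make_range_type(1, 25)
index = foo(24)     # accepted; foo(25) or foo(0) would raise ValueError
```

A compiler could do the same check statically for constants, which is where a language-level construct might beat a library one.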

Answer 5 · 2017-12-11T05:53:56.000Z

okay, i'm on board with specifying non-zero-indexed array ranges.

notes:

i don't love the semicolon. i don't love semicolons in general, but i definitely don't love using it to do something other than end a statement, for similar reasons to not loving unmatched single-quotes in a language.
i also don't love [u8 of 0 to 5], although i find it more palatable than semicolons. i would be on board with [0 to 5 of u8] because it's five pieces of u8 data. or maybe [u8, 0 to 5].
it might be convenient to have a shorthand if you want your array to be zero-indexed. like [5 of u8] or [u8, 5].
what do multi-dimensional arrays look like?

and, in reply to your type proposal: it sounds like making that a language construct (as opposed to a standard library function) could improve the efficiency of generated code. if that's the case, i'm on board with looking into it after we have an MVP.

Answer 6 · 2017-12-11T06:12:02.000Z

semicolons.

here's a multidimensional array:

[u8; 0 to 5, 0 to 10]
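For reference, one possible way to flatten such an array into memory is row-major order, sketched below in Python. The thread does not say which layout would be used, so row-major and the flat_offset helper are both assumptions; ranges are treated as half-open.

```python
def flat_offset(index, ranges):
    """Row-major byte offset of a multi-dimensional index.

    index:  tuple of indices, one per dimension.
    ranges: list of (lo, hi) half-open ranges, one per dimension.
    """
    if len(index) != len(ranges):
        raise IndexError("wrong number of indices")
    offset = 0
    for i, (lo, hi) in zip(index, ranges):
        if not lo <= i < hi:
            raise IndexError(f"{i} is outside {lo} to {hi}")
        offset = offset * (hi - lo) + (i - lo)
    return offset

dims = [(0, 5), (0, 10)]              # [u8; 0 to 5, 0 to 10]: 50 bytes total
last = flat_offset((4, 9), dims)      # offset of the final element
```

At 50 bytes this example still fits the 256-byte limit, so the flattened offset can be held in a single u8 index.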

Answer 7 · 2017-12-12T15:44:03.000Z

arrays implemented in PR #20