keean/zenscript

Mutable Objects

Opened this issue · 116 comments

keean commented

Looking at the mess Rust has created with its handling of mutability, and also the confusion l-values and r-values cause in C/C++ I think a simpler model is needed for mutability. I also think that mutability needs to be part of the type signature not a separate annotation.

I think the problem comes from a failure to distinguish between a value and a location in the type system. A value is something like a number 1, 2, 3, or a particular string like "Hello World". It cannot change; values are by definition immutable. References to values do not make sense, as we should simply pass the value (since it is immutable, there is no concept of identity with values: a '1' is a '1'; there is no 'this one' or 'that one'). Variables stand for unknown values and are therefore also immutable, as in "2x = 6", for example. Once we know 'x' has the value '3', it always stands for '3' in that scope. All values and variables are by nature r-values (and not l-values), which simplifies things considerably. We write the type of an integer value or variable as "Int", for example.

So what about mutability? Well, values and variables are not mutable, so how do you change something? You need a container (or a location, in computer terms). You can only write to something that has a location (is an l-value in C/C++ terms). A container is something like an array, which has slots that contain values, and the values in the slots can be changed. As a parametric type we might write "Array[Int]". Of course an array has a length, so we also want a simple single-value container for the common case of something like an integer we can increment. We could write this as a type: "Mut[Int]". This represents a singleton container that can have an 'Int' value put in it. Containers are l-values; they have an address. The important point is that the container itself is a value that can be assigned to a variable; it is just the contents that can change.

In this way the semantics are kept clean: we don't need concepts like l-values and r-values, nor do we have the problems Rust has with closures, where it does not know whether to reference or copy an object. So if value assignment is 'let', we can assign to a variable a value that cannot change, or another type of object such as a single mutable location.

let y = 3 // value assignment, y is immutable, 3 has type Int, 'y' has type Int
let x = Mut(3) // value assignment x is a mutable container that initially has the value 3 in it. x has type Mut[Int]
x = 2 // the contents of the container called 'x' are changed to '2'. This is syntactic sugar for x.assign(2), which would have a type like: (=) : Mut[X] -> X -> Mut[X].

This removes the need for any kind of special annotation for mutability or references, and removes the need to understand l-values and r-values.
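The value/container distinction above can be sketched in TypeScript. This is only an illustration of the proposed semantics, not part of the proposal itself; the class name Mut comes from the discussion, but its methods (get, assign) are my assumed names.

```typescript
// Minimal sketch (illustrative, not a committed design): plain values are
// immutable, and mutation happens only through an explicit singleton
// container, here called Mut.
class Mut<T> {
  constructor(private value: T) {}
  get(): T { return this.value; }
  // The proposed `x = 2` sugar would desugar to a call like this,
  // with a type roughly (=) : Mut[X] -> X -> Mut[X].
  assign(v: T): Mut<T> { this.value = v; return this; }
}

const y = 3;            // immutable value; y stands for 3 in this scope
const x = new Mut(3);   // x is immutably bound to a container: Mut<number>
x.assign(2);            // the *contents* change; the binding of x does not
```

Note that `x` itself is declared `const`: the container is a value assigned to a variable, and only its contents change, matching the semantics described above.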

Finally we need references to containers so that the same mutable value can be shared with different program fragments. References can be Read Only, Write Only, or ReadWrite. We can have:

let x = RORef(Mut(3)) // x is a readonly reference to a container that has an int of 3 in it.
let y = WORef(Mut(3)) // y is a writeonly reference to a container that initially contains 3.
let z = RWRef(Mut(3)) // a reference that can be used for both reading and writing.
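A hypothetical sketch of the three reference capabilities, in TypeScript. The names RORef/WORef/RWRef come from the proposal; the interfaces and factory functions are my illustrative assumptions.

```typescript
// Sketch: capability-restricted references over a shared Mut cell.
class Mut<T> {
  constructor(private value: T) {}
  get(): T { return this.value; }
  assign(v: T): void { this.value = v; }
}

interface ReadRef<T> { get(): T }
interface WriteRef<T> { assign(v: T): void }

const RORef = <T>(m: Mut<T>): ReadRef<T> => ({ get: () => m.get() });
const WORef = <T>(m: Mut<T>): WriteRef<T> => ({ assign: (v) => m.assign(v) });
const RWRef = <T>(m: Mut<T>): ReadRef<T> & WriteRef<T> =>
  ({ get: () => m.get(), assign: (v) => m.assign(v) });

const cell = new Mut(3);
const x = RORef(cell);  // read-only view: x.get(), no assign
const z = RWRef(cell);  // read-write view
z.assign(4);            // visible through x, since both share one container
```

The capability restriction here is static (the read-only view simply lacks `assign` in its type), which mirrors the idea that these are type-level annotations on references rather than runtime checks.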

However the variables themselves are still immutable. A for loop could look like this:

for (let x = Mut(10); x > 0; x--) { ... }

where 'x' would have the type "Mut[Int]".

With GC, I don’t see any need to explicitly type references. We simply remember those primitive types which are always copied when assigned.

Agreed on annotating types with read-only, write-only, and read-write. Per my proposal for concurrency, we’ll also need an annotation for exclusive borrowing and another for stored (transitive) borrows.

I am preferring to keep these annotations concise employing symbols:

() for exclusive borrow (enclose any read/write symbols)
+ for write-only
- for read-only
nothing for read-write
! for stored (transitive) borrow

Note const (non-reassignment) doesn’t apply to the type but rather to the reference’s instance-construction site. I am contemplating that const will be the default (for both function arguments and other instances). Function arguments will be written normally, but other instances will be created with := (optionally ) instead of = for reassignment. Reassignable instances will be prepended with var. Obviously, for primitive types which are always copied, an instance with a writable type must be var. So your example could be written less verbosely:

for (var x = 10; x > 0; x--) { ... }
keean commented

Rust used symbols for different kinds of references and borrows and they have changed to words like 'Ref' etc because it is more readable.

Rust used symbols for different kinds of references and borrows and they have changed to words like 'Ref' etc because it is more readable.

I argued above we do not need Ref because (at least for my plans), contrary to Rust I am not attempting to create an improved C or C++ with programmer control over all those low-level details. Also Rust requires symbols such as & at the call/instantiation site and I proposed no symbols at the call/instantiation site other than := (optionally ). Please enumerate all the symbols Rust used/uses and what they’ve been changed to. Specifics are important. Note I am not yet claiming I disagree, because I need to compare the specifics and think about it.

Afaics, Rust is a blunder thus far, so I am not quite sure if we can cite them as a desirable leader on any design decisions. Remember I had complained to them about the need for the noisy & at the call sites.


Edit: also that last link exemplifies some of the complexity that is involved with having these concepts of references types, value types, and boxed types. I’d rather have the compiler/language deal with those things implicitly. For example, the compiler knows when it needs to use a boxed type because more than one type can be stored there. Unboxed slots in data structures are an optimization. Rust required that complexity because it is tracking borrows in a total order, which IMO is an egregious design blunder.

Ah I see a mention that ~T is the old syntax for what is now Box<T>. I agree that Box<T> is more readable than ~T because ~ has no relationship to Box in my mind. However, I think the symbols I proposed may be more mnemonic:

() cordoned border around access
+ putting more into something
- taking out something
! for not released, although I realized that Rust named these moves.

keean commented

I have been arguing that it is precisely because type systems do not differentiate between values and containers, that language semantics get very opaque and complex, for example C/C++ have l-values and r-values, and rust does not know whether to copy or reference variables in a closure.

JavaScript does it differently and treats numbers as values and everything else as references. This causes problems because 'enums' should be values but are treated like objects.

If you want a mutable number in JavaScript you end up using a singleton object like {n:3} or an array like [3]. So even in JavaScript there is some kind of notion of a "reference".

I think whatever we do, it needs to cleanly distinguish between l-values (containers) and r-values (values). It probably also needs to distinguish between boxed and unboxed things.

By distinguishing between objects and references you gain control over ownership, and so you can keep track of whether the caller or callee is responsible for deallocating the memory. As you say, this is probably not necessary with GC, but then I think that a hybrid GC would give the best performance, where you deallocate when things leave scope like C/C++, and only garbage collect those objects that escape their scope.

I think Rust's lifetime annotations are taking things a bit far, but I find the RAII style leads to clean code in C++. There is a clear advantage to having destructors over finalisers.

In any case, it is probably safe to infer things at the value level, as long as all these different kinds of things (that have different semantics) are clearly distinguished at the type level.

A prior quote might be apropos regarding “complexity budget”.

I’m attempting to reread most of the discussion on these Issues threads so I can make my final decisions and get on with creating a language or not.

P.S. note I have completed 16 weeks of my 24 week TB medication.

@keean wrote:

I think whatever we do, it needs to cleanly distinguish between l-values (containers) and r-values (values).

I repeat: for my needs, which are to avoid mucking around in low-level details in a high-level language, I think JavaScript got this right in that certain primitive types (e.g. Number, also String, correct?) are treated as values and the rest as references to a container of the value. So the programmer never needs to declare whether a type is a reference to a value or a value.
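The JavaScript behavior being referenced can be demonstrated directly (TypeScript/JavaScript; a minimal illustration of the semantics, nothing here is language-proposal syntax):

```typescript
// Primitives are copied on assignment; objects are shared by reference.
let a = 1;
let b = a;        // copy of the value
b = 2;            // a is unaffected

const p = { n: 1 };
const q = p;      // both names refer to the same container
q.n = 2;          // visible through p as well
```

Strings behave as values too; since they are immutable in JavaScript, the copy-versus-share distinction is unobservable for them, which is exactly why no reference declaration is needed.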

rust does not know whether to copy or reference variables in a closure.

Ditto the default behavior I wrote above is what I think I probably want.

JavaScript does it differently and treats numbers as values and everything else as references. This causes problems because 'enums' should be values but are treated like objects.

What is an enum in JavaScript? Afaik, JavaScript has no enum datatype.

If you want a mutable number in JavaScript you end up using a singleton object like {n:3} or an array like [3]. So even in JavaScript there is some kind of notion of a "reference".

What problem do you see with this?

It probably also needs to distinguish between boxed and unboxed things.

The compiler knows which types are (tagged) sum types and thus must be boxed. Why declare it?

In any case, it is probably safe to infer things at the value level, as long as all these different kinds of things (that have different semantics) are clearly distinguished at the type level.

Does that mean you are agreeing with me?

As you say, this is probably not necessary with GC, but then I think that a hybrid GC would give the best performance

We are sometimes (in some aspects) designing different languages. Maximum performance is not the highest-priority goal of a high-level language. Performance can be tuned with a profiler (typically less than 20% of the code needs tuning, and perhaps only 5% needs extreme low-level tuning), and one can drop down to low-level languages for maximum performance via FFI if necessary.

Marrying low-level details with a high-level language creates contention among design priorities. I do not think such a perfect marriage exists. Programmers want a lower complexity budget, and only a smaller portion of the code needs that high-complexity focus. Python, JavaScript, and Java comprise the most popular programming-language set on earth. C/C++ are still as popular as they are because sometimes we must have low-level control, and because those other three (well, at least Java and especially JavaScript) screwed up the integer types, which is one of the things I want to rectify.

Jeff Walker wrote:

As I really learned C++ and began programming in it, I discovered that C++ is a very large and complex language. Why? Well, there are a number of reasons. One is that it follows the zero overhead principle, basically “What you don’t use, you don’t pay for.” That means every language feature has odd limitations and pitfalls to make sure it can be implemented in a very efficient way. Another is that, due to the focus on low level efficiency, there are no safety checks built into the language. So when you make a subtle mistake, which is easy given all the weird edge cases, the program compiles and silently does the wrong thing in a way that maybe succeeds 99% of the time but crashes horribly the remaining 1%. Finally, the language is designed for maximum power and flexibility; so it lets you do anything, even the things you shouldn’t do. This produces a programming minefield where at any moment one might be blown up by some obscure behaviour of the language. Because of that and because other developers and the standard library make use of every language feature, one must learn the whole language. However, C++ is so big and convoluted, learning it is really hard.

Also, I agree with Jeff Walker’s analysis of the fundamental reason TypeScript cannot radically paradigm-shift to improve upon the JavaScript minefield (although I disagree with his opinion that adding static typing is regressive):

The real problem with TypeScript is contained in the statement that it is a “superset of JavaScript.”. That means that all legal JavaScript programs are also legal TypeScript programs. TypeScript doesn’t fix anything in JavaScript beyond some things that were fixed in ECMA Script 5.


I find the RAII style leads to clean code in C++. There is a clear advantage to having destructors over finalisers.

You are correct to imply that we must track borrows for safe use of RAII: if a reference to the local block instance has been stored somewhere, RAII can enable use after destruction.

Agreed it is a loss of brevity, convenience, and safety that GC languages such as JavaScript and Java don’t usually support destructors at block-level (or function level) scopes for instances local to that block.

But by minimally tracking borrowing via a compile-time (not runtime!) reference-counting scheme as I have proposed, we could support implicit deterministic destructors for block-level instances (and even bypass the GC for these, using compile-time implicit allocation and deallocation, i.e. no runtime reference-counting overhead). Good idea! That would be a significant feature advantage over other GC languages!
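To make the claim concrete: what the compiler would generate implicitly for a block-scoped instance could desugar to roughly a try/finally pattern, as in this hypothetical TypeScript sketch (the `Destructible`/`withScoped` names are mine; the compile-time borrow check described above is what would guarantee no reference escapes before destruction):

```typescript
// Hypothetical sketch of deterministic block-scoped destruction in a GC
// language: the destructor runs at scope exit, before the GC is involved.
interface Destructible { destroy(): void }

function withScoped<T extends Destructible, R>(resource: T, body: (r: T) => R): R {
  try {
    return body(resource);
  } finally {
    resource.destroy();   // deterministic, like a C++ destructor at scope exit
  }
}

// Usage: a fake resource whose destruction we can observe.
let destroyed = false;
const result = withScoped({ destroy: () => { destroyed = true; } }, () => 42);
// At this point destroyed === true, deterministically.
```

In the proposal this pattern would be implicit and compiler-inserted, with allocation also resolved at compile time; the sketch only shows the observable destruction ordering.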

@shelby3 wrote:

I am preferring to keep these annotations concise employing symbols:

() wrapping for exclusive borrow (enclose any read/write symbols)
+ for write-only
- for read-only
nothing for read-write
! for stored (transitive) borrow

Add ? as an abbreviation for | undefined, meaning not set (aka Option or Maybe), as opposed to unavailable (i.e. semantically an exception or error) with | null; this can be transpiled in TypeScript to ?: as a type separator, but only for function argument parameters. This is a suffix on the type, and the others are prefixes, except for ! (and in the order listed, for consistency). The ! may be combined with the ? as ⁉️.
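The not-set versus unavailable distinction maps directly onto TypeScript’s existing types; a minimal sketch (the type-alias names are mine, only `| undefined` and `| null` are from the discussion):

```typescript
// `?` abbreviates "| undefined" (not set), distinct from "| null"
// (unavailable, semantically an error/exception case).
type NotSet<T> = T | undefined;       // the proposed suffix `?`
type Unavailable<T> = T | null;       // the `| null` case

function describe(x: NotSet<number>): string {
  return x === undefined ? "not set" : `value ${x}`;
}
```

TypeScript (with strictNullChecks) keeps `undefined` and `null` as distinct types, which is what makes the proposed transpilation of `?` to optional parameters (`?:`) workable for function arguments.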

Edit: we may need to add an annotation for unboxed data structures, or may prefer wrapping with Unboxed< … > rather than a single-symbol annotation, because the verbosity is acceptable: these should be used rarely, only where binary space compression is explicitly needed.

keean commented

Personally I don't like the hieroglyphics; it certainly makes it hard for occasional users to understand (like all the symbols in PL/1, APL, or Perl). Most people can transfer knowledge about things like references from one language to another if the notation is readable.

We probably disagree. I do not like verbose text (compounded on top of already verbose textual types, we need some contrast). That is one reason I do not like Ceylon. I like symbols for very commonly used syntax, e.g. the -> for inline function definition. I am surprised that as a mathematician you do not like symbols. What I do not like are symbols for every commonly used function, e.g. Scalaz’s symbol soup. Symbols need to be used in moderation, and we are talking about type annotations (which are already too long and textual), not operators in the executable code.

I do not see any other popular languages with read-only, write-only, read-write, and exclusive borrows annotations, so there is nothing to transfer from. Even Ceylon (and others) adopted ? for nullable types.

Also, these annotations on type annotations are mostly to be ignored by the eye, because they are a detail (a type attribute), and normally the programmer wants to focus first on the named type annotation, which varies between type annotations. Since these little type-attribute details will repeat (within a small bounded set of choices) on every type, while the types themselves vary (within an unbounded set of choices), they are repetitive, almost like noise, and should be minimized in visual conspicuity so they do not drown out the more prominent, relevant information of the type annotation, e.g. ExclusiveBorrow[ReadOnly[Stored[Nullable[Atype]]]] (even if abbreviated Excl[RO[Stor[Null[Atype]]]]) versus ⊖Atype⁉️ or, without Unicode, (-)Atype!?.

keean commented

There's a bit of a straw man there, as the type doesn't make sense (you wouldn't have a read-only exclusive borrow, as read only is a reference, and a borrow indicates transfer of ownership, and anything that is referenced must be stored).

The closest type to this would be something like:

Mut[Maybe[TYPE]]

or

type MyType[T] = Mut[Maybe[T]]
RORef[MyType[TYPE]]

The point is the types are expressing the semantic invariants.

However I see no reason not to allow user defined type operators. So '?' can be an alias for maybe etc.

So we should allow symbols in the type definitions, and it can be left up to the programmer whether to use them or not.

@keean wrote:

There's a bit of a straw man there, as the type doesn't make sense (you wouldn't have a read-only exclusive borrow, as read only is a reference, and a borrow indicates transfer of ownership, and anything that is referenced must be stored).

Incorrect in my intended context. You’re presuming Rust’s borrowing/ownership model. Please read again my proposal and understand how it differs from Rust.

However I see no reason not to allow user defined type operators. So ? can be an alias for Maybe etc.

So we should allow symbols in the type definitions, and it can be left up to the programmer whether to use them or not.

Disagree. Remember you had also mentioned in the past that for readability consistency, one of the basic tenets of good programming language design is not to unnecessarily have more than one way to write the same thing.

keean commented

This: ⊖Atype⁉️ makes it hard to understand the nesting; for example, is it a reference to a nullable "type", or a nullable reference to a type? You would need to memorise not only what the symbols mean but also their precedence, to know which ones apply first.

Lol, the use of the kenkoy ⁉️ emoji.

I am thinking there are no unadulterated (i.e. native, low-level manipulable) null references in the language I think I probably want:

@shelby3 wrote:

With GC, I don’t see any need to explicitly type references. We simply remember those primitive types which are always copied when assigned.

@shelby3 wrote:

I argued above we do not need Ref because (at least for my plans), contrary to Rust I am not attempting to create an improved C or C++ with programmer control over all those low-level details.

@shelby reiterated:

I repeat for my needs which is to not be mucking around in low-level details in a high-level language, I think JavaScript got this right in that certain primitive types (e.g. Number, also String correct?) are treated as values and the rest as references to a container of the value. So the programmer never needs to declare whether a type is a reference to a value or a value.

As you presumably know, in JavaScript a “null reference” is distinguished as the value undefined, as opposed to a null value, which has the value null.

I suppose Undefinable is another type for which we may want a type annotation.

Excluding memory allocation, which is inapplicable with GC, afaics the only utility of null pointers is to: a) differentiate the state of unset (undefined) from unavailable (null); and/or b) differentiate an instance shared (between data structures) from no instance. Sometimes for efficiency we would prefer to use, for example, the negative values of a signed integer (i.e. the MSB aka most-significant-bit flag) to differentiate the null unavailable state from the available positive-integers state without requiring boxing (and perhaps a similar bit-flag hack for unboxed nullables of other data types); thus perhaps we want to be able to differentiate the #a and #b cases. In no case though would a Nullable[Undefinable[…]] make sense with GC, so we do not need two transposed orderings of those annotations.

Since we probably need #b even for types that are not nullable, then the double question mark is an incongruent symbol choice.

I am contemplating how we might type the bit-flag-hacked unboxed nullables, and generally for any data type. It consumes some of the complexity budget though.
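The bit-flag hack described above can be sketched concretely: encode `nonNegativeInt | null` in a single signed integer, with a negative sentinel standing for null, so no boxing is required. This is an illustration of the encoding only (function names and the choice of -1 are my assumptions); how to surface it in the type system is the open question.

```typescript
// Hypothetical unboxed-nullable encoding: the sign bit distinguishes the
// null (unavailable) state from available non-negative integers.
const NULL_SENTINEL = -1;

function pack(v: number | null): number {
  if (v === null) return NULL_SENTINEL;
  if (v < 0 || !Number.isInteger(v)) throw new Error("only non-negative ints");
  return v;
}

function unpack(raw: number): number | null {
  return raw === NULL_SENTINEL ? null : raw;
}
```

The typing challenge is that `pack`/`unpack` should be invisible: the annotation would promise `Int | null` to the programmer while the compiler stores a bare signed integer.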

keean commented

Don't forget with GC you often need Weak and Strong references, where a strong reference prevents the referenced object from being garbage collected, whereas with a weak reference the referenced object can be garbage collected, so you must check if it is still there when dereferencing. You can also have levels of weakness determining the memory pressure required to evict the object, to reflect circumstances like "I want to cache this in memory for speed, providing we have spare memory left, but evict as soon as we are running low on memory" vs "I want to hold this in memory for as long as possible, and only evict if we have no other free memory left".

The use of weak references for caches does not work well. Weak references are not needed for breaking cycles if GC is present. The only potentially legitimate (but dubious) use for weak references is avoiding having to manually undo “put” operations, e.g. removing objects added to a map, list, event/listener list, etc. I am leaning towards agreeing with David Bruant’s stance that this last use case would encourage bugs. Also, JavaScript has no weak references, so we could not support them in a language that compiles to JavaScript unless we implemented our own GC and heap employing ArrayBuffer.

Except for the exclusivity and stored annotations, the others mentioned must also be allowed on the type parameters of a type. I realized this when contemplating a MutableIterator example.

Afaics, we never place these annotations on type definitions, and only on the types of instances (including function arguments and result types).

@keean wrote:

In this way the semantics are kept clean, and we don't need concepts like l-values and r-values, and nor do we have the problems Rust has with closures, where it does not know whether to reference or copy an object.

With GC and no stack allocation ever, closures always reference the objects of the closure (because there is no need to copy them from the stack before the activation record is destroyed when the function returns). Mutability is then an orthogonal concern at the typing level, i.e. with GC there are no r-values. R-values are a compiler-optimization and implementation concern, and I don’t see why they should be conflated with mutability or the type system.

I do agree that we need to distinguish between modifying the container and the reference to the container, and each should have a separate mutability attribute. Languages which use stack allocation typically prevent modification of the reference to the container, because this could cause memory leaks, which is why the r-value and l-value concept is introduced. My idea for an efficient cordoned nursery can hopefully eliminate the need for that low-level implementation complication leaking into the PL. However, it’s more efficient to store the containers which have immutable references (i.e. only the container may be mutable) directly in the activation record for the function than to store a reference to the container in the activation record. So the separate mutability attribute is important for optimization.

Note closures over non-contiguous, heap-allocated cactus stacks would have a separate reference to each activation record that contains an object that is also in the closure. So these multiple references (and the cost of accessing objects via multiple activation-record pointers) are a cost that is paid for cactus stacks.

keean commented

You also need to distinguish variable binding from mutation. A value like '3' is immutable, but we can rebind variables like: x = 3; x = x + 1. The value bound to the variable is immutable, but we can rebind. This explains why changing a variable inside a procedure does not change the argument passed.

@keean, rebinding is just modifying the reference to the container:

x.ref = new Int(3); x.ref = new Int(x.ref.val + 1)

In a language which uses * for dereferencing:

x = new Int(3); x = new Int(*x + 1)

Instead of the above, JavaScript makes references immutable for primitive objects, so assignment always modifies the container; but since there is no way to access the reference, rebinding is semantically the same as modifying the container (because the reference to 3 is always exclusive):

x = 3; x = x + 1
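For non-primitive objects, where JavaScript does expose the distinction being discussed, rebinding and mutation are observably different; a small TypeScript/JavaScript illustration:

```typescript
// Rebinding changes which container a name refers to;
// mutation changes the contents of the container itself.
let x = { n: 3 };
const alias = x;   // second reference to the same container

x = { n: 4 };      // REBINDING: x now names a new container;
                   // alias still refers to the old one (alias.n === 3)

alias.n = 5;       // MUTATION: changes the old container's contents
                   // without touching x
```

This is exactly why changing a parameter binding inside a procedure does not change the caller’s argument, while mutating a field through that binding does.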

@keean wrote:

Looking at the mess Rust has created with its handling of mutability, and also the confusion l-values and r-values cause in C/C++ I think a simpler model is needed for mutability. I also think that mutability needs to be part of the type signature not a separate annotation.

I agree that mutability should be on the type, and not, as apparently in Rust, some orthogonal concept related to borrowing lifetimes that only annotates the identifier:

let mut i = 1;

Could you please possibly elaborate on how you think Rust messed up mutability so I may know if I’m missing the pertinent details of your point?

I think the problem comes from a failure to distinguish between a value and a location in the type system […] You need a container (or a location, in computer terms). You can only write to something that has a location (is an l-value in C/C++ terms). A container is something like an array, which has slots that contain values, and the values in the slots can be changed […]

let y = 3 // value assignment, y is immutable, 3 has type Int, 'y' has type Int
let x = Mut(3) // value assignment x is a mutable container that initially has the value 3 in it. x has type Mut[Int]
x = 2 // the contents of the container called 'x' are changed to '2'. This is syntactic sugar for x.assign(2), which would have a type like: (=) : Mut[X] -> X -> Mut[X].

I’m instead proposing that for Zer0 we retain the concept of l-values and r-values, and thus only l-values implicitly have a container. With the hindsight of my post herein, do you (and if so, why do you) think conversion of r-values to l-values needs to be explicit, as you showed with Mut(3) in your example above?

@keean wrote:

JavaScript does it differently and treats numbers as values and everything else as references. This causes problems because 'enums' should be values but are treated like objects.

If you want a mutable number in JavaScript you end up using a singleton object like {n:3} or an array like [3]. So even in JavaScript there is some kind of notion of a "reference".

I think whatever we do, it needs to cleanly distinguish between l-values (containers) and r-values (values). It probably also needs to distinguish between boxed and unboxed things.

I repeat: for my needs, which are to avoid mucking around in low-level details in a high-level language, I think JavaScript got this right in that certain primitive types (e.g. Number, also String, correct?) are treated as values and the rest as references to a container of the value. So the programmer never needs to declare whether a type is a reference to a value or a value.

JavaScript, Java, and Python employ call-by-sharing which is distinguished from call-by-reference because only certain objects are passed-by-reference.
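Call-by-sharing is easy to demonstrate at the function-call boundary (TypeScript/JavaScript; purely an illustration of the semantics named above):

```typescript
// Call-by-sharing: the callee receives a copy of the reference.
// Mutating the shared object is visible to the caller;
// rebinding the parameter is not.
function mutate(o: { n: number }): void { o.n = 2; }
function rebind(o: { n: number }): void { o = { n: 99 }; }  // caller unaffected

const obj = { n: 1 };
mutate(obj);   // obj.n is now 2
rebind(obj);   // obj.n is still 2
```

This is the behavior that distinguishes call-by-sharing from both pure call-by-value (where `mutate` would have no effect) and true call-by-reference (where `rebind` would replace the caller’s object).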

Since the grammar I am currently proposing for Zer0 will have explicit pointer types and dereferencing (*) and explicit pointer construction (&), Zer0 will be call-by-value, because pass-by-reference1 can be achieved with a pointer when needed. Except that Zer0 may automatically simulate call-by-value more efficiently2 by actually employing pass-by-reference behind the scenes when passing a large object which is either immutable or being passed to a type which is read-only (i.e. copying would be expensive and stress the L1 cache). In the immutable case, the code will not know pass-by-reference has been employed, because for an immutable object there’s no difference between pass-by-value and pass-by-reference (except for issues of memory safety, stack-frame lifetimes, and garbage collection, which I explain below). In the read-only case, the difference is irrelevant (unless it can be proven no other writers have access for the life of the read-only reference), because it makes no sense to pass to a read-only type by copying the value: the raison d’être of a read-only type is that other references can mutate the value.

I’ve proposed a near zero-cost memory safety abstraction which I developed from @keean’s suggestion of employing Actors (in a zero-memory-resource-cost manner) for parallelism. So only objects which escape the stack-frame lifetime (via compiler analysis which doesn’t require any lifetime annotations nor any of Rust’s aliasing errors and tsuris) will be allocated on the non-shared (i.e. thread-local) bump-pointer heap, and the rest on the stack (each bump-pointer heap is deallocated with one instruction when the Actor returns, so it will be very efficient, on par with Rust’s performance, with 100% memory safety). So the compiler will decide whether to allocate containers on the stack or the bump-pointer heap. Only specially annotated reference-counted pointers get allocated on the traditional shared heap. (I explained all of this in the above linked post, including how the Actor model will eliminate L3 and L4 cache and do software-driven cache coherency and cache-to-cache transfers.)

So therefore the compiler will decide implicitly where thread-local containers are allocated (i.e. stack or non-shared bump-pointer heap). So containers are not to be an explicit concept. And it will be possible to take the address (&) of any container (i.e. l-value). This applies even to containers which contain pointers (*). So the following is valid code:

obj := Data()
myptr := &obj
myptrptr :: &myptr    // `::` means immutable

@keean wrote:

You also need to distinguish variable binding from mutation. A value like '3' is immutable, but we can rebind variables like: x = 3; x = x + 1. The value bound to the variable is immutable, but we can rebind. This explains why changing a variable inside a procedure does not change the argument passed.

In the grammar I proposed for Zer0, re-assignment (aka re-binding) is controlled with :: or := when constructing and initializing via assignment to a container.

Thus a container that contains a pointer or any non-record type is a special case, because the mutability of the contained type is dictated by the re-assignability annotation. So in that case, either the mutability annotation on the type must be consistent with the re-assignability annotation, or we can make the mutability annotation implicit on the type, as it will implicitly be the explicit re-assignability annotation.

EDIT: Zer0 won’t need rebinding if it’s using call-by-value and not call-by-sharing. JavaScript needs to distinguish between preventing rebinding with const versus mutating the fields of the object, because JavaScript employs call-by-sharing, which employs pass-by-reference for some objects. Thus, :: in Zer0 would mean not-writable (i.e. read-only or immutable) for the l-value, i.e. that the implicit container can’t be replaced with a new value. The read-only or immutable attribute would also have to be written explicitly on the type annotation if the type is not instead inferred. Without call-by-sharing, the only reason to have this :: is to make the not-writable attribute very clear, which is especially helpful when the type is inferred and not explicitly annotated. It’s also a way of declaring not-writable when the type is inferred.

I have been arguing that it is precisely because type systems do not differentiate between values and containers, that language semantics get very opaque and complex, for example C/C++ have l-values and r-values, and rust does not know whether to copy or reference variables in a closure.

I found the post you made about that on the Rust forum.

The issue being explained there is that by default in Rust, closures refer to the implicit containers for all the items in the environment of the closure. The containers are always implicit, thus it’s impossible in Rust (and C, C++, etc.) to have an identifier represent an r-value.

But your proposal isn’t necessary, because by definition an r-value is not mutable, so immutability of containers would accomplish the same effect, which I have already designed into Zer0.

But we also need some way to tell closures to copy from a mutable container instead of referencing it. Rust has some convoluted combination of Copy and/or move semantics which I don’t entirely grok (and I don’t think anyone should ever need to grok something so complexly clusterfucked (c.f. also)).

The default really should be to reference the stack frame as that is the most efficient (only requires one pointer). Copying is less efficient, so I think it should be done manually. The programmer should make copies of the containers he wants before forming the closure.
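The “copy the containers you want before forming the closure” discipline can be sketched in TypeScript (hypothetical names; this assumes the environment is captured by reference, as in JavaScript):

```typescript
// Closures capture the variable (its implicit container), not a snapshot of
// its value, so later mutations are visible through the closure.
let counter = 0;
const readLive = () => counter;  // references the live container
const snapshot = counter;        // manual copy taken before forming the closure
const readCopy = () => snapshot; // references the copy instead

counter = 42;
readLive();  // 42: observes the mutation
readCopy();  // 0: the copy is unaffected
```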

Rust’s closures are explicit, which IMO defeats the elegance of their primary local use case. I want only implicit local closures. We already agreed that closures at-a-distance (i.e. across modules) is an anti-pattern.

1 Pass-by-reference is what call-by-reference does for every argument of the function or procedure call. So we use the term pass-by-* when referring to assignment in general or only some of the arguments of a function or procedure call.

2 It’s more efficient to pass the reference than to copy the large object; and because having copies of the object in more than one memory location can cause cache spill. OTOH if there will be many accesses to the object, then it may be more efficient to copy so it can be accessed directly indexed off the stack pointer (SP) which may be more efficient than the double indirection of accessing the pointer on the stack and then the object referenced by the pointer. However if we can keep the pointer in a register, then copying to the stack may provide no speed advantage on accesses. Also (in the context of the near zero-cost resource safety model I proposed because of the Actor innovation) if the object escapes escape analysis and Zer0 must put it on the bump pointer heap anyway, then it can’t be copied to the stack.


I wrote:

Remember I had complained to them about the need for the noisy & at the call sites.


Edit: also that last link exemplifies some of the complexity that is involved with having these concepts of references types, value types, and boxed types. I’d rather have the compiler/language deal with those things implicitly. For example, the compiler knows when it needs to use a boxed type because more than one type can be stored there. Unboxed slots in data structures are an optimization. Rust required that complexity because it is tracking borrows in a total order, which IMO is an egregious design blunder.

Note if we adopt my proposal to forsake open existential quantification, then all dynamic polymorphism will be limited to static union bounds, so it will always be possible to unbox (although maybe wasteful if some of the types in union require much more space than the others). Readers should note this is orthogonal to the issue of needing pointers to avoid recursive types that would otherwise require unbounded space (although Rust seems to conflate these two concepts).

The Zer0 programmer will be able to manually force boxing by employing a pointer. Otherwise I think we should make it unspecified as whether the compiler is employing boxing or unboxing. Ditto (as Go already does) unspecified for order and alignment of record fields (c.f. also and also), we want to leave the flexibility for the compiler to do whatever optimizations it wants:

Optimizers at this point must fight the C memory layout guarantees. C guarantees that structures with the same prefix can be used interchangeably, and it exposes the offset of structure fields into the language. This means that a compiler is not free to reorder fields or insert padding to improve vectorization (for example, transforming a structure of arrays into an array of structures or vice versa). That's not necessarily a problem for a low-level language, where fine-grained control over data structure layout is a feature, but it does make it harder to make C fast.

C also requires padding at the end of a structure because it guarantees no padding in arrays. Padding is a particularly complex part of the C specification and interacts poorly with other parts of the language. For example, you must be able to compare two structs using a type-oblivious comparison (e.g., memcmp), so a copy of a struct must retain its padding. In some experimentation, a noticeable amount of total runtime on some workloads was found to be spent in copying padding (which is often awkwardly sized and aligned).

Consider two of the core optimizations that a C compiler performs: SROA (scalar replacement of aggregates) and loop unswitching. SROA attempts to replace structs (and arrays with fixed lengths) with individual variables. This then allows the compiler to treat accesses as independent and elide operations entirely if it can prove that the results are never visible. This has the side effect of deleting padding in some cases but not others.

keean commented

@shelby3 I think it's a mistake to conflate boxing (storing the type tag with the value) and pointers.

I also think that it's a mistake to conflate l-values and r-values. Containers are the type of the 'hole' and values go in the hole. These types should be distinguished in the type system. So consider a record (Haskell Syntax):

data Record = R {
   age   :: Int,
   count :: Mut Int
}

age is an Int; it's a property of the Record and cannot be changed. count is a 'hole' that an integer can be stored in, and that value can be read or written. Note: we should hide the 'hole' when serialising, so reads from 'Mut Int' would be better if they are transparent (so I think I have changed my mind on that from before), but it should be a distinct type. So in C terms age is an r-value and count can be either an l-value or an r-value. All objects should behave uniformly, so a Record is an r-value (you cannot assign to it) but you can mutate its count. A Mut Record can be assigned to. This would all be abstracted by the Readable and Writable type-classes for generics. 'Mut' is not a very good name for this but I can't think of a better one at the moment? Maybe 'Slot'?

@keean wrote:

Could you please possibly elaborate on how you think Rust messed up mutability so I may know if I’m missing the pertinent details of your point?

I think it's a mistake to conflate boxing (storing the type tag with the value) and pointers.

Agreed. I wrote:

Readers should note this [boxing issue] is orthogonal to the issue of needing pointers to avoid recursive types that would otherwise require unbounded space (although Rust seems to conflate these two concepts).


I also think that it's a mistake to conflate l-values and r-values. Containers are the type of the 'hole' and values go in the hole. These types should be distinguished in the type system. So consider a record (Haskell Syntax):

data Record = R {
   age   :: Int,
   count :: Mut Int
}

age is an Int; it's a property of the Record and cannot be changed. count is a 'hole' that an integer can be stored in, and that value can be read or written.

Your reasoning here appears to be jumbled and not well thought out. We can simply make age immutable and count mutable. That provides the correct semantics. Your proposal to make age an r-value would mean we can’t take the address of it (&age), and that doesn’t seem to make any sense, because either the entire Record is an l-value or r-value depending on context. The l-value and r-value attributes are heavily dependent on context, so it doesn’t make any sense to complicate the type system with them. They shouldn’t be part of unification.
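The “make age immutable and count mutable” alternative maps onto per-field mutability annotations such as TypeScript’s readonly; a sketch (the field shapes are taken from the example above):

```typescript
// Per-field mutability expressed on the type, rather than a container
// type wrapped around the field.
interface Rec {
  readonly age: number;  // immutable field: writes rejected by the type checker
  count: number;         // mutable field: writes allowed
}

const r: Rec = { age: 30, count: 0 };
r.count += 1;   // okay
// r.age = 31;  // compile-time error: cannot assign to a readonly property
```

Note that TypeScript enforces readonly only statically, which matches the view that mutability is a typing concern rather than a runtime container.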

@keean

I'm with @shelby3 on this. I don't like explicitly boxing mutables behind types; that is for low-level languages. It is of course okay when you do that behind the scenes.
I like languages where references are handled like values.

To the philosophical point of a variable:
The question is whether mutability exists at all; you can see mutable variables as mathematical variables which get rebound each time unit. So each access of a variable is different because of rebinding over time.

keean commented

@shelby3 @sighoya

Let's thinks about values like "3" or "(3, 4)" or "{a: 3, b: 4}"

It should be obvious that all these are values, and hence are immutable. If we have reassignable variables they can be rebound to different values.

Hopefully we agree so far.

keean commented

@shelby3 @sighoya

So values are straightforward and we know for example that "3 == 3" or "{a:3, b:4} == {a:3, b:4}" are identities for values.

So what about mutability? To have something mutable there has to be the concept of identity. That is something has a unique identity apart from its value. This identity is always the "address" of the data.

The confusion is that immutable values may or may not have an address, we just don't care, but mutable objects must have an address.

So when it comes to types we need a different type for mutable vs immutable, as this is a deep property (and something most languages get wrong). Something that is mutable has an address, and its address can be taken (to get a pointer).

Something that is a value "may" have an address, but that address is not valid as an identifier (two integers of the same value are not different because their addresses are different, or because one has an address and the other does not). So we should not ever be able to take the address of a value.

keean commented

@shelby3 @sighoya

So one way we could deal with this syntactically is to use tuple notation for value records like this:

Values:
   3
   (3, 4)
   (a:3, b:4)

So a record is just a tuple with named components.

We can then use '{}' to make things into objects (i.e. make them addressable):

Objects:
   {3}
   {3, 4}
   {a:3, b:4}

We can then "objectify" values by wrapping them in {}:

Objects from values:
   {3}
   {(3, 4)}
   {(a:3, b:4)}
   let x = (a:3, b:4) in {x}

The final thing should be obvious, values can only contain values, but objects can contain both objects and values. We can assign new values to the values in objects, but we cannot mutate them:

let x := (a:3, b:4)
x.a := 2 // error
let y := {a:3, b:4}
y.a := 2 // okay
let z := {a:(a:3, b:4), b:4}
z.a.a := 2 // error
z.a := (a:2, b:4) // okay

Values are obviously pass by value, but because they are immutable the compiler can optimise with references where it is better for performance.

We can allow an object to be passed where a value is expected, and it becomes immutable, so from then on (inside that function) it looks and behaves just like a value. This is the dual of allowing the compiler to pass values by reference for performance. When passing an object as a value the compiler is free to make a copy for performance reasons.

We might use "abs" notation to make a value from an object explicitly:

let x = {a:3, b:4}
let y = |x|
x.a = 2 // okay
y.a = 2 // error
print(x.a) // prints "2"
print(y.a) // prints "3"

That deals with mutability and immutability :-)
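The |x| “value from object” operation above can be loosely approximated in TypeScript with a shallow copy plus Object.freeze (an analogy only, not the proposed semantics; freezing is shallow and checked at runtime):

```typescript
const x = { a: 3, b: 4 };           // object: mutable, has identity
const y = Object.freeze({ ...x });  // frozen shallow copy: behaves like a value
x.a = 2;                            // okay: x remains mutable
// y.a = 3;                         // rejected statically; throws in strict mode

x.a;  // 2
y.a;  // 3: the copy was taken before the mutation
```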

keean commented

@shelby3 @sighoya

This leaves "read-only", "write-only" and "read-write" references, which are a distinct thing from mutable and immutable (value Vs object).

Obviously as values are referentially transparent, there can be no visible references to values, so that simplifies things, as we only have to deal with reference to objects (and objects might be actors - to be discussed elsewhere).

So object references are explicit, except we can overload '.' to do one dereference for objects, so the object type means it is a visible reference, hence although we call it pass-by-reference, we are really passing the reference by value.

If we assume that references default to mutable, then we just need a way to 'cast' a mutable reference to either a read-only or write-only reference.

@shelby3 suggested '::=' for read-only (or was that for converting objects into values, I am not clear on that, which is why I think these things need to be explicit).

How about:

let x := {a:3, b:4}
let y := readOnly(x)
let z := writeOnly(x)

Where readOnly and writeOnly are simply functions with the types forall A . {A} -> ReadOnly {A} and forall A . {A} -> WriteOnly {A} respectively.

In the type system {...} means the thing is both a reference and an object as the two things are synonymous. ReadOnly {...} and WriteOnly {...} are the read-only and write-only versions.
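The readOnly cast above can be sketched in TypeScript as an identity function whose return type drops the write capability, using the built-in Readonly mapped type (write-only has no direct TypeScript analogue, so only the read-only half is shown):

```typescript
// Roughly: forall A . {A} -> ReadOnly {A}. No runtime cost; the same
// reference is returned under a weaker (write-less) interface.
function readOnly<A extends object>(obj: A): Readonly<A> {
  return obj;
}

const x = { a: 3, b: 4 };
const y = readOnly(x);
x.a = 5;     // still allowed through the original read-write reference
// y.a = 5;  // compile-time error: 'a' is a read-only property
y.a;         // 5: y aliases the same container, so it observes the mutation
```

This also illustrates that a read-only reference to a mutable object is weaker than immutability: other aliases may still write.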

keean commented

@shelby3 @sighoya

Now something a bit different for consideration. Currently with the above syntax '.' gets overloaded to access both values and objects (references). For values it always creates an r-value, but for references it creates l-values or r-values depending on the role in the expression. We might consider this ambiguity undesirable and instead define:

x<-a // r-value
x->a // l-value

x->a := x<-a + y.a // x is an object, y is a value

Now we can get rid of subsumption from l-values to r-value, and we can be explicit about what we want.

@sighoya wrote:

I don't like to explicitly box mutables behind types

I think you know this (you were just taking a short cut in your comments) and I just want to make this clear for future readers. Please note that the implementation of “boxing” may employ a pointer, but boxing is not the same concept as a pointer or a container, as @keean correctly noted: “I think it's a mistake to conflate boxing (storing the type tag with the value) and pointers.” Boxing is a means for dynamic polymorphism because the tag for each possible type is stored along with the value so that runtime reflection on the type is possible. Boxing means only to attach the type tag to the runtime value. It doesn’t mean pointer or container, although those may be necessary for the implementation of a boxed value.

To the philosophical point of a variable:
The question is whether mutability exists at all; you can see mutable variables as mathematical variables which get rebound each time unit. So each access of a variable is different because of rebinding over time.

I remember also making a related point recently in the Subtyping issues thread #8 about math not modeling mutability. A mutable variable can be modeled in isolation as rebinding, but AFAICT an equation can’t cleanly represent a state machine with unbounded non-determinism. This is why I have heard that some mathematicians are less enthused about computational proofs of math theorems, such as the proof of the Four Color Theorem.


@keean I upvoted one of your posts and downvoted the other. Please do not construe the downvote1 as derogatory. My only intent is to make sure future readers readily know that I disagree with the post without needing wade through my walls of text to know it. Now I will explain my disagreement. If I discover I am incorrect, then I will remove my downvotes.

So when it comes to types we need a different type for mutable Vs immutable as this is a deep property (and something most languages get wrong).

Agreed we must have mutability and immutability as types because they impact soundness of typing. IOW, our compiler mustn’t allow writing to an immutable typed object.

So what about mutability? To have something mutable there has to be the concept of identity. That is something has a unique identity apart from its value. This identity is always the "address" of the data.

I understand we mutate the binding to the identifier or the memory container, not the value itself. That is an important detail, but I argue it is an implementation detail that doesn’t belong in the type system.

The confusion is that immutable values may or may not have an address, we just don't care, but mutable objects must have an address […] Something that is a value "may" have an address, but that address is not valid as an identifier (two integers of the same value are not different because their address is different, or one has an address an the other does not).

The concept of value versus variable (which is completely covered by immutability) is orthogonal to the concept of address location in memory, even if we accomplish mutability only by rebinding (i.e. preserving referential transparency) instead of mutating the memory container.

We don’t only compare the addresses when comparing if two values are equal. We do compare the two separate pointers to values if we are comparing whether they point to the same value— i.e. the address is a value also. So comparing addresses is a shortcut for quickly eliminating whether a comparison of a value is against itself.

So we should not ever be able to take the address of a value.

Agreed that values are always r-values. Values can also be stored in memory, so the address of their memory container is an l-value. So you’re correct that we can’t take (&) the address of an r-value. But since values can be stored, then unless we complicate our model with explicit containers, taking the address of the expression (e.g. identifier) which refers to a value implicitly takes the address of the memory container where the value is stored, as a language design implementation detail. So syntactically we are taking the address of the l-value. Instead of explicit containers in the type system as you proposed, I prefer implicit containers and the concept of l-values and r-values, because of the following.

The pointer address (aka container) where a value is stored is a language design implementation detail that has nothing to do with the typing. We don’t want to bifurcate our type system into a “What Color Is Your Function?”-like non-interoperability based on, for example, whether a value is stored in a register, on the stack, inside another record which is a container, etc. There are far too many combinations of scenarios to model, so modeling it in the type system would create complexity. And for what gain? The type system is for ensuring soundness (aka type safety) so that we don’t get segfaults. The soundness issues w.r.t. storage location (e.g. use-after-free w.r.t. for example stack allocation or GC) are handled orthogonally, as I explained in my post before my prior one.

We can assign new values to the values in objects, but we cannot mutate them:

let x := (a:3, b:4)
x.a := 2 // error
let y := {a:3, b:4}
y.a := 2 // okay
let z := {a:(a:3, b:4), b:4}
z.a.a := 2 // error
z.a := (a:2, b:4) // okay

You’re employing literals to indicate mutability of the type. I would prefer to just write the mutability on the type and not proliferate more subtle variants of expression syntax2 than necessary (because that confuses new users and creates inward-bound costs that increase attrition of new adopters of the language). Records should have only one syntax, and then their type can be modulated via the type annotation.

Is x a r-value? If yes, then what purpose does it serve that treating x as an l-value wouldn’t? Is your reason because you want to prevent its address from being taken? But the soundness of that is an implementation detail that is impacted for example by use-after-free.

This leaves "read-only", "write-only" and "read-write" references, which are a distinct thing from mutable and immutable (value Vs object).

I also have all those distinct attributes on types in my proposed draft of the grammar (I have not yet uploaded it but I will soon). But because of my aforementioned preference for implicit containers with l-values and r-values, I’m conceptualizing it differently than your taxonomy. Your taxonomy is that mutability is on the container and the other attributes are on the value. Also I find one fault in your taxonomy: writing to a value makes no sense, because values by definition can never be modified, so you mean that a value can’t be read, but you should name that “no-read” since writing has nothing to do with it, because values can never be written to. This exemplifies that explicit containers will create the bifurcation of needing no-read on the value and mutability on the explicit container. That is too many degrees-of-freedom, because I see no need for it.

In my mental model, immutability means that none of the references to the same l-value (which in my model is conflated with its implicit container) can mutate the value inside the container. R-values are of course always immutable, because by definition they can’t be referenced. The read-only type applies only to the specific reference with that read-only type: that reference may not mutate the value inside the container, but other references to that same container might have permission to change the value inside the container. Write-only means the value in the implicit container can be written but can’t be read.

@shelby3 suggested ::= for read-only (or was that for converting objects into values, I am not clear on that, which is why I think these things need to be explicit).

No, it instead means non-writable, which can be either immutable or read-only as defined above. I had explained it up-thread:

EDIT: Zer0 won’t need rebinding if it’s using call-by-value and not call-by-sharing. JavaScript needs to distinguish between preventing rebinding with const versus mutating the fields of the object, because JavaScript employs call-by-sharing which employs pass-by-reference for some objects. Thus, ::= in Zer0 would mean not-writable (i.e. read-only or immutable) for the l-value— i.e. that the implicit container can’t be replaced with a new value. The read-only or immutable attribute would also have to be written explicitly on the type annotation if the type is not instead inferred. Without call-by-sharing, the only reason to have this ::= is to make the not-writable attribute very clear, which is especially helpful even when the type is inferred and not explicitly annotated. It’s also a way of declaring not-writable when the type is inferred.

1 I wish Github offered other icons for expressing opinions on posts. I would like to be able to indicate constructive “disagree” without it implying frowning (i.e. thumbs down) on the post. Also I’d like to express other reactions such as the following, although so many choices might be noisy 🤔, 🤨, 😲, 🤯, 😵, 😠, 🤬 (:rage:), 🤥, 🤫, 🧐, 💩, 🦸‍♂️, 🧞, 💤, 💭, 🕳, 💎, 🍀, 🏆, 🎯, 🎲, 🔔, 💡, 📌, 🗿, ⁉️, 🆒 (:sunglasses:), 🆘, ⭐, :hurtrealbad:, 💪, 💣, ℹ️, ✅.

2 Expression syntax as distinguished from type annotation syntax.

keean commented

@shelby3

We don’t only compare the addresses when comparing if two values are equal. We do compare the two separate pointers to two values if we are comparing whether they point to the same value— i.e. the address is a value also. So comparing addresses is a shortcut for quickly eliminating whether a comparison of a value is against itself.

This is not right. For values we do not care about the address at all, two values are equal if they are the same. So 3 == 3 no matter what address they are stored at. It is important that values have referential transparency.

For objects it is different, objects are only equal if their addresses are equal, even if they have the same 'value'. This is because they are mutable.
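This value/object equality split is directly observable in JavaScript; a TypeScript sketch:

```typescript
// Primitive values compare by value: identity is irrelevant.
3 === 3;               // true
"ab" + "c" === "abc";  // true

// Objects compare by identity (address), never by contents.
const o1 = { a: 3, b: 4 };
const o2 = { a: 3, b: 4 };
o1 === o2;  // false: equal contents, distinct identities
o1 === o1;  // true: same identity
```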

This is very close to JavaScript typing, so it's not an alien concept. In JavaScript things like Ints and Strings are values and objects are not. So let's look at a comparison with javascript:

3 // value in JS
(3, 4) // JS does not have tuples.
(a:3, b:4) // JS does not have a record value notation.

{3} // JS does not allow an object with a single anonymous property
{3, 4} // JS does not allow position indexed objects
{a:3, b:4} // valid JS

So you can see the important categories like integer values and objects with named properties behave like in JavaScript. All we are doing is filling in some missing types:

           Value Object
Singleton   JS 
    Tuple
   Record         JS

So all I am doing is providing types/syntax for the bits missing in JS. This also provides a simple consistent model that separates the datatype (singleton, indexed, named) from whether the data is a value or an object.

keean commented

@shelby3

I would prefer to just write the mutability on the type and not proliferate more subtle variants of expression syntax than necessary (because that confuses new users and creates inward-bound costs that increase attrition of new adopters of the language). Records should have only one syntax, and then their type can be modulated via the type annotation.

We need to be clear that value semantics Vs object semantics is something distinct from a read-only, write-only or read-write reference.

We might be able to do this using just type annotation, but note that popular programming languages like JavaScript make a value syntax difference between objects and values.

@shelby3 wrote:

Agreed we must have mutability and immutability as types because they impact soundness of typing. IOW, our compiler mustn’t allow writing to an immutable typed object.

Why not keyword/attribute tags like in Rust? What is different about storing mutable or immutable integers behind some variable in memory? Types should state something about how abstractions are mapped to internal representations; two different types indicate two different internal representations even if the abstraction is the same (maybe for performance reasons).

I don't see why type inference is compromised because of (im)mutable attributes.

A mutable variable can be model in isolation as rebinding, but AFAICT an equation can’t cleanly represent a state machine with unbounded non-determinism.

You mean a deterministic equation, what about equations with random* constituents?
The main problem with mutability is not the mutability but the missing history of rebindings, as you said there is no global observer in the universe to record them.

*Of course, it is questionable how well computers simulate randomness, and whether randomness exists at all.
Does non-determinism exist, or does it deterministically follow some certain irrational number?
We can't decide that even if we know all prior observations.

keean commented

@shelby3 @sighoya

Are we agreed that we need both value semantics and object semantics in the language?

Value semantics provide referential transparency and greatly simplify things where they are valid.

Object semantics provide for mutation and identity, which allows for mutation whilst preserving identity.

@keean wrote:

We don’t only compare the addresses when comparing if two values are equal. We do compare the two separate pointers to values if we are comparing whether they point to the same value— i.e. the address is a value also. So comparing addresses is a shortcut for quickly eliminating whether a comparison of a value is against itself.

This is not right. For values we do not care about the address at all, two values are equal if they are the same. So 3 == 3 no matter what address they are stored at. It is important that values have referential transparency.

You’re literally not comprehending what I wrote. I admit the way I wrote it was convoluted and abstruse (because my mind was operating in logic mode and not in communication mode). Let me clarify.

I didn’t claim that we need the addresses to compare values. I stated that an implementation optimization could optionally compare the addresses of l-values to quickly determine whether the two addresses point to the same value (obviously if the value is stored in the same place it must have the same value). This optional optimization for quickly detecting the exceptional case is from the perspective of expressions dynamically referring to l-values. The optimization isn’t possible when comparing r-values. And the optimization may be slower where the exceptional case is rare.

For objects it is different, objects are only equal if their addresses are equal, even if they have the same 'value'. This is because they are mutable.

This is a higher-level semantic that should be above the layer of the language design. The programmer should decide whether he wants to compare addresses of containers when determining if two containers are considered to be equal. We will have pointers in the language so the programmer can control this semantic independently of our design of the language.

{3} // JS does not allow an object with a single anonymous property
{3, 4} // JS does not allow position indexed objects

This is not correct. JS has Array(1) and Array(2) and these are Objects.

Your objection would be that Array(2) is not a type, but JS does not have static typing. Typescript has tuples.

All we are doing is filling in some missing types:

           Value Object
Singleton   JS    JS
    Tuple         JS
   Record         JS

So all I am doing is providing types/syntax for the bits missing in JS. This also provides a simple consistent model that separates the datatype (singleton, indexed, named) from whether the data is a value or an object.

Let me draw your attention again to what I wrote up-thread yesterday (do note I corrected it since you read it):

JavaScript, Java, and Python employ call-by-sharing which is distinguished from call-by-reference because only certain objects are passed-by-reference.

Since the grammar I am currently proposing for Zer0 will have explicit pointer types and dereferencing (*) and explicit pointer construction (&), then Zer0 will be call-by-value, because pass-by-reference1 can be achieved with a pointer when needed. Except that Zer0 may automatically simulate call-by-value more efficiently2 by actually employing pass-by-reference behind the scenes when passing a large object which is either immutable or being passed to a type which is read-only (i.e. copying would be expensive and stress the L1 cache). In the immutable case, the code will not know pass-by-reference has been employed, because for an immutable object there’s no difference between pass-by-value and pass-by-reference (except for issues about memory safety, stack frame lifetimes, and garbage collection which I explain below). In the read-only case, the difference is irrelevant, because it makes no sense to pass to a read-only type by copying the value, since the raison d’être of a read-only type is that other references can mutate the value.

Thus I see no valid reason nor need for your complex separation-of-concerns because my proposal restores the consistency that JS doesn’t have:

            Object/Value
Singleton      Zer0
    Tuple      Zer0
   Record      Zer0

JavaScript and Java have the problem that they don’t have pointers. Thus those PLs without pointers have to choose a design which is either call-by-reference or call-by-sharing, because there’s no way for the programmer to express a pass-by-reference exception to call-by-value without pointers.
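Call-by-sharing, as JavaScript implements it, can be demonstrated concretely (a TypeScript sketch with hypothetical names):

```typescript
// Mutating a parameter's field is visible to the caller (the reference is
// shared), but rebinding the parameter is purely local to the callee.
function mutateAndRebind(p: { n: number }): void {
  p.n = 99;       // visible outside: writes through the shared reference
  p = { n: -1 };  // invisible outside: only rebinds the local name
}

const arg = { n: 1 };
mutateAndRebind(arg);
arg.n;  // 99: the field mutation escaped; the rebinding did not
```

Under call-by-value with explicit pointers, the caller would select between these two outcomes explicitly (pass the value, or pass &value).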

We might be able to do this using just type annotation, but note that popular programming languages like JavaScript make a value syntax difference between objects and values.

I am proposing that for Zer0, we remove that “value syntax difference” of JavaScript, because we have pointers. Because Zer0 can employ call-by-value instead of call-by-sharing due to the presence of pointers.

Are we agreed that we need both value semantics and object semantics in the language?

Value semantics provide referential transparency and greatly simplify things where they are valid.

Object semantics provide for mutation and identity, which allows for mutation whilst preserving identity.

Could you clarify what you mean in the context of my latest replies, so I can be clear on how your ideas relate to mine once you have factored in my clarifications?

keean commented

@shelby3

How do I represent an immutable record in your system? Note this should be a value and have value semantics.

keean commented

@shelby3 it seems you don't appreciate the difference between immutable and read only.

An immutable value is always immutable no matter how you access it. The immutability has nothing to do with pointers or how you access it. In fact, because we want referential transparency you cannot have a pointer to a value (that would destroy referential transparency). This is why 'C' is not referentially transparent: because &3 != 3, which is a bad thing. Values should literally have no knowable address, as in functional languages.

Objects are mutable and non-referentially transparent. You can have a pointer to an object and because you can have a pointer there are different kinds of access you can give, read-only, write-only and read-write.

It seems to me you only have objects in the system you are thinking of, and that throws away all the benefits of referential transparency in functional languages. My approach is that we want referential transparency and immutability wherever possible. Again note that a read-only pointer to a mutable object is a different thing to an immutable value.
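The read-only versus immutable distinction drawn above can be demonstrated in C++, where `const` on a pointer grants read-only *access* while the underlying object may still change through another, writable reference (variable names are hypothetical):

```cpp
#include <cassert>

// A read-only *view* of a mutable object: the value can still change
// underneath the viewer, via another (writable) reference.
int mutableObj = 3;
const int* readOnlyView = &mutableObj;

// A genuinely immutable value: no reference anywhere may change it.
const int immutableVal = 3;
```

A reader of `readOnlyView` cannot write through it, yet can observe mutation performed through `mutableObj`; nothing can ever observe `immutableVal` changing.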

@keean wrote:

Again note that a read-only pointer to a mutable object is a different thing to an immutable value.

That’s a pleonasm. How can a value ever be mutable? The definition of values you provided up-thread is an immutable thing that can’t be addressed (i.e. has no identity).

You can have a pointer to an object and because you can have a pointer there are different kinds of access you can give, read-only, write-only and read-write.

Okay so good, you implicitly admit I was correct where I pointed out that you were incorrect to associate those properties with values. You should explicitly admit these things so readers are not left in the dark. That is the only way to have a conscientious and constructive discussion.

This leaves "read-only", "write-only" and "read-write" references, which are a distinct thing from mutable and immutable (value Vs object).

I also have all those distinct attributes on types in my proposed draft of the grammar (I have not yet uploaded it but will soon). But because of my aforementioned preference for implicit containers with l-values and r-values, I’m conceptualizing it differently than your taxonomy. Your taxonomy puts mutability on the container and the other attributes on the value. I also find one fault in your taxonomy: writing to a value makes no sense because values by definition can never be modified, so what you must mean is that a value can’t be read, and you should name that “no-read” since writing has nothing to do with it. This exemplifies that explicit containers will bifurcate the attributes, requiring no-read on the value and mutability on the explicit container. That is too many degrees-of-freedom, and I see no need for it.

In my mental model, immutability means no references can mutate the implicit container (which is conflated with the value for an l-value expression). A read-only type means only that the reference (that the type applies to) is prevented from writing to the l-value. Write-only means the implicit container can be written but the value can’t be read.

@keean wrote:

Are we agreed that we need both value semantics and object semantics in the language?

Agreed.

It seems to me you only have objects in the system you are thinking of, and that throws away all the benefits of referential transparency in functional languages.

Why do you think that? I wrote that Zer0 has an immutability annotation.

we want referential transparency and immutability wherever possible.

Agreed.

An immutable value is always immutable no matter how you access it. The immutability has nothing to do with pointers or how you access it. Infact because we want referential transparency you cannot have a pointer to a value (that would destroy referential transparency). This is why 'C' is not referentially transparent because &3 != 3 which is a bad thing. Values should literally have no knowable address like in functional languages.

I think you may be conflating orthogonal concerns but I’m not sure. AFAICT, pointers have nothing to do with immutability and referential transparency. You should not be comparing a pointer type to a non-pointer type anyway. That should be an illegal comparison, prevented by the type mismatch.

I am going to guess that your mistake is conflating pointers with the fact that pointers to mutable things break referential transparency. Or that pointers are often employed in imperative programming. But AFAICT it’s not pointers that actually create any referential opacity. Rather it’s mutability and side-effects that create referential opacity.

Also just because we have pointers doesn’t mean we have to use them. AFAICT, there is no way for taking the address of an immutable object to somehow make other references to that object referentially opaque. Actually mutating pointers may make code referentially opaque, but AFAICT that is an orthogonal concern that is at the programmer’s discretion. IOW our compiler can analyse code and decide whether it is referentially transparent.

Please try to refute and enlighten me.

Readers please note that @keean is ostensibly getting the concept of value semantics from Elements of Programming by Alexander Stepanov:

First, you get value semantics by default. When declaring function arguments or return values, if you specify only the type name (like int) you get value semantics (you pass and return by value). If you want to use reference semantics, you must make an extra effort to add a reference or pointer type symbol.

Second, we use value semantics in function declarations, because it closely follows the notation and reasoning from mathematics. In mathematics you operate on values. For instance, you define a function as follows:

f: int × int → int
    f(x, y) = x·x + y·y

This is very similar to:

int f( int x, int y ){
  return x * x + y * y;
}

Notice AFAICT the absence of pointers has nothing to do with achieving value semantics, because value semantics can be achieved with pointers:

int f( const int* x, const int* y ){
  return (*x) * (*x) + (*y) * (*y);
}

Third, we do not run into any memory management issues. No dangling references to nonexistent objects, no expensive and unnecessary free store allocation, no memory leaks, no smart or dumb pointers. The support for value semantics in C++ — passing variables by value — eliminates all those problems.

I already explained up-thread that the near zero-cost resource allocation strategy I proposed for Zer0 with the Actor paradigm will eliminate this issue even when not using value semantics.

Fourth, we avoid any reference aliasing problems. Andrew Koenig has neatly illustrated the problem of reference aliasing in this article. In multi-threaded environment passing by value and ensuring that each thread has its own copy of the value helps avoid any unnecessary data races. Then you do not need to synchronize on such values, and the program runs faster, and is safer because it avoids any deadlocks.

Again the Actor paradigm I have been referring to resolves all the thread synchronization issues. Note that proposal can’t (as Rust can) prove the absence of pointer aliasing for the purposes of compiler optimization. For that optimization we either use an unchecked annotation analogous to C’s restrict and/or of course we use referential transparency which will be an option in Zer0.

Fifth, for referential transparency. This means that you get no surprises where your data is modified behind the scenes without you being able to see that.

Again this all about immutability, which will be an optional type in Zer0.

@keean I really don’t see the problem. Please enlighten me.

@sighoya wrote:

Agreed we must have mutability and immutability as types because they impact soundness of typing. IOW, our compiler mustn’t allow writing to an immutable typed object.

Why no keyword/attribute tags like in rust.

What do you mean? The grammar I am proposing for Zer0 has an optional immutability attribute.

What is different in storing mutable or immutable integers behind some variable in memory.

What does that sentence mean?

Types should state something about how abstractions are mapped to internal representations; two different types illustrate two different internal representations even if the abstraction is the same (maybe for performance reasons).

What is your point? I must not be grokking how the above relates to anything in this discussion.

I don't see why type inference is compromised because of (im)mutable attributes.

It’s not. Who said it is?

@shelby3 , you wrote:

Agreed we must have mutability and immutability as types because they impact soundness of typing

What do you mean with "mutability, "immutability" as "types". Do they represent a container like Mut[Int], Immutable[Int] or do you provide this over attributes. mutable int i; immutable int i;

Why is soundness compromised if mutability and immutability are no types?

keean commented

@shelby3 it needs to be a type because the information needs to be propagated.

Why would you want the information to be invisible? As an attribute it is information that you want to propagate with the type but you cannot see it. This is bad because it is invisible. All the information you want to propagate should be part of the visible type.

An Int like "3" cannot be mutable as it's a value, so I presume you mean something like an object containing the value "3"?

For example in Javascript:

let x = 3
let y = x
x = 2
console.log(y) // prints 3

let x = {v:3}
let y = x
x.v = 2
console.log(y.v) // prints 2

Personally I think the way JS handles objects and values is more high level than the way C handles them.

I still think you should not be able to take a pointer to a value. Can you give an example of why you would do this?

Note I edited this to hopefully make it more lucid:

We might be able to do this using just type annotation, but note that popular programming languages like JavaScript make a value syntax difference between objects and values.

I am proposing that for Zer0, we remove that “value syntax difference” of JavaScript, because we have pointers. Because Zer0 can employ call-by-value instead of call-by-sharing due to the presence of pointers.

keean commented

@shelby3 I think pointers are more suited to a low level language.

For example this does not make sense:

let y = &3 // error

let x = 3
let y = &x // okay

Somehow, mysteriously our value has turned into an object. This kind of invisible implicit behaviour is what really confuses people and leads to serious bugs that threaten the security of software.

@keean wrote:

An Int like "3" cannot be mutable as it's a value, so I presume you mean something like an object containing the value "3"?

For example in Javascript:

let x = 3
let y = x
x = 2
console.log(y)   // prints 3

let x = {v:3}
let y = x
x.v = 2
console.log(y.v) // prints 2

That’s a constructive direction to take the discussion into an example. So let’s compare that to what I’m proposing for Zer0:

x := 3         : Int       // these type annotations are optional and can be inferred
y :: x         : !Int      // the `!` means immutable. The `<-` prefix for read-only isn’t the type because `x` is copied to `y` not passed-by-reference
x = 2
console.log(y)   // prints 3

data V<A>(v)   : A => V(A) // Equivalent to `data V<A> = V(v) : A => V(A)`. This type annotation isn’t optional. I didn’t choose syntax `V<A>(v : A)` because I want it to be consistent with the fact that annotations are always off to the right side (also because otherwise it would clutter the default arguments case).
x := V(3)      : V(Int)
y :: x         : !V(Int)
x.v = 2
console.log(y.v) // prints 3

The difference is because I’m proposing Zer0 is always call-by-value (in the above case imagine an implicit assignment function that takes the RHS as a copy and assigns it to the LHS).

Let’s compare a different example which explains why there’s no concept of rebinding proposed for Zer0:

const x = 3
const y = x
x = 2           // illegal because of `const`
console.log(y)  // not printed because program is illegal

const x = {v:3}
const y = x
x.v = 2
console.log(y.v) // prints 2

And in Zer0:

x :: 3         : !Int
y :: x
x = 2            // illegal because of immutability
console.log(y)   // not printed because program is illegal

data V<A>(v)   : A => V(A)
x :: V(3)      : !V(Int)
y :: x
x.v = 2          // illegal because of immutability
console.log(y.v) // not printed because program is illegal

Personally I think the way JS handles objects and values is more high level than the way C handles them.

C and C++ are call-by-value. The only way to get pass-by-reference is for the programmer to explicitly employ a pointer. Thus they only need the concept of l-values and r-values. Ditto my proposal for Zer0. JavaScript makes everything confusing because of the inconsistency of call-by-sharing (or call-by-reference in the case of Java), which is required because there are no pointers for the programmer to use to simulate pass-by-reference when the programmer wants to pass the value by-reference.

IOW Java, JavaScript, and I think also Python conflate objects and values because they provide the programmer no means to choose (i.e. control) whether to pass the value by-reference or by-value.

I still think you should not be able to take a pointer to a value? Can you give an example of why you would do this?

I mentioned up-thread that we need to take the address of record fields in some cases.
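For instance, in C++ taking the address of a field of an immutable record is well-defined, and the resulting read-only pointer cannot be used to mutate anything (the `Rec` type is a hypothetical illustration):

```cpp
#include <cassert>

// A hypothetical immutable record.
struct Rec { int a; int b; };
const Rec r{1, 2};

// The address of a field of an immutable record: a read-only pointer.
// Writing through it (e.g. `*pb = 9`) would be a compile-time error.
const int* pb = &r.b;
```

This is the kind of case where a pointer to (a field of) an immutable thing is useful, without granting any ability to mutate it.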

Why would you want the information to be invisible? As an attribute it is information that you want to propogate with the type but you cannot see it. This is bad because it is invisible. All the information you want to propogate should be part of the visible type.

I explained this more than once up-thread. The soundness of the type system doesn’t need an explicit container for values. Making them explicit will bifurcate the scenarios creating a complex maze of interaction of typing. I can’t think of any reason to want to do that. Can you show any example that will not work in my proposal for Zer0?

I think pointers are more suited to a low level language.

One of our goals has been to support both low-level and high-level programming in the same programming language. Also we’re contemplating transpiling to Go, which has pointers.

We already agreed that we should not have pointer arithmetic because it’s unsound and violates the mathematical model for memory and arrays. Go also doesn’t allow pointer arithmetic.

For example this does not make sense:

let y = &3 // error

let x = 3
let y = &x // okay

Somehow, mysteriously our value has turned into an object.

I don’t see any problem with the above code example. The 3 is an r-value thus can’t take the address of it. The x is an l-value.

This kind of invisible implicit behaviour is what really confuses people and leads to serious bugs that threaten the security of software.

Please show me an example of how it creates a bug.

@sighoya wrote:

What do you mean with "mutability, "immutability" as "types". Do they represent a container like Mut[Int], Immutable[Int] or do you provide this over attributes. mutable int i; immutable int i;

Answered in my post above which was a reply to @keean.

Why is soundness compromised if mutability and immutability are no[t] types?

Because if the compiler doesn’t know that a type is supposed to be immutable, then it would allow writes where writes shouldn’t be allowed. Your question seems to be basically asking, “why do we have typing.”

@shelby3 wrote:

Because if the compiler doesn’t know that a type is suppose to be immutable, then it would allow writes where writes shouldn’t be allowed.

It seems you don't like modifiers on types. It would be OK if you said that !Int is a restriction on a type via the modifier "!", but why create a separate type for it?

What fun for the user, having to differentiate between three assignment/bind operators: :=, ::=, =

Consider that assignments are used very often; the longer the assignment operator, the more onerous its usage.

Your question seems to be basically asking, “why do we have typing.”

So? How does Java compile with modifiers?

keean commented

@sighoya I don't think Java is a good example of a type system. I don't think it has proper immutables, read-only is a reference property and can be cast away.

keean commented

@shelby3

I don’t see any problem with the above code example. The 3 is an r-value thus can’t take the address of it. The x is an l-value.

How do we know it's an l-value or an r-value if it's not visible in the type system?

I find your proposed system very messy. Why do we need '!Int' for an immutable Int, when Ints cannot be mutable in the first place.

It should be clear that literals have no address. It should be clear that values are immutable. We should use value semantics wherever possible.

Anything mutable must be an object because it has an address. That's what is the difference between an object and a value.

@keean wrote:

Your question seems to be basically asking, “why do we have typing.”

So? How does Java compile with modifiers?

I don't think Java is a good example of a type system. I don't think it has proper immutables, read-only is a reference property and can be cast away.

Readers may want to read the reasons why we hate Java.

I don’t see any problem with the above code example. The 3 is an r-value thus can’t take the address of it. The x is an l-value.

How do we know it's an l-value or an r-value if it's not visible in the type system?

The programmer and the compiler know based on context that is orthogonal to the type system.

I already explained to you several times my reasoning why the type system doesn’t need to know. And you have yet to refute and explain why the type system needs to know about l-values and r-values (or instead needs to know about containers which I prefer to be implicit).

I find your proposed system very messy. Why do we need !Int for an immutable Int, when Ints cannot be mutable in the first place.

You’re trolling because I already explained to you that the container is implicit. So the immutability is on the container that holds the value, not on the value.

Please actually make a new salient argument and not just pretend I didn’t already address the point you’re by now reiterating over and over.

It should be clear that literals have no address.

It’s clear. They’re r-values.

It should be clear that values are immutable.

It’s clear. They’re r-values.

We should use value semantics wherever possible.

We can in my proposal. I explained that in great detail. And you have not shown me an example where we can’t get value semantics with my Zer0 proposal, nor have you shown me how my proposal can cause a bug.

Anything mutable must be an object because it has an address. That's what is the difference between an object and a value.

Immutable things can also have an address as I already said for example taking the address of a field in an immutable value that is stored in an implicit container.

I already explained to you up-thread that pointers don’t break referential transparency, thus do not violate value semantics.

Haskell doesn’t have pointers. We want pointers because we want both low-level and high-level programming. As you know Haskell sometimes produces very slow code. It is elegant but not a very practical programming language.

Pointers give the programmer more control. They don’t eliminate the capability to do value semantics.

keean commented

@shelby3

Immutable things can also have an address as I already said for example taking the address of a field in an immutable value that is stored in an implicit container.

I don't like this. I don't want to use a language that does things implicitly, the lack of control always comes back to bite you, like implicit type conversion in 'C'. My experience tells me this is a bad idea, maybe I can articulate this better.

I already explained to you up-thread that pointers don’t break referential transparency, thus do not violate value semantics.

Pointers are references; they completely break referential transparency. RT is easy to understand: it literally means references (aka pointers) are transparent, i.e. you cannot see them; the compiler automatically inserts and removes them as necessary because they don't change anything. For example, in a referentially transparent language, if you could see references:

let x = 3
&x == x // true

The equality is true because the reference to 'x' is transparent, which means we just see the value, not the reference to the value. Since &&&x == &&x == &x == x, taking a pointer to a value is a no-op in a referentially transparent language, like the 'id' function, hence there is no point in having it.

@keean wrote:

Immutable things can also have an address as I already said for example taking the address of a field in an immutable value that is stored in an implicit container.

I don't like this. I don't want to use a language that does things implicitly, the lack of control always comes back to bite you, like implicit type conversion in 'C'. My experience tells me this is a bad idea, maybe I can articulate this better.

We know that implicit type conversions are bad because types are not the same. You have yet to show me an example in proposed Zer0 code that an implicit conversion between a value and its immutable container are not the same. You need to show me how this causes a problem. I keep asking for an example demonstrating your concern.

Your ideological or philosophical argument isn’t conclusive. We need to actually have proof that your theory of it being bad, is actually true in a real world example.

Because the fact is that without implicit paradigms, we wouldn’t have compilers. We would write in assembly code. There’s never a perfect correspondence between explicit high-level expression and the low-level assembly language produced by a compiler, because that’s the definitional distinction between high-level versus low-level. So you actually have to show how the implicit assumption is problematic.

Pointers are references; they completely break referential transparency.

How so? I already refuted that up-thread.

For example in a referentially transparent language, if you could see references:

let x = 3
&x == x // true

I already refuted that example up-thread. Why do you reiterate over and over what I already refuted without making any new argument?

The above comparison would be illegal in Zer0 because a pointer type can’t be compared to a non-pointer type.

You’re apparently thinking that Haskell is the only way a programming language can do referential transparency. C++ can also do referential transparency. And C++ has pointers.

Now please actually make an argument?

EDIT: you might actually be correct. But I need to see an example. I’m also thinking you may be conflating. Devil is in the details.

keean commented

@shelby3 pointers are references, and they are not transparent, it's that simple.

Functions always returning the same result for the same arguments is a consequence of referential transparency, not the definition of it.

I have not found any references to C++ being referentially transparent. Can you post any links you have for this?

keean commented

Refresh.

@keean wrote:

I have not found any references to C++ being referentially transparent. Can you post any links you have for this?

Up-thread I linked to that blog that explains C++ does have value semantics.

pointers are references, and they are not transparent, it's that simple.

Incorrect. Pointers to mutable containers are not transparent because they aren’t replaceable with the value they point to. It's the mutability that breaks referential transparency, not the pointer. Whereas pointers to immutable containers are referentially transparent. Yet I’m repeating myself, having already written this up-thread. Why do I always have to repeat myself so many times when I have discussions with you? You did not quote my up-thread statement and refute it earlier.
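The claim that mutability, not pointers, is what breaks referential transparency can be sketched in C++ (the function names are hypothetical): reading through a pointer to immutable data always yields the same result, while mutating through a pointer to mutable data makes successive observations differ.

```cpp
#include <cassert>

const int immutableTarget = 42;
int mutableTarget = 0;

// Reading through a pointer to an immutable container: the expression
// always yields the same value, so it is referentially transparent.
int readThrough(const int* p) { return *p; }

// Mutating through a pointer to a mutable container: successive calls
// observe different values, so such expressions are referentially opaque.
int bumpThrough(int* p) { return (*p)++; }
```

Both functions use a pointer; only the one that exploits mutability loses transparency.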

keean commented

@shelby3 as I tried to explain value semantics is not referential transparency. Referential transparency implies value semantics, but value semantics does not imply referential transparency.

Pointers are opaque references, they are not transparent references.

So show an example where we need referential transparency and value semantics doesn’t suffice.

keean commented

@shelby3 referential transparency allows the compiler to insert references for efficiency, so for example where you have some large value you don't want to copy, the compiler puts a transparent reference in place and avoids copying.

By having transparent references, code is more generic, it's easier to prove correct, and faster due to giving the compiler more optimisation opportunities.

@keean wrote:

Referential transparency allows the compiler to insert references for efficiency, so for example where you have some large value you don't want to copy, the compiler puts a transparent reference in place and avoids copying.

I already proposed up-thread that Zer0 can do that when the implicit container is immutable. Did you not read that? We don’t need to forsake pointers to get that optimization, because a pointer to an immutable container can’t impact the absence of copying the container for the optimization. A pointer to an immutable container has no impact on the container. The only issues are use-after-free, which I addressed in that up-thread post.

This is why I say devil is in the details. I think you’re conflating.

By having transparent references, code is more generic, it's easier to prove correct, and faster due to giving the compiler more optimisation opportunities.

I keep asking you for an example that my proposal for Zer0 can’t do. And you have yet to show a single example.

keean commented

@shelby3

I keep asking you for an example that my proposal for Zer0 can’t do. And you have yet to show a single example.

You already refuted that line of argument yourself when you said you can code anything in 'C'. It's not about whether you can do something or not; it's about how simple the cognitive model is and how elegant the abstraction is.

So show me an example of how your way is more elegant than my way. Show me something meaningful.

keean commented

@shelby3 it's elegant because mutability does not depend on the context in the expression. It is a context free grammar.

@keean broad nebulous theoretical concepts err when the devil is in the details. Please show an example. I want to discuss a concrete example.

keean commented

@shelby3 I have a good reason for preferring to mark mutable objects rather than immutable values, and that is because we want things to be immutable unless specifically made mutable. Immutable should be the default. So that's why "Int" should be the immutable Int value, and "!Int", or as I prefer "{Int}", should be the mutable one.

Can we agree that immutable by default makes sense?

@keean wrote:

I have a good reason for preferring to mark mutable objects rather than immutable values, and that is because we want things to be immutable unless specifically made mutable. Immutable should be the default. So that's why "Int" should be the immutable Int value, and !Int, or as I prefer {Int}, should be the mutable one.

Can we agree that immutable by default makes sense?

Actually I think I like that also. I certainly didn’t like the ! annotation for immutable. And I like how the {Int} looks like a container for mutable (which I suppose is what you were intending). In the current proposed grammar for Zer0, I allow parentheses in type annotations for grouping, so the scope of unary annotations can be disambiguated. Yet so far I had not found a use for the curly braces (neither in type annotations nor in the code). The {Int} is less verbose than !(Int) when grouping is needed.

Kudos how you turned the discussion towards a quick consensus.

If the default programming paradigm should be encouraging immutability, should I transpose the meanings of := and ::= also or should we retain the immutability meaning for ::=?

How about instead :: for immutable and retain := for mutable?

EDIT: I already changed the syntax as indicated above to {Int} for mutable on type annotations and :: on binding to identifiers for immutable and read-only. I changed the examples on this thread and in the Syntax summary issues #11. The function definition syntax has less of a “symbol soup” appearance with the :: instead of ::= .

EDIT#2: For the Jigsaw PL I’m creating, I reverted back to (my new variant of !Int as) #Int for immutable and am instead using {Int} for exclusive read/write and {-Int} for exclusive read-only, where ‘exclusive’ ownership means the only reference to that object. It turns out that with my non-sharing ALP design, immutability may not be used as often as read/write and read-only. And I needed a syntax to express the exclusive ownership.

Also I will use only := for initialization of new identifiers always, for consistency, even when prefixed with an explicit type (to differentiate from = for assignment when not so prefixed). And thus not try to encode any of the type in the assignment operator for initialization of new identifiers, because access permission tags (aka reference capabilities) are too multifarious (i.e. read/write, read-only, immutable, and optional exclusivity of all of those three), and even if they were not multifarious they would still not represent all of the type. [Incorrect logic because one may only want to add the const or read-only restrictions when assigning a reference or non-reference type respectively. Thus it’s a shorthand when the type can be inferred with the added restriction. And for assigning an immutable it provides an explicit indication of non-writable. Also prefixing instead of suffixing the type would muddy the syntax grammar because the names of constructor functions are uppercase, resembling types — this would require prefixing a def or fn, backtracking, or a right-to-left grammar.]

@keean wrote:

Now something a bit different for consideration. Currently with the above syntax '.' gets overloaded to access both values and objects (references). For values it always creates an r-value, but for references it creates l-values or r-values depending on the role in the expression. We might consider this ambiguity undesirable and instead define:

x<-a // r-value
x->a // l-value

x->a := x<-a + y.a // x is an object, y is a value

Now we can get rid of subsumption from l-values to r-values, and we can be explicit about what we want.

How is this “ambiguity” ever a problem?

For example, if I have a literal with dot access &{a : 0, b : { c : "Joe" } }.b.c and have tried to take the address (i.e. &) of it as shown, I will receive a compiler error.

Imagine that literal is returned from a function so &f().b.c is taking the address of an r-value and the compiler generates an error message.

It’s not really ambiguity. The programmer must know that functions always return r-values in Zer0. I remember that C++ has some optimizations (c.f. also and also) whereby a function can effectively return an l-value by writing the result to the caller’s container (and other optimizations with r-value temporaries and avoiding unnecessary copying), but I forget the details of that.

keean commented

@shelby3 Good, I like {Int} for mutable.

So how about tuples? They should be immutable values, as they are a functional programming idea. I wanted to use (Int, Float) for the type and (3, 4.0) for the value so it has the similarity to function notation. The alternative is to have Int * Float as the type, but I prefer making it look like the value.

Then we end up with {(Int, Float)} for a mutable tuple, which I want to allow {Int, Float} as a shorthand for, because it doesn't clash with any other type notation and prevents bracket overload. Note: you can mutate this like so:

let x := (3, 4.0)
x[1] := 2 // error

let y := {3, 4.0}
y[1] := 2 // okay

So with this notation there is no need for different assignment operators. In fact I am arguing that immutability is rightly a property of the thing you are trying to assign from or to, and not a property of the assignment operator at all.

Returning an l-value is no problem with this notation either (note we haven't discussed mutable vs immutable array types, so I have just gone with something consistent):

fun f(x : [{Int}], y : Int) : {Int} 
   return x[y]

let z := [1, 2, 3]
f(z, 0) := 4
print(z) // prints [4, 2, 3]
keean commented

@shelby3 The Problem With Pointers

We want immutability to be a deep property, that is if we have an immutable data structure like a tree we want to be able to refer to the whole structure as immutable.

Pointers break this because we can take a pointer to something mutable, cast the pointer into a read-only pointer and insert it into an "immutable" data structure. Now our invariants are broken, and when the tree is cached on another node, and something modifies the mutable data, the remote copy will not get updated.

So this shows that the data itself needs to be tagged (in the type system) as mutable or immutable, and we need to ensure somehow that immutable data only contains references to other immutable data. Mutable data can contain references to either mutable or immutable data.

Further to this, JSON is a very useful way of writing data declaratively (and for text serialisation), so we should have a way of writing both mutable and immutable data as declarative syntax, like JSON. This means we must be able to write nested data structures without explicit pointers, and we must be able to write the type signatures for whole data structures separately from the data.

Mu x . (x, x) | Int // immutable binary tree
((1, 2), (3, 4))

Mu x . (x, x) | {Int} // mutable leaves
(({1}, {2}), ({3}, {4}))

Mu x . {x, x} | {Int} // mutable binary tree
{{1, 2}, {3, 4}}

@keean wrote:

Good, I like {Int} for mutable.

So what about tuples? They should be immutable values, as they are a functional programming idea. I wanted to use (Int, Float) for the type and (3, 4.0) for the value so it has the similarity to function notation.

Yeah that is what I already had in my draft of the syntax which I haven’t uploaded yet.

The alternative is to have Int * Float as the type, but I prefer making it look like the value.

Agreed and I had the same thought actually.

Then we end up with {(Int, Float)} for a mutable tuple, which I want to allow {Int, Float} as a shorthand for, because it doesn't clash with any other type notation and prevents bracket overload.

Good idea. I added it to the syntax and credited you with the idea with a link back to your post.

Note: you can mutate this so:

let x := (3, 4.0)
x[1] := 2 // error

let y := {3, 4.0}
y[1] := 2 // okay

Do we need the curly brackets for mutable literal tuples? I think not. Those are r-values so they can’t be mutable anyway (can’t take the address of a literal expression).

I think the following would be better and is what I currently have in the proposed Zer0 grammar:

x :: (3, 4.0)
x[1] = 2 // error

y := (3, 4.0)
y[1] = 2 // okay

Here it is again with the types annotated:

x :: (3, 4.0)     : (Int, Float)
x[1] = 2 // error

y := (3, 4.0)     : {Int, Float}
y[1] = 2 // okay

BTW, the let are redundant because of the :: and := (distinguished from assignment =) so aren’t in the proposed Zer0 grammar I’m working on.

So with this notation there is no need for different assignment operators. In fact I am arguing that immutability is rightly a property of the thing you are trying to assign from or to, and not a property of the assignment operator at all.

You misunderstood the proposed Zer0 grammar. Assignment is never :=. Another reason for :: and := is they eliminate the verbosity of let and const (or var and val). As stated up-thread, they also provide for type inference when the only attribute the programmer wants to indicate explicitly is the mutability. Note though that :: is bifurcated, as it can be either immutable or read-only depending on the type inference (or explicit type annotation).

Returning an l-value is no problem with this notation either (note we haven't discussed mutable Vs immutable array types, so I have just gone with something consistent):

fun f(x : [{Int}], y : Int) : {Int} 
   return x[y]

let z := [1, 2, 3]
f(z, 0) := 4
print(z) // prints [4, 2, 3]

I wrote yesterday that functions in Zer0 always return r-values, because Zer0 is call-by-value (i.e. value semantics by default). To get the semantics you want in your example, you need to return a pointer and then dereference the pointer:

f :: (x, y) => x[y]      : (*{[Int]}, Int) => *{Int}   // this is a pure function; although a closure on the first input argument would create an impure function

z := [1, 2, 3]
*f(&z, 0) = 4
print(z) // prints [4, 2, 3]

Note I wrote the type differently {[Int]} instead of [{Int}]. The array access operator […] will return a mutable element type (e.g. {Int}) if the input array type is mutable. The array access operator […] will return &{Int} when the input type is *{[Int]}. IOW, pass-by-reference must be explicit.

Note this variant prints a different result, because remember Zer0 is to be call-by-value:

f :: (x, y) => (&x)[y]   : ({[Int]}, Int) => *{Int}

z :: [1, 2, 3]           : [Int] // not {[Int]}
*f(z, 0) = 4
print(z) // prints [1, 2, 3]

The above is allowed even though the returned value escapes the stack frame, because Zer0 will detect that with escape analysis and automatically move it to the bump-pointer-allocated heap (which inside of Actors will be a near zero-cost resource deallocation abstraction, as I have explained elsewhere).

@keean wrote:

Pointers break this because we can take a pointer to something mutable, cast the pointer into a read-only pointer and insert it into an "immutable" data structure.

No that is an illegal operation. A read-only reference to a container is a presumption that the container can be mutated by another reference that has write permission.

keean commented

@shelby3

No that is an illegal operation. A read-only reference to a container is a presumption that the container can be mutated by another reference that has write permission.

Right, but it's better to exclude the possibility with the grammar than have an error.

{Int} implies a pointer, so &{Int} is redundant, you can only have a mutable integer if it has an address.

{[Int]} would be a pointer to an immutable array; if you want an array of mutable Ints, i.e. you can replace the Ints with different ones, you want [{Int}]. This is consistent with the use of {Int} for a mutable Int.

Having said that, I do think there is a simple elegance to pointers, although they are a bit low level. I think anything you have a pointer to should be mutable, and that values should be referentially transparent.

I think it's vital to distinguish mutable/immutable from read-only/write-only/read-write. I think that immutable should have value semantics. I think we need a JSON like literal notation. I can compromise on the rest.

@keean wrote:

Right, but it's better to exclude the possibility with the grammar than have an error.

Agreed but not every compiler error can be excluded by the grammar (nor even every bug by the compiler).

{Int} implies a pointer, so &{Int} [*{Int}] is redundant, you can only have a mutable integer if it has an address.

Your use of & instead of * in the type indicates to me that you were thinking about C++ references, which aren’t the same as C and C++ pointers. As you know, references are aliases for objects. Pointers point to memory, and in the case of my proposal for Zer0 that is pointing to an implicit container. I’m reasonably certain (as is the case for Go) that Zer0 won’t have pointer arithmetic.

‘Address’ is not the same thing as ‘pointer’ in Zer0, as I showed in an up-thread example:

f :: (x, y) => (&x)[y]   : ({[Int]}, Int) => *{Int}

z :: [1, 2, 3]           : [Int] // not {[Int]}
*f(z, 0) = 4
print(z) // prints [1, 2, 3]

The {[Int]} is not a pointer type, but it does have an address &x. The example above shows why we need to distinguish between pointers and addresses. The immutable value z :: [1, 2, 3] is copied to f(z, 0) because Zer0 is call-by-value. Even if we changed that to:

f :: (x, y) => (&x)[y]   : ({[Int]}, Int) => *{Int}

z := [1, 2, 3]           : {[Int]}
*f(z, 0) = 4
print(z) // prints [1, 2, 3]

In example above, the mutable value z := [1, 2, 3] is copied to f(z, 0) because Zer0 is call-by-value. So if we presumed that {[Int]} is equivalent to *{[Int]}, then we would be copying the pointer instead of the value, which would change the semantics of the program (i.e. prints [4, 2, 3]). IOW, the compiler wouldn’t know when to copy the value or when to copy the pointer (i.e. it’s unspecified and ambiguous), unless we separate those two concerns and make it explicit in code. Also we still need pointers to immutable containers as well.

You could propose instead that we make copying the pointer default and make pointers implicit, then the programmer needs to annotate when the value of the mutable is to be copied instead. But then we still need pointers to immutable containers as well, so you create two conflicting bifurcations of concepts. IMO, better to just keep the concept of pointers distinct from the concept of containers. In Zer0, l-values always have an implicit container, regardless if they’re mutable or immutable. That implicit container can be referenced with a pointer. But only the mutable implicit container can be mutated. (R-values don’t have a container so they can only be copied to an l-value container.)

Having said that I do think there is a simple elegance to pointers, although they are a bit low level.

I agree they’re low-level, but as you see with Java, JavaScript, and Haskell, there are also (performance) limitations and confusions that result from not having pointers (e.g. JavaScript’s call-by-sharing, which copies primitive values but shares object references; likewise Java passes object references by value, so every object is accessed through a reference, which is inefficient except where the complex HotSpot JIT manages to optimize it away). There’s no perfect choice.

I thought we had decided to make a language like Go but with typeclasses (and now also proposing Actors)? So that we can do reasonably performant low-level code along with high-level?

I think anything you have a pointer too should be mutable, and that values should be referentially transparent.

I think it's vital to distinguish mutable/immutable from read-only/write-only/read-write. I think that immutable should have value semantics.

Again up-thread your argument for that is basically that more compiler optimizations are enabled.

I think we should give the programmer the choice. When the programmer wants to enable those optimizations, the code can forsake the use of pointers. But sometimes pointers are the way to write the most performant code. It will vary.

I’m contemplating that referential transparency is significantly aligned with Haskell’s non-strict evaluation, HOFs, and declarative style of programming, so that refactoring doesn’t impact performance. Well that’s the conceptual ideal or goal, but Haskell can’t optimize everything and pathological cases are probably still common unless you really tweak your Haskell code.

If we disallow pointers to immutable l-values then we end up with a “What Color Is Your Function?” bifurcation of the program where parts of our program are incapable of interacting with each other. That seems unacceptable.

I think we need a JSON like literal notation.

What’s wrong with data type names? I thought we agreed in the Macros and DSLs issues thread #31 that the language itself could be the transmission format. This is like the third time you have raised that same point and I have to repeat the same response every time. You were the one who actually pointed out that we could just use the language syntax.

Why reinvent JSON for a lightweight syntax? We can just use JSON if we need that.

keean commented

@shelby3 I agree about "what colour is your function". Maybe it is better to allow pointers to immutable objects; however, you can only allow read-only pointers to immutable objects, whereas there are three kinds of pointer we can have to mutable objects.

We also need to incorporate value semantics, so we clearly have two things:

  • immutable values
  • mutable objects

The question is, do we want:

  • mutable values // no this does not make sense
  • immutable objects

So immutable objects are a possibility, but we might as well not allow these and limit ourselves to immutable values and mutable objects.

Then we need to deal with read-only, write-only and read-write pointers.

@keean wrote:

Then we need to deal with read-only, write-only and read-write pointers[implicit containers].

I will reiterate again (and I hope this is the last time I need to repeat it) that, at least in what I’ve proposed for Zer0, pointers aren’t the same as implicit containers. To imply otherwise is misleading for readers (such as if I ask a programmer to study these threads before he works on helping to implement the compiler or write documentation for Zer0). Pointers are just values that also live in implicit containers when they’re l-values. Only containers have addresses.

The immutable, read-only, write-only and read-write attributes all apply to the implicit container for l-values. They never apply to a pointer, because a pointer is a value (which can be either an l-value or an r-value). R-values are always immutable and have no implicit container, because they’re always copied.

Maybe if is better to allow pointers to immutable objects, however, you can only allow read-only pointer to immutable objects, whereas there are three kinds of pointer we can have to mutable objects.

AFAICT, that doesn’t make sense in the context of what I have proposed for Zer0, because you have in mind a model that puts attributes on pointers instead on the implicit container.

If the implicit container of an l-value is immutable, then any pointer to it must be either a pointer to an immutable or read-only implicit container. IOW, that (immutable or read-only) attribute isn’t on the pointer, but on what the pointer points to.

I do agree that a pointer to an immutable can be subsumed to a pointer to a read-only; and not to a pointer to a write-only or read-write. I had already described the difference between immutable and read-only in a prior recent post. The difference between mutable and either of write-only and read-write is that mutable isn’t specific enough because it doesn’t distinguish between the two. Thus {Int} is really read-write not mutable.

We also need to incorporate value semantics, so we clearly have two things:

  • immutable values
  • mutable objects

The question is, do we want:

  • mutable values // no this does not make sense
  • immutable objects

So immutable objects are a possibility, but we might as well not allow these and limit ourselves to immutable values and mutable objects.

AFAICT your distinction between values and objects doesn’t apply to what I have proposed for Zer0. Zer0 has only values and only copy-by-value (i.e. value) semantics. Values live in an implicit container when they’re l-values.

I think by ‘object’ you mean where equality is tested only by address? Certainly you don’t mean objects from OOP which bind state and methods together, because Zer0 will have typeclasses instead of class methods. I think basically you mean anything pointed to is an object? So that is why you seem to want to conflate implicit containers with pointers. But that’s not the model I am proposing. In my model, an object is a high-level semantic created by the programmer who employs a pointer. It has nothing to do with the implicit container of l-values.

I wrote:

{Int} implies a pointer, so &{Int} [*{Int}] is redundant, you can only have a mutable integer if it has an address.

Your use of & instead of * in the type indicates to me that you were thinking about C++ references, which aren’t the same as C and C++ pointers. As you know, references are aliases for objects. Pointers point to memory, and in the case of my proposal for Zer0 that is pointing to an implicit container. I’m reasonably certain (as is the case for Go) that Zer0 won’t have pointer arithmetic.

‘Address’ is not the same thing as ‘pointer’ in Zer0, as I showed in an up-thread example:

AFAICT, my proposal doesn’t conflate pointers with the automatic address-taking of that complex C++ aliasing maze. Take a peek into this complex C++ tarpit (and try not to fall in!):

Have you ever bound an rvalue to a const reference and then taken its address? Yes, you have! This is what happens when you write a copy assignment operator, Foo& operator=(const Foo& other), with a self-assignment check, if (this != &other) { copy stuff; } return *this;, and you copy assign from a temporary, like Foo make_foo(); Foo f; f = make_foo();.

The other is an l-value because it’s a method argument. But the l-value contains the reference, not the r-value the reference points to. Taking the address of a temporary r-value is more efficient than assigning the r-value to an l-value, then taking the address of the l-value. However, if the r-value is already a reference, then no address is taken and it is just copied to other.

So for my proposal for Zer0 to not allow taking the address of an r-value, it means the programmer has to organize the code so that the argument passed to other is already a pointer.

This also ties into the optimization of r-value temporaries example:

string s0("my mother told me that");
string s1("cute");
string s2("fluffy");
string s3("kittens");
string s4("are an essential part of a healthy diet");

And that you concatenate them like this:

string dest = s0 + " " + s1 + " " + s2 + " " + s3 + " " + s4;

My proposal for near zero-cost GC with the Actor paradigm (nearly as efficient as Rust without the lifetimes woe) means we can make the + operator function take pointers to strings, and they are efficiently allocated and deallocated on the bump pointer heap. So there’s no problem with the extra temporaries which C++ is attempting to solve with that complex move semantics. And Zer0 will be faster than C++ in this case because the allocation is only bumping a pointer and the deallocation has zero cost (no mark-and-sweep; just reset the bump pointer when the Actor function exits at the top level of the stack).

@keean we have to apply holistic design. We factor in all the design elements of Zer0 simultaneously when making any design decision on any one facet (i.e. not in isolation from the other facets).

keean commented

@shelby3

I think by ‘object’ you mean where equality is tested only by address?

An object is a contiguous region of memory, so it has a start address and a length (or extent). An object is defined by itself, not its properties. For example if I have a yellow square, and I change its colour to orange, it is the same square. On a single-address-space computer the memory address and extent are sufficient to create a unique identity that does not change when its properties change; however on non-shared-memory machines, or in databases, you may need a GUID for the object. Object equality tests whether two things are the same object, which is easily defined as: if I change a property on one, does the other change also? This is different from value equality, where we compare each property for equality (recursively), and two values are equal if their properties are all equal. Of course values are also immutable, so if they are equal now they will always be equal, like "3 == 3"; the same should be true for objects, because if two objects are identical they should always be so.

The immutable, read-only, write-only and read-write attributes all apply to the implicit container for l-values. They never apply to a pointer, because a pointer is a value (which can be either an l-value or an r-value). R-values are always immutable and have no implicit container, because they’re always copied.

How do you handle the following cases:

  1. a deeply immutable data structure (say an index tree) that is shared between multiple CPU cores?

  2. a local mutable array whose stored values can be changed?

  3. what happens if you try and send a pointer to immutable data to another process on a different CPU core?

  4. how do I send a read-only reference to pass to a function, so that I know this foreign function written by someone else cannot modify my mutable data.

  5. how do I send a write only pointer to mutable data, so that I know the function I pass it to can only write to the object, to avoid data-races.

I will now provide some answers for my proposal, illustrating the design:

(1) referential transparency means for immutable data structures you simply pass the data as a single value; the language is forced to copy, because references are not allowed to immutable data. This also implies deep copying, because the internal references are all transparent.

(2) A mutable array has slots for each value, that are not necessarily pointers, but can be internal storage for values/objects of known size, and external for values/objects of unknown size. This allows us to simply say [Int] and [String].

(3) not possible, you can't have a reference to an immutable due to referential transparency. There is never any need to reference immutable data.

(4) All references can be read-only, and we can only reference mutable data. As we only ever refer to mutable data by reference, and an object is already a reference (because it has identity), we just need to cast it to read-only and pass it.

(5) same as (4) simply cast to write only and pass. There is no casting back, so security is enforced within the language. Between processes we can use the page table to enforce read/write properties.

@keean wrote:

  1. a deeply immutable data structure (say an index tree) that is shared between multiple CPU cores?

    (1) referential transparency means for immutable data structures you simply pass the data as a single value, the language is forced to copy, because references are not allowed to immutable data. This also implies deep copying because the internal reference are all transparent.

Sharing in our proposed Actor paradigm (see Parallelism issues thread #41) is strongly fenced, so the library or runtime will know to deep copy across non-share memory barriers.

Besides, I don’t understand how your Haskell-like language is going to always optimize perfectly the use of internal references for efficiency. Haskell’s performance is (like Java’s Hotspot) a crapshoot. AFAICT, you want to put too much burden on a magic, omniscient super-optimizing compiler that can’t exist. Compilers can’t yet read the minds of programmers. As you stated, these sorts of optimizations are in the NP computational class.

  2. a local mutable array whose stored values can be changed?

    (2) A mutable array has slots for each value, that are not necessarily pointers, but [slots] can be internal storage for values/objects of known size, and external [slots] for values/objects of unknown size. This allows us to simply say [Int] and [String] [in addition to and instead of only {[Int]} and {[String]}].

I don’t know why you are implying my proposal can’t do this. The type {[Int]} means mutable slots that have internal storage for mutable implicit containers of Int values. The type {[*Int]} means mutable slots that have internal storage for mutable implicit containers of pointers to external immutable implicit containers of Int values. The type {[*{Int}]} means mutable slots that have internal storage for mutable implicit containers of pointers to external mutable implicit containers of Int values. The type [Int] means immutable slots that have internal storage for immutable implicit containers of Int values. The type [{Int}] is illegal because we don’t allow data types with multiple type parameters to each have different immutability attributes w.r.t. each other.

What’s missing?

  3. what happens if you try and send a pointer to immutable data to another process on a different CPU core?

    (3) not possible, you can't have a reference to an immutable due to referential transparency. There is never any need to reference immutable data.

Ditto. Same answer as #1 above.

  4. how do I send a read-only reference to pass to a function, so that I know this foreign function written by someone else cannot modify my mutable data.

    (4) All references can be read-only, and we can only reference mutable data. As we only ever refer to mutable data by reference, and an object is already a reference (because it has identity), we just need to cast it to read-only and pass it.

So you eliminate write-only references? You seem to have forgotten that you also want write-only references.

My proposal is equivalent. Any pointer to non-write-only (including immutable) implicit container can be passed to a function that accepts a pointer to a read-only implicit container.

  5. how do I send a write-only pointer to mutable data, so that I know the function I pass it to can only write to the object, to avoid data-races.

    (5) same as (4) simply cast to write-only and pass. There is no casting back, so security is enforced within the language.

You can’t cast a read-only to a write-only in your idea either. My proposal has no deficiencies same as for (4).

Between processes we can use the page table to enforce read/write properties.

This requires more discussion as I might prefer this security to be handled at a managed higher-level. We already had that argument about managed versus hardware security in the Modularity issues thread #39, so let’s not repeat it again now. Anyway, I think my proposal can also use the page table if we do security that way.

keean commented

@shelby3

Besides I don’t understand how your Haskell-like language is going to always optimize perfectly the use of internal references for efficiency. Haskell’s performance is (as like Java’s Hotspot) a crapshoot.

The unpredictable performance of Haskell is caused mainly by lazy evaluation and the garbage collector, not reference handling. Ada provides the kind of references I am proposing, and is as fast, if not faster, than C++ (my Ada Monte-Carlo Go engine is slightly faster than the C++ one). So the technology to provide this kind of pointerless programming with the speed of 'C' already exists in Ada.

I don’t know why you are implying my proposal can’t do this. The type {[Int]} means mutable slots that have internal storage for mutable implicit containers of Int values.

I was implying that because you cannot use internal storage for an array of strings (because strings do not have a fixed length), with your proposal you cannot simply have an array of strings [String] like [Int], instead you will need to have an array of pointers to strings [*String].

The type [{Int}] is illegal because we don’t those data types with multiple type parameters to each have different immutability attributes w.r.t. to each other.

The point was that an array is just an ordinary type constructor; we don't really know from the type alone that it is an array. As such we should look for consistency with other type constructors. Consider the general form T<A>: we can have {T<A>}, which like {Int} would represent an array that can be replaced as a whole, just like we can replace the Int. On the other hand, T<{A}> would represent a type where the contents can be replaced. The point is to have a consistent mental model that can easily be internalised by the programmer.

Ditto. Same answer as #1 above.

The problem is that the pointers could point to mutable data. How do you ensure at compile time that any data sent to another process is deeply immutable? In my system you just have to look at the datatype, and if it is a value, you know it cannot contain any mutable data at any depth. The check is quick and easy. It literally prevents the programmer from creating structures that would be a problem. I think it is better to prevent something from even existing than it is to check for it and throw an error.

Put it this way: it is impossible to take a reference to a value like '3'; it has no address, it is a value. This is how all functional languages treat values.

Objects however have to have an address, and an address is all that is needed to identify them uniquely, so we must have a reference to every object (we use its address to identity it).

So you eliminate write-only references? You seem to have forgotten that you also want write-only references.

I think you misread what I was asking. The point is we can consider this a capability based system. A read-only pointer represents permission to read something, and a write-only pointer represents permission to write something. We give permission by passing a reference of the appropriate kind to some other code, or over a channel to a different process. Consider the following:

type MyObj = {a:{Int}}
instance Writable<MyObj>
   sink(x:MyObj) = x.a // type of x.a is {Int} so we are returning a writable reference

fun<A> f(x: A) requires Writable<A>
   sink(x) = 42

let x = MyObj {a: 0} // default reference is read_write
let y = writeOnly(x)
let z = readOnly(x)
f(x) // okay
f(y) // okay
f(z) // error

A read-only or write-only reference is a permission to read or write, whereas the Readable or Writable typeclass is a requirement to have that permission.

This means the owner of an object can give permission to access it, and users of an object can require permissions to access it. The program is only sound if you have the required permissions.

You can’t cast a read-only to a write-only in your idea either. My proposal has no deficiencies same as for (4).

I was not claiming your proposal had any deficiencies, I wanted to see how it addressed each of these issues, so that we could do a side-by-side comparison.

The point about my proposal is that it encompasses all these different requirements, and synthesises them into a coherent whole without lots of little bits. Any system must be simple and easy to explain. If we cannot communicate these ideas to experienced programmers, what hope is there of beginners learning them? Here is an attempt to explain the proposal clearly and concisely:

In the language we have both values and objects. Values are immutable, they are finite, and they can always be written down using a structured text notation like JSON, for example: (name: "Fred", age: 37); they are all comparable by equality, and passed by value. Objects are mutable, they may be infinite (contain self-references), they cannot always be written down as text, they are comparable by identity, and passed by reference. We have three kinds of reference: read-write, read-only, and write-only. We can only cast away permissions, so read-write can be cast to read-only or write-only. Receivers of references are constrained to the permissions passed to them.

@keean wrote:

Besides I don’t understand how your Haskell-like language is going to always optimize perfectly the use of internal references for efficiency. Haskell’s performance is (as like Java’s Hotspot) a crapshoot.

The unpredictable performance of Haskell is caused mainly by lazy evaluation and the garbage collector, not reference handling.

That GC optimization issue involves the analysis of referencing.

Ada provides the kind of references I am proposing, and is as fast, if not faster than C++ (My Ada Go Monte-Calro engine is slightly faster than the C++ one). So the technology to provide the kind of pointerless programming with the speed of 'C' already exists in Ada.

Another variant of the continuum between the clusterfucks of tedium of Rust and C++. Refer to the above linked post and also where it links to the up-thread post I made about C++:

It’s not really ambiguity. The programmer must know that functions always return r-values in Zer0. I remember that C++ has some optimizations (c.f. also and also) whereby a function can effectively return an l-value by writing the result to the caller’s container (and other optimizations with r-value temporaries and avoiding unnecessary copying), but I forget the details of that.

You can’t untangle the referential transparency issue in Haskell from the same performance problem that manifests as clusterfuck complexity and/or inflexibility in Rust, C++, or Ada, from the issue we are discussing now.

I want to preface this by saying I’m not confident that I have a complete, irrefutable model of this issue. I’m trying to form one through discussion and additional thought.

@keean wrote:

I don’t know why you are implying my proposal can’t do this. The type {[Int]} means mutable slots that have internal storage for mutable implicit containers of Int values.

I was implying that because you cannot use internal storage for an array of strings (because strings do not have a fixed length), with your proposal you cannot simply have an array of strings [String] like [Int], instead you will need to have an array of pointers to strings [*String].

I have re-read your post to which I replied. You also mentioned the slots being mutable in addition to being external or internal storage. But anyway, I will respond below to this new point.

I agree that my idea forces explicit pointers for values of types that have a dynamic size. I don’t yet see a problem with that. Also String could be a type that has a pointer to the variable-sized string data, in which case [String] would be okay.

You could argue that there will be a bifurcation with unions such as [Int | String] but I will retort that the compiler will complain and the programmer will change the code to [Int | *String].

@keean wrote:

The type [{Int}] is illegal because we don’t want those data types with multiple type parameters to each have different immutability attributes w.r.t. each other.

The point was that an array is just an ordinary type constructor; we don't really know from the type alone that it is an array. As such we should look for consistency with other type constructors. Consider the general form T<A>: we can have {T<A>}, which like {Int} would represent an array that can be replaced as a whole, just like we can replace the Int.

I of course considered and anticipated your quoted objection when I wrote my prior post. The mutable container {T<A>} means everything in T is mutable, not just A, which is of course what we want, and which is what I pointed out.

On the other hand T<{A}> would represent a type where the contents can be replaced.

Which makes no sense because it means you could have something internally inconsistent such as T<{A} | B>. Which is why it’s illegal. If you want to reference immutable elements then actually reference them: T<*{A}>. The mutability attribute has to go on the head of the type.

Perhaps you can find a flaw in my model, so please keep trying to. Actually I am expecting you to find a flaw, because I think this issue is very complex.

The point is to have a consistent mental model that can easily be internalised by the programmer.

Agreed. Which is what my idea provides and AFAICT prevents nonsense combinations.

I don’t think the complexity that you proposed would make it more difficult for the programmer.

@keean wrote:

The problem is that the pointers could point to mutable data. How do you ensure at compile time that any data sent to another process is deeply immutable? In my system you just have to look at datatype and if it is a value, you know it cannot contain any mutable data at any depth. The check is quick and easy. It literally prevents the programmer from creating structures that would be a problem. I think it is better to prevent something from even existing, than it is to check for it and throw an error.

I am not proposing that an immutable implicit container can contain immutable values which are pointers to mutable data. The immutability property is always deep.

If the programmers wants superficial “I won’t mutate this” (but no guarantees what other pointers to the same data do) then use read-only instead.

Why would we ever need a shallow immutability property? The entire point of immutability is referential transparency and value semantics. An immutable which has referential opaqueness is by definition not immutable.

@keean wrote:

The program is only sound if you have the required permissions.

AFAICT, my proposal has all the required permissions and capabilities such as casting and typeclass bounds.

The point about my proposal is it encompasses all these different requirements, and synthesises them into a coherent whole without lots of little bits. Any system must be simple and easy to explain.

I don’t view your proposal as easier but rather more complex and unnatural for programmers who know C and C++. If you think otherwise, please try to show me how and why. I think you’re trying to create some complex marriage of Haskell, Prolog, and C++. You want an explicit distinction between rvalues and containers (i.e. lvalues). I don’t see why that distinction is necessary nor desirable. C++ didn’t need it, especially not before they started to optimize the hell out of the rvalue temporaries because of the complexities of zero-cost abstraction borrowing and moves (c.f. also up-thread discussion). Rust is a clusterfuck because of that zero-cost optimization crap. I was discussing that recently.

I also wrote in other thread:

Rust and C++ are very complex because of what I wrote before:

Yet note the level of complexity in C++ in order to attain higher-level abstractions with very low-level control performance.

My proposal is basically simplified C++ before they started adding all the optimization crap we won’t need, because of the highly simplified zero-cost GC that doesn’t need to worry about rvalue temporaries, that will come from our amazing insight on using Actors.

My preference is the K.I.S.S. (Keep It Simple Stupid) principle. Wikipedia may be incorrect. I was told the principle originated at General Motors corporation and it was “keep it simple, stupid” not “keep it simple, silly”.

keean commented

@shelby3

My proposal is basically simplified C++ before they started adding all the optimization crap

C++ needs that optimisation because it is low level. Ada does it all automatically. In Ada you just use and pass the values, and it optimises to move, copy, reference automatically. C++ is really not the model to follow here, and I say that as someone who prefers programming C++ to Haskell after reading Stepanov's "Elements of Programming".

If you make pointers visible, then how data gets passed becomes your problem, and you have to worry about the semantics of it and get it right to get good performance.

If you have referential transparency, then the compiler can do the optimisations without changing the semantics of the code, and so you don't have to explicitly get the right method.

This has the added advantage that for different computer architectures you don't have to rewrite your code, because the heuristics would be different. For example on a 32bit machine you would pass a 64bit data structure (a pair of Ints) on the stack by pointer, but on a 64bit machine you would want to pass it in a register. This is just one example, but the point is the heuristics for this are simple and well known on each architecture, and very suitable for compiler optimisation.

I often quote that over reliance on the magical compiler results in slowness, so I am very conservative about what is achievable by optimisations. For example I don't think that functional languages with immutable data can ever be as fast as imperative languages, because that really does require almost impossible compiler magic to refactor the algorithms.

keean commented

@shelby3

If I have some mutable data, and I include a pointer to it in an immutable data structure via a read-only pointer, the mutable data could still be changed via another read-write pointer. So you would need to distinguish a read-only pointer from a pointer to an immutable. You would then need a rule that says you can only include pointers to immutables in immutables.

In mutable data you would be able to include immutable, read-only, write-only and read-write pointers. So you have four distinct 'flavours' of pointer.

One problem with this approach is that you lose the nice compact Haskell / JavaScript notation for declaring complex objects, consider:

data Tree a = Branch (Tree a) (Tree a) | Leaf a

Doing this in C++ with pointers is much more boilerplate:

struct tree_branch {
   struct tree_node *left;
   struct tree_node *right;
}

template <typename A> struct tree_leaf {
   A value;
}

template <typename A> struct tree {
   bool isLeaf;
   union {
      struct tree_branch;
      struct tree_leaf<A>;
   } node;
}

Which is simpler and more readable?

@keean wrote:

My proposal is basically simplified C++ before they started adding all the optimization crap

C++ needs that optimisation because it is low level.

Did you fail to read the part of my prior post where I explained that Zer0 can be low-level but won’t need that optimization, because it will all be achieved automatically by the epiphany I had (the epiphany which AFAICT you seem to be trying to take credit for)?

Ada does it all automatically. In Ada you just use and pass the values, and it optimises to move, copy, reference automatically.

I already explained to you numerous times that Ada’s paradigm is inflexible and onerous. Whereas, the new paradigm (my aforementioned epiphany) for Zer0 is fully flexible (to the extent that Actors as a paradigm are not inflexible).

I grow very tired of recapitulating what I surely thought you would have assimilated by now.

If you make pointers visible, then how data gets passed becomes your problem, and you have to worry about the semantics of it and get it right to get good performance.

You failed to note that I posit that I ameliorated the performance issue. Yet then in another thread you implicitly claim you invented that amelioration, lol. This is beyond comical into the absurd.

If you have referential transparency, then the compiler can do the optimisations without changing the semantics of the code, and so you don't have to explicitly get the right method.

Haskell can’t always do the optimizations optimally. Yet as usual I am forced to repeat myself because I already made that point up-thread.

This has the added advantage that for different computers architectures you don't have to rewrite your code, because the huleuristics would be different. For example on a 32bit machine you would pass a 64bit data structure (a pair of Ints) on the stack by pointer, but on a 64bit machine you want to pass in a register.

That is a compiler optimization detail. The programmer doesn’t have to deal with that even with explicit pointers that I have proposed.

This is just one example, but the point is the heuristics for this are simple and well known on each architecture, and very suitable for compiler optimisation.

Haskell runs into other problems with optimization, some of which involve how to optimally free all the temporary objects, many of which could have been avoided if a pointer were used instead of always (deep) copying. Haskell has to deal with lifetimes and decide whether it must copy or can use a reference transparently. The way to punt on this is to use tracing GC, but that is not performant and it has pauses that bloat worst-case latency.

I often quote that over reliance on the magical compiler results in slowness, so I am very conservative about what is achievable by optimisations. For example I don't think that functional languages with immutable data can ever be as fast as imperative languages, because that really does require almost impossible compiler magic to refactor the algorithms.

Exactly! That is my point!

I was surprised that you seemed to be deviating from your past position here, where you argued that Haskell can optimise all the things that we can do in C++. Empirically we know that is not true.

@keean wrote:

If I have some mutable data, and I include a pointer to it in an immutable data structure via a read-only pointer, the mutable data could still be changed via another read-write pointer.

That is illegal. You can’t include a pointer (in an immutable data structure) to mutable data.

So you would need to distinguish a read-only pointer from a pointer to an immutable. You would then need a rule that says you can only include pointers to immutables in immutables.

Correct. I thought that was already clearly implied in my prior statements.

In mutable data you would be able to include immutable, read-only, write-only and read-write pointers.

That’s not quite correct. A read-only can’t be placed into a mutable data structure. The data structure would need to also be read-only. There is no mutable data structure in my proposal. There is instead only write-only or read-write.

So you have four distinct 'flavours' of pointer[the implicit container of the value pointed at].

Correct. Immutable, read-only, write-only, and read-write.

One problem with this approach is that you lose the nice compact Haskell / JavaScript notation for declaring complex objects, consider:

data Tree a = Branch (Tree a) (Tree a) | Leaf a

Doing this in C++ with pointers is much more boilerplate:

struct tree_branch {
   struct tree *left;
   struct tree *right;
}

template <typename A> struct tree_leaf {
   A value;
}

template <typename A> struct tree {
   bool isLeaf;
   union {
      struct tree_branch;
      struct tree_leaf<A>;
   } node;
}

Which is simpler and more readable?

Note I corrected your C++ example. There’s no tree_node just struct tree.

This is a good example to consider and I am glad you raised it, because I think it clarifies our discussion.

As written in the current proposed Zer0 syntax that is very succinctly:

data Tree<A>
   Branch(left, right)          : (*Tree<A>, *Tree<A>) => Tree<A>
   Leaf(item)                   : A => Tree<A>

The Zer0 variant is more comprehensible than the Haskell one!

Actually Haskell’s obfuscation of the need for pointers to efficiently represent Tree is one of the multitude of aspects of Haskell that made it so difficult for me to understand when I was first trying to learn it. Bear in mind that I taught myself how to program by first learning how the CPU works at the lowest levels, from logic gates on up, when circa 1978, at age 13, I read the Radio Shack book Understanding Microprocessors while bedridden for 2 days due to a severe high ankle sprain I sustained at high school football (not soccer, the one with a helmet) practice.

For me it makes no sense to obfuscate away the way the data structure is actually represented in memory. That is Haskell fairytale land where it pretends to be able to wipe away the imperative and referential details of programming, but as a result the performance can suck.

My goal is to be able to do highly performant C-like programming in Zer0 without needing a FFI. While also being able to do high-level abstraction. I don’t know if that is realistic. I’m trying to figure it out.

EDIT: Swift also has both pass-by-reference and pass-by-value.

keean commented

@shelby3

The Zer0 variant is more comprehensible than the Haskell one!

I like the idea of starting with GADT syntax, which means there is only one syntax to learn.

My goal is to be able to do highly performant C-like programming in Zer0 without needing a FFI

If the types are different from 'C' you will need an FFI, if only to supply the types needed locally. It may be possible to make a superset of 'C' types, but I think 'C' has some fundamental mistakes (like const being shallow and castable) that would be better avoided. Still it's an admirable goal.

While also being able to do high-level abstraction.

I think it's possible, that's why I started working on a programming language myself.

You might be right about pointers, the problem is we then do not have value semantics automatically for new types. You are going to have to supply copy and move constructors like in C++, unless you have a solution for that?

Edit: Also consider that 'C' does not have polymorphism for values; only things 'pointed-to' can be polymorphic. There is a very strange mix-up between values and objects (that creates the need for l-values and r-values), pointers and polymorphism in 'C' that I think would benefit from separating the concerns.

keean commented

@shelby3 I wonder if there is a way to combine both our concerns. I think having a shorthand for object creation is very useful. For example which is clearer:

x := new Array(1)
x[0] = 3

Or

x := [3]

I think variables should be immutable by default, so:

x : Int := 3

Would be immutable, and always an r-value

y : *Int := {3}

Would be a mutable pointer to an object. The problem with this is that pointers don't have extents (lengths) and objects do, so it's probably better to have a separate type for this like so:

y : {Int} := {3}

Which would be an object, which is a pair consisting of a pointer and a length, so essentially (*Int, 1), where the '1' is a type level integer encoding the size of the object. This means the type system would have kinds Type and Nat. So adding kind annotations to the type we would have {Int} == (*Int : Type, 1 : Nat). The point here is to allow assignment to know the size of the object to copy, as part of the type.

So for objects we want this pair of address and length, rather than just a plain 'C' pointer.

@keean wrote:

I think having a shorthand for object creation is very useful. For example which is clearer:

x := Array(1)
x[0] = 3

Or

x :: [3]

The proposed Zer0 syntax has both options. But I hope most programmers would opt for the latter by convention. Note the latter can be immutable. I corrected your syntax.

Note the constructor could instead be coded to be overloaded with default, named argument parameters (the named, default parameters are currently in the proposed syntax):

x := Array(length = 1)
x[0] = 3

Or:

x :: Array(3)

The latter requires some means of handling a variable sized argument list. The last non-default argument of the tuple of the function declaration has to require the Iterable and Varargs typeclasses. The defaults after the one with the Varargs requirement must be named and applied by name.

I think variables should be immutable by default, so:

x :: 3      : Int

Would be immutable, and always an r-value

Again I corrected your syntax to what is currently proposed for Zer0.

That is not an r-value because it occupies a non-ephemeral location in memory and its address (&) is allowed to be assigned to a pointer.

Edit: Also consider that 'C' does not have polymorphism for values, only things 'pointed-to' can be polymorphic. There is a very strange mix up between values and objects (that creates the need for l-values and r-value), pointers and polymorphism in 'C' that I think would benefit from separating the concerns.

What is wrong with it? I like it. Does it cause a problem?

I have not yet seen a clearly articulated benefit of separating things the way you seem to want to.

Seems you want to conflate the notion of r-value with l-value. I do not clearly see how you are separating the concerns. You want a value to be some abstract thing that is stored nowhere, thus an r-value. And you want to be able to tag identifiers with the type of being an r-value, which is normally only for l-values. Whereas, I think r-value is a contextual concept that should never be conflated with the type system.

Apparently AFAICT the main problem that the implicit r-value concept created in C/C++ is that, in order to optimize them (for example to avoid creating duplicated temporary r-values and to make their deallocation efficient), C++ had to create all this complexity which I mentioned up-thread. But I have proposed the bump-pointer-heap “actors epiphany” so as to entirely side-step the complexity of C++ yet with hopefully nearly equivalent performance.

It’s interesting though to consider how my proposed model would transpile to Scala instead of Go. Go has pointers but Scala has only objects which are always referenced (and some special cases where for example array buckets are not boxed).

When Go is transpiled to JavaScript by GopherJS, Go’s pointers are converted to unique integers which are used with a hashmap to lookup the references to the objects. Very inefficient.

Perhaps what we could do instead is that pointers are the references. When assigning the value instead of the pointer, we copy as necessary. So GopherJS decided to make pointers less efficient to avoid all that excess copying, and also because there’s no copy-constructor concept in Go.

Remember that same as for Go, I am proposing that Zer0 will not allow any pointer arithmetic.

You might be right about pointers, the problem is we then do not have value semantics automatically for new types. You are going to have to supply copy and move constructors like in C++, unless you have a solution for that?

Good point. Yes we will need copy and move constructors. And as you pointed out in another thread, these must terminate. Note the linked example would instead be in Zer0:

string b(&x + &y);                                // Line 2

I’m proposing that in Zer0 we will be explicit about when we’re referring to references or values. Values are always copied. Of course if x and y are already pointers then no need for the &.

My goal is to be able to do highly performant C-like programming in Zer0 without needing a FFI

If the types are different from 'C' you will need an FFI, if only to supply the types needed locally. It may be possible to make a superset of 'C' types, but I think 'C' has some fundamental mistakes (like const being shallow and castable) that would be better avoided. Still it's an admirable goal.

Indeed immutability should be deep and can’t be castable. Otherwise immutability is not an invariant that can be relied on. And the immutability invariant is essential for scaling parallelism with shared data in the Actor model as we have discussed in the Parallelism issues thread #41.

y : *Int := {3}

Would be a mutable pointer to an object.

That is not proposed Zer0 syntax and I do not think I like that proposal.

Instead I am now proposing:

y :: &new 3    : *{Int}   // Immutable pointer to a mutable `Int`
y := &new 3    : {*{Int}} // Mutable pointer to a mutable `Int`
y := &new 3    : {*Int}   // Mutable pointer to an immutable `Int`
y :: &new 3    : *Int     // Immutable pointer to an immutable `Int`

The prefix/infix new should call the default constructor for the type of its RHS operand.

That new is necessary to avoid an implicit conversion below that would have ambiguous intent (is it a programmer error or intentional?) because r-values have no address:

y :: &3    : *{Int}   // Immutable pointer to a mutable `Int`
y := &3    : {*{Int}} // Mutable pointer to a mutable `Int`
y := &3    : {*Int}   // Mutable pointer to an immutable `Int`
y :: &3    : *Int     // Immutable pointer to an immutable `Int`

So to model a primitive type that has value semantics (i.e. copy the value instead of the reference) by default, such as integers in Scala (Java) or JavaScript, requires we put them in an object, e.g. the Integer type instead of int in Java, and the {v:3} or [3] value in JavaScript, which is an Object uni-type (JS has no static types).

The problem with this is that pointers don't have extents (lengths) and objects do

That is more evidence that your proposal is conflating concerns. My proposal has an implicit container for pointers also when they’re l-values. Pointers are just another value. You have to treat them specially in your proposal, i.e. they’re a corner case in your proposal.

Note Scala and JavaScript don’t have pointers thus they don’t have lengths. But I already explained in this post my ideas for the mapping from my proposal to those pass-by-reference models.

Obviously if we transpile Zer0 to Scala, then we can’t actually have features in Zer0 to manipulate memory directly such as with pointer arithmetic. But as pointed out in that article critical of C, we don’t want to expose that model of memory because it doesn’t reflect the reality of the hardware well anyway.

Data types need to provide serialization and deserialization functions which deal in packed binary outputs and inputs respectively.

keean commented

@shelby3
I don't like the new in the syntax; I like the JavaScript approach with literal JSON syntax, so that {a:3} is a new object.

In my proposal variables are not reassignable, so that simplifies the number of alternatives available:

y := 3 : Int     // immutable value
z := {3} : {Int}    // mutable object

No types are needed however because of type inference.

In both cases the variable is not reassignable. In the first case the variable y has the value '3', in the second z is the reference/pointer+extent to the new object containing the value 3.

All operators that do reassignment expect a reference/pointer+extent on their left hand side, so the following:

y = 7 // error
z = 7 // okay
z += 2
z *= 3
etc...

w := y // w is a copy of the value y
x := z // x references the same object as z
p := {y} // p is a reference to a copy of the value  y
q := |z| // q is a copy of the value in z.

The drawback is that assignment from one object to another might be confusing:

a := {2}
a = z // this is an error
a = |z| // okay

This is due to 'a' not being reassignable. To create a reassignable reference you would need:

var a : {{Int}}
a = z
|a| += 1
|a| *= 2

Which might be too different for the mainstream. Perhaps allowing reassignment is better, or doing what JavaScript does and simply requiring objects to have object property names like:

var a : {v:{v:2}}
a.v = z
a.v.v += 1
a.v.v *= 2

I still don't like the 'new' though.

keean commented

Refresh

I wrote:

It’s interesting though to consider how my proposed model would transpile to Scala instead of Go. Go has pointers but Scala has only objects which are always referenced (and some special cases where for example array buckets are not boxed).

When Go is transpiled to JavaScript by GopherJS, Go’s pointers are converted to unique integers which are used with a hashmap to lookup the references to the objects. Very inefficient.

Perhaps what we could do instead is that pointers are the references. When assigning the value instead of the pointer, we copy as necessary. So GopherJS decided to make pointers less efficient to avoid all that excess copying, and also because there’s no copy-constructor concept in Go.

Remember that same as for Go, I am proposing that Zer0 will not allow any pointer arithmetic.

You might be right about pointers, the problem is we then do not have value semantics automatically for new types. You are going to have to supply copy and move constructors like in C++, unless you have a solution for that?

Good point. Yes we will need copy and move constructors. And as you pointed out in another thread, these must terminate. Note the linked example would instead be in Zer0:

[…]

Instead I am now proposing:

y :: &new 3    : *{Int}   // Immutable pointer to a mutable `Int`
y := &new 3    : {*{Int}} // Mutable pointer to a mutable `Int`
y := &new 3    : {*Int}   // Mutable pointer to an immutable `Int`
y :: &new 3    : *Int     // Immutable pointer to an immutable `Int`

So I’m trying to think how we can model those variants in Scala without needing to implement a type-checker for Zer0. Just do only a syntax-level transformation in order to accelerate the completion of a prototype or proof-of-concept just shy of a minimum viable product.

If Zer0 has no type checker, the transpiler strategy must change considerably from the ideal I proposed as quoted.

Scala does not have an immutability annotation for a container. Instead it requires the var or val on the data fields of each class or trait data type, e.g.:

object HelloWorld {
  def main(): Unit = {
    case class T(var t : String)
    val a = new T("a")
    var b = a
    a.t = "b"
    println(b)  // prints "T(b)"
  }
}
HelloWorld.main()

The val a only prevents rebinding the reference, such that a = b would generate an error. To prevent the above from compiling, we need instead:

    case class T(val t : String)

So the way to model that is to make a differently named variant of T for each of the cases of types in Zer0:

case class      im_T(val t : String)             // models `T`
case class      rw_T(var t : String)             // models `{T}`
// Next two are never allowed because they make no sense.
// Read-only and write-only make sense only as pointer types to values owned by another.
//case class      ro_T(val o : Either[im_T, rw_T]) // models `<-T` aka `←T`, `⎙T`, `📖T` or `👀T`¹
//case class      wo_T(var o : rw_T)               // models `->T` aka `→T`, `✍T`, `✎T` or `⌨T`
case class im_p_im_T(val o : im_T)               // models `*T`
case class im_p_rw_T(val o : rw_T)               // models `*{T}`
case class im_p_ro_T(val o : Either[im_T, rw_T]) // models `*←T` or `*(←T)`
case class im_p_wo_T(val o : rw_T)               // models `*→T` or `*(→T)`
case class rw_p_im_T(var o : im_T)               // models `{*T}`
case class rw_p_rw_T(var o : rw_T)               // models `{*{T}}`
case class rw_p_ro_T(var o : Either[im_T, rw_T]) // models `{*←T}` or `{*(←T)}`
case class rw_p_wo_T(var o : rw_T)               // models `{*→T}` or `{*(→T)}`
// Next two sets are never allowed because they make no sense.
//case class ro_p_im_T(val o : Either[im_p_im_T, rw_p_im_T]) // models `←*T` or `←(*T)`
//case class ro_p_rw_T(val o : Either[im_p_rw_T, rw_p_rw_T]) // models `←*{T}` or `←(*{T})`
//case class ro_p_ro_T(val o : Either[im_p_ro_T, rw_p_ro_T]) // models `←*←T`, `←*(←T)`, `←(*←T)` or `←(*(←T))`
//case class ro_p_wo_T(val o : Either[im_p_wo_T, rw_p_wo_T]) // models `←*→T`, `←*(→T)`, `←(*→T)` or `←(*(→T))`
//case class wo_p_im_T(val o : rw_p_im_T)          // models `→*T` or `→(*T)`
//case class wo_p_rw_T(val o : rw_p_rw_T)          // models `→*{T}` or `→(*{T})`
//case class wo_p_ro_T(val o : rw_p_ro_T)          // models `→*←T`, `→*(←T)`, `→(*←T)` or `→(*(←T))`
//case class wo_p_wo_T(val o : rw_p_wo_T)          // models `→*→T`, `→*(→T)`, `→(*→T)` or `→(*(→T))`

All pointer types delegate to the object o. The p_ro_T and p_wo_T variants delegate to the object o except they don’t allow setting and getting respectively. Perhaps we can use the @inline annotation on the delegate methods.

So we can define implicit functions:

| From | To | Action |
| --- | --- | --- |
| im_T | rw_T | create new deep copy rw_T |
| rw_T | im_T | create new deep copy im_T |
| im_p_im_T | rw_p_im_T | create (not deep) copy rw_p_im_T |
| im_p_rw_T | rw_p_rw_T | create (not deep) copy rw_p_rw_T |
| im_p_ro_T | rw_p_ro_T | create (not deep) copy rw_p_ro_T |
| im_p_wo_T | rw_p_wo_T | create (not deep) copy rw_p_wo_T |
| rw_p_im_T | im_p_im_T | create (not deep) copy im_p_im_T |
| rw_p_rw_T | im_p_rw_T | create (not deep) copy im_p_rw_T |
| rw_p_ro_T | im_p_ro_T | create (not deep) copy im_p_ro_T |
| rw_p_wo_T | im_p_wo_T | create (not deep) copy im_p_wo_T |

We can also define addressOf functions (not implicit):

| From | To | Action |
| --- | --- | --- |
| `im_T` | `im_p_im_T` | create new `im_p_im_T`; no need to copy |
| `im_T` | `rw_p_im_T` | create new `rw_p_im_T`; no need to copy |
| `im_T` | `ro_p_im_T` | create new `im_p_im_T` and `ro_p_im_T`; no need to copy |
| `im_T` | `wo_p_im_T` | create new `rw_p_im_T` and `wo_p_im_T`; no need to copy |
| `im_T` | `im_p_ro_T` | create new `im_p_ro_T`; no need to copy |
| `im_T` | `rw_p_ro_T` | create new `rw_p_ro_T`; no need to copy |
| `im_T` | `ro_p_ro_T` | create new `im_p_ro_T` and `ro_p_ro_T`; no need to copy |
| `im_T` | `wo_p_ro_T` | create new `rw_p_ro_T` and `wo_p_ro_T`; no need to copy |
| `rw_T` | `im_p_rw_T` | create new `im_p_rw_T`; no need to copy |
| `rw_T` | `rw_p_rw_T` | create new `rw_p_rw_T`; no need to copy |
| `rw_T` | `ro_p_rw_T` | create new `im_p_rw_T` and `ro_p_rw_T`; no need to copy |
| `rw_T` | `wo_p_rw_T` | create new `rw_p_rw_T` and `wo_p_rw_T`; no need to copy |
| `rw_T` | `im_p_wo_T` | create new `im_p_wo_T`; no need to copy |
| `rw_T` | `rw_p_wo_T` | create new `rw_p_wo_T`; no need to copy |
| `rw_T` | `ro_p_wo_T` | create new `im_p_wo_T` and `ro_p_wo_T`; no need to copy |
| `rw_T` | `wo_p_wo_T` | create new `rw_p_wo_T` and `wo_p_wo_T`; no need to copy |
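A minimal Scala sketch of the "no need to copy" idea behind this table (again with hypothetical simplified names): taking the address of an immutable value wraps the existing object in a fresh pointer object, and the pointee is never copied because it can never change:

```scala
// Hypothetical simplified shapes for illustration only.
case class im_T[T](value: T)               // an immutable value
case class im_p_im_T[T](target: im_T[T])   // immutable pointer to it

// One row of the table: im_T → im_p_im_T; wrap, don't copy.
def addressOf[T](v: im_T[T]): im_p_im_T[T] =
  im_p_im_T(v)

val v = im_T("hello")
val p = addressOf(v)
assert(p.target eq v)   // reference equality: the value was not copied
```

The deep copies only appear in the first table, when converting between `im_T` and `rw_T` themselves, because that is where the mutability of the underlying data actually changes.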

P.S. In the currently proposed Zer0 syntax, an impure function is `~>`, not `->` (because in my mind an impure function doesn’t go straight, as it has collateral-damage side effects). A pure function is `=>` (because pure functions are like equality, without imperative action), so `->` was free for use as write-only.

¹ 📖T or 👀T

@keean wrote:

I don't like the `new` in the syntax; I like the JavaScript approach with literal JSON syntax, so that `{a: 3}` is a new object.

In my proposal variables are not reassignable, which reduces the number of alternatives available:

y := 3 : Int     // immutable value
z := {3} : {Int}    // mutable object

But that doesn’t work when y is a pointer type. You’re conflating immutability with pointer types. You’re now conflating your proposal (which has no pointer types and no implicit l-value/r-value distinction) with my proposal, which has pointer types and an implicit l-value/r-value distinction. So, sorry, your post is just not making sense to me. I don’t intend that to be condescending at all.

I need the `new` because I am informing the compiler to make a new l-value from an r-value. Your proposal doesn’t have l-values and r-values. You can’t mix your proposal and my proposal.

I like `new` because it says make a new l-value. Literals are r-values, not l-values.

Note the `new` is not needed when an l-value is not needed from an r-value, so I think `new` will rarely appear in code.

I keep telling you that I am copying what I like from C/C++. You’re trying to do something different. We can’t mix the two approaches. We have to choose one or the other.

If you want to show me how your proposal is better and show how to model it in Scala without a type checker, then please feel free.

I wrote in another thread:

Note Pony adds iso, trn and tag pointer types to the ones we had contemplated so far.

[…]

The iso and trn can be moved (aka alias burying) to any of the other types above because they have exclusive ownership of writing.

One huge drawback I see in that methodology of allowing more flexibility to share alias-buried formerly mutable objects between Actors (i.e. moving ownership) is that AFAICT it defeats the software cache coherency advantage I invented in the Parallelism issues thread #41. So AFAICT their model can’t enable massively scaled multicore.

There’s no utility for Zer0 to have such exclusive writable types on non-ARC objects, because these can’t be shared without copying them to ARC. Zer0 could adopt the iso and trn on ARC objects, but as aforementioned that might break the software cache coherency I invented for massively scaled multicore. Yet the exclusive-reading aspect of iso for immutable ARC objects could possibly prove useful to eliminate the need to trace those for cyclic references, because only one pointer could exist.

@keean that reading facet of iso would enable us to prove statically that there are no cycles, because it would be impossible to assign an iso except by moving to it (i.e. ending the lifetime of the pointer from whence it was sourced). So that is roughly equivalent to your point that if we don’t allow references to immutables then immutables can never have cycles. Yet it allows us to retain non-iso references to immutables also. And iso doesn’t have to be immutable in order to ensure no cycles, although we may need it to be for the reason I stated as quoted.

keean commented

@shelby3

I need the new because I am informing the compiler to make a new l-value from an r-value.

I don't think so: the object literal implicitly creates a new object, as in JavaScript. Notice:

let x = {v: 1}
let y = {v: 1}
x === y // false, x and y are different objects.

@keean wrote:

I need the new because I am informing the compiler to make a new l-value from an r-value.

I don't think so: the object literal implicitly creates a new object, as in JavaScript. Notice:

let x = {v: 1}
let y = {v: 1}
x === y // false, x and y are different objects.

You overlooked this point I made about conflating mutability:

z := {3} : {Int}    // mutable object

But that doesn’t work when y is a pointer type. [Otherwise for z] You’re conflating immutability with pointer types

keean commented

@shelby3

The instance can set the type whenever it changes to a different closed union. Thus the compiler can generate it.

I propose the hypothesis that something must have an address to be mutable.

@keean please delete your prior post and put it in the Typeclass objects thread where that discussion is occurring. My post which you quoted is in the Typeclass objects thread.

@keean actually I see what you did: you quoted me from the Typeclass objects thread, but you’re actually intending to reply to my prior post in this thread. You may want to edit your post to correct the quote.