keean/zenscript

new wd-40 test

Opened this issue · 602 comments

new wd-40 test

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-315910415
Original Date: Jul 17, 2017, 6:06 PM CDT


not shared_ptr as the default over GC

inferior except prompt/deterministic reference counting smart pointer paradigm

deterministically destruct instances of a type that has a destructor¹, which is a major feature lacking from GC languages … noting that such types cannot be used with GC references directly or indirectly as a member of a data structure that is GC referenced

We need optional GC and reference counting. The latter excels at deterministic, prompt finalization (which is required for non-main memory finalization) but has the trade-offs quoted below:

@shelby3 wrote:

However, reference counting deallocation is prompt and more deterministic than GC mark-and-sweep (although I rebutted there, “But please note that reference counting can’t break cyclical references but GC can. And reference counting can cause a cascade/domino effect pause that would not be limited to a maximum pause as, hypothetically, V8’s incremental mark-and-sweep GC is.”). Incremental GC schemes, when not overloaded with allocation that escapes generational GC, decrease high-latency stalls.

Has it ever been tried to combine reference counting with GC? Apparently so:

Edaqa Mortoray wrote:

Reference counting with cycle detection adds an interesting twist. I believe both D and Python do this. In order to eliminate memory loops objects are “sometimes” scanned for reference loops. This involves a lot more work than simply touching a reference count — though still not nearly as much as scanning all allocated memory. The impact of this loop detection depends heavily on how often it is scanned, and what the triggers are.

Given our single-threaded model, we do not need atomics on the reference counting, and given the proposal of the OP to emphasize borrowing and RAII then we would not get the benefits of a generational GC scheme. So reference counting would give us determinism except in the case of cyclical references, and then GC would kick in to break cycles. However, when we know cyclical references are probable (e.g. any kind of cyclic graph such as the DOM or a doubly-linked list), then it is wasted overhead to employ reference counting, although that overhead could be very tiny (i.e. infrequent tracing) if we only want to catch unexpected cycles (i.e. as an insurance policy, and we’d want to log these and consider them bugs).

I suppose we could rate-limit reference counting finalization, to prevent the rare aberrant high latency stall.

@fauigerzigerk wrote:

@rbehrends wrote:

It is well-known that the amortized cost of naive reference counting is considerably higher than that of a modern tracing garbage collector (regardless of whether you also support cycle collection). It is also well-known that non-naive RC-based approaches generally make it easier to achieve lower pause times (though cycle collection may throw a spanner in the works).

True, but I think these comparisons are often done under the assumption that memory utilisation remains fairly low. Tracing GCs react very badly to growing memory utilisation in my experience. This is of course ultimately an economic question and maybe it should be modelled that way.

@SKlogic wrote:

@jules wrote:

@jules wrote:

Tracing GC is not always faster. GC time is proportional to the size of the allocated memory, reference counting time is not.

It's not about running out of memory altogether, it's about the heap filling up with dead objects while the GC doesn't yet know that they are dead. Of course things are great when they are great and the GC can keep up with the mutator. Yet failure does happen whether or not you think it should happen, hence all the efforts that I mentioned that are trying to fix the issues.

As I said, in such a case RC would behave even worse.

@david-given wrote:

Regarding latency spikes: that's not entirely true. If you drop the last reference to an object, then freeing that object can cause the last reference to other objects to be dropped, etc.

So it's possible that dropping a reference can cause a large amount of work, which if it happens during a critical path can cause a latency spike. Particularly if finalisers are involved. If your object ownership graph is straightforward this isn't a problem, but if you have large objects with shared ownership it's very easy to be surprised here.

@SKlogic wrote:

It is really hard to get real time guarantees with RC - given that time for cleaning up depends on a size of a structure.

Just imagine it freeing a 10Gb linked list.

Real time pure mark and sweep GCs are easier.
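
A minimal C++ sketch of the cascading free described in the two comments above (the Node type and the list size are illustrative, not from the discussion): dropping the last counted reference to the head of a long list frees every node before that one statement returns, which is exactly the unbounded pause being described. The iterative destructor is there because the default recursive teardown of such a list would also overflow the stack.

```cpp
#include <memory>

struct Node {
    int value = 0;
    std::shared_ptr<Node> next;   // each node owns the next one

    // Iterative teardown: the implicit destructor would free the tail
    // recursively, one stack frame per node, so a very long list would
    // overflow the stack on top of causing the long pause.
    ~Node() {
        auto n = std::move(next);
        while (n && n.use_count() == 1)       // only free nodes we own exclusively
            n = std::move(n->next);           // detach and free one node per loop
    }
};

int main() {
    auto head = std::make_shared<Node>();
    Node* tail = head.get();
    for (int i = 0; i < 5'000'000; ++i) {     // build a long list
        tail->next = std::make_shared<Node>();
        tail = tail->next.get();
    }
    head.reset();   // last reference dropped: this single statement frees all
                    // five million nodes before it returns, i.e. the latency
                    // spike described in the comments above
}
```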

@rbehrends wrote:

Just keep in mind the trade-offs that you are making.

For starters, you are adding a considerable performance overhead to local pointer assignments; you're replacing a register-register move (and one that can be optimized away in many situations by a modern compiler through register renaming, turning it into a no-op) with a register-register move plus two memory updates plus a branch. Even where you have cache hits and the branch can be predicted correctly almost always, you're dealing with cache and BTB pollution; these effects may not fully show up in micro-benchmarks, but only completely manifest in large programs.

@barrkel wrote:

There's no discussion of thread safety and how that affects reference counting. That's probably the biggest problem; you need to use memory barriers to make reference counting safe with multiple threads, and that can kill performance.

@vardump wrote:

Reference counting is fast with one CPU core. But it gets worse and worse as you add cores. Inter-CPU synchronization is slow and the buses can saturate. With enough cores, at some point that'll be all the system is doing

@iofj wrote:

@rbehrends wrote:

It is well-known that…

I submit this is only true for equivalent number of things to keep track of. In practice this is not the case. Languages with GC [typically] go completely overboard with GC, using it completely everywhere, even when not strictly necessary and certainly java does.

In C++, if you have a map with items, you use move semantics and you have either 0 or 1 refcounts to keep track of, the one for the map itself. The rest is still "refcounted" but without ever touching any integer, by the normal scoping rules. Same goes for Rust code. That's ignoring the fact that a Java object is, at minimum, 11 bytes larger than a C++ object. Given java's boxing rules and string encoding, the difference in object sizes becomes bigger with bigger objects. Because out-of-stackframe RTTI is a basic necessity of tracing garbage collectors this is unavoidable, and cannot be fixed in another language. Bigger object sizes also mean more memory needed, more memory bandwidth needed, ... And Java's constant safety checks also mean

In Java, the same map will give the GC 3 items to keep track of (minimum) per entry in the map, plus half a dozen for the map itself. One for the object that keeps the key and the value, one for the key and one for the value. That's assuming both key and value are boxed primitives, not actual java objects. In that case, it'll be more.

@majke wrote:

Having that in mind, there are couple of problems with refcnt:

  • Cache spill. Increasing/decreasing counter values in random places in memory kills the cache. I remember one time I separated the "string object" data structure in python away from the immutable (!) string value (python usually has the C struct object followed in memory by the value blob). The python program suddenly went way faster - immutable (readonly) string values in memory were close to each other, mutated refcnts were close to one another. Everyone was happier.

For this last quoted issue, I am contemplating whether employing the unused bits (c.f. also) of 64-bit pointers (64-bit also provides a huge virtual memory space, c.f. also) to store an index into a contiguous array which holds the reference counts as its elements would keep those counts in contiguous pages. If the number of available unused bits is insufficient to count all allocated objects, then perhaps each compiled object type could employ a separate array (i.e. a zero-runtime-cost compile-time array selection), presuming the cache thrashing issue is due to intra-page-size fragmentation, not lack of inter-page contiguity. Also then we do not need to create the array element until the second reference. This is applicable when mostly passing around objects and not accessing all the data of the object as frequently.
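
Below is a minimal sketch, not an existing API, of the idea in the preceding paragraph, assuming x86-64 user-space pointers where only the low 48 bits are significant: the reference counts live in one contiguous array (so the mutated counts share a few hot cache lines and pages), and the 16 spare pointer bits carry the object's slot index into that array. Names such as RcPtr and g_counts are hypothetical, and slot reuse and copy assignment are omitted.

```cpp
#include <cstdint>
#include <vector>

// All reference counts for one object type live in this contiguous array,
// so refcount updates touch densely packed pages instead of random heap lines.
static std::vector<uint32_t> g_counts;

template <typename T>
class RcPtr {
    // Layout assumption (x86-64 user space): bits [47:0] hold the address,
    // bits [63:48] hold the slot index into g_counts (so at most 2^16 slots
    // per array; the text above suggests one array per type beyond that).
    uintptr_t bits_;

    static constexpr int       kShift    = 48;
    static constexpr uintptr_t kAddrMask = (uintptr_t(1) << kShift) - 1;

    uint32_t& count() const { return g_counts[bits_ >> kShift]; }

public:
    explicit RcPtr(T* p) {
        uintptr_t slot = g_counts.size();
        g_counts.push_back(1);                       // count starts at 1
        bits_ = (slot << kShift) | (reinterpret_cast<uintptr_t>(p) & kAddrMask);
    }
    RcPtr(const RcPtr& o) : bits_(o.bits_) { ++count(); }
    RcPtr& operator=(const RcPtr&) = delete;         // omitted in this sketch
    ~RcPtr() {
        if (--count() == 0) delete get();            // slot reuse omitted
    }
    T* get() const { return reinterpret_cast<T*>(bits_ & kAddrMask); }
    T* operator->() const { return get(); }
};
```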

Weighted reference counting can be employed to reduce the accesses to the reference count by half (since the source pointer has to be accessed any way), but it requires some unused bits in the pointer to store the weight.
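
A rough sketch of the weighted reference counting idea just mentioned (illustrative names, not a library): each pointer carries a weight and the object stores only the total, so copying a pointer merely splits the copied pointer's weight and never touches the shared object; the object is written only when a pointer is destroyed, or when a weight of 1 can no longer be split.

```cpp
#include <cstdint>

template <typename T>
struct Weighted {
    T        value;
    uint32_t total_weight = 0;   // sum of the weights held by all live pointers
};

template <typename T>
class WeightedPtr {
    Weighted<T>* obj_;
    uint32_t     weight_;

public:
    static constexpr uint32_t kInitialWeight = 1u << 16;

    // Assumes this constructs the first (only) pointer to the object.
    explicit WeightedPtr(Weighted<T>* o) : obj_(o), weight_(kInitialWeight) {
        obj_->total_weight = kInitialWeight;
    }

    // Copy: split the source pointer's weight between the two copies.
    // The object's total is unchanged, so the shared object is not written.
    WeightedPtr(WeightedPtr& other) : obj_(other.obj_) {
        if (other.weight_ == 1) {                    // cannot split further:
            obj_->total_weight += kInitialWeight;    // touch the object once
            other.weight_ += kInitialWeight;         // to top the weight up
        }
        weight_ = other.weight_ / 2;
        other.weight_ -= weight_;
    }

    // Destruction is the only routine operation that writes the object.
    ~WeightedPtr() {
        obj_->total_weight -= weight_;
        if (obj_->total_weight == 0) delete obj_;
    }
};
```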


David Ireland wrote:

Don’t get me wrong, Apple must have had a good reason to choose reference counting for memory management. That reason is actually pretty obvious: Objective-C is not a safe language, so you can’t move objects. If you can’t move objects, you have to use something like malloc to minimise fragmentation: you pay a big up front cost to allocate an object in the right place. In most GC schemes, allocation is free – you just increment a pointer. This is fine, because you are going to iterate through all your objects every now and again, and it doesn’t cost a whole lot more to move some of them around at the same time. You get to compact away all the fragmentation this results in. This makes GC fairly miserly with memory too. In iOS, if you want to allocate something bigger than the largest free fragment, you die. With GC, you have to wait a while for the GC to compact everything, after which memory will be compacted 100% efficiently and there will be a single memory extent as big as all available memory. Objective-C can’t have any of these benefits, so Apple chose a different route for iOS, and put the best possible spin on it.

Mobile apps won’t be fast until browsers are re-written in a fast, safe programming language that can share a good GC with ECMAScript. Of course browsers won’t be secure until then either, but I’m not holding my breath. There is no such language that is sufficiently popular, and Google fluffed its lines with Go. Maybe ECMAScript implementations will gain so much in performance that all code that manipulates the DOM will be written in ECMAScript, and we can finally use a high performance GC.


Edit Nov. 18, 2017 Distinguish between non-deterministic reference counting and deterministic RAII.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-315958730
Original Date: Jul 18, 2017, 12:02 AM CDT


@anonymous wrote in private:

Progress on the language design is great to see.

I need progress on code, not language design, but I am trying to have a paradigm I can be thinking about when structuring my code in TypeScript + C, so that it can be an easy transition if ever we complete a new PL, and also to drive my insights on that potential language.

Concretely, I think it means I will write wrapper classes with getters and setters in TypeScript that take an ArrayBuffer for data and do my malloc in Emscripten and eliminate marshalling that way. Which will be a precursor to what a compiler can do automatically (and later with better integration by outputting ASM.js or WASM directly and maybe one day our own VM since the browser VMs have limitations).

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-325094328
Original Date: Aug 26, 2017, 1:40 AM CDT


Rust’s leadership may be falling into the political hell hole.


Look what Eric wrote about Rust:

http://esr.ibiblio.org/?p=7711&cpage=1#comment-1913931
http://esr.ibiblio.org/?p=7724#comment-1916909
http://esr.ibiblio.org/?p=7303
http://esr.ibiblio.org/?p=7294
http://esr.ibiblio.org/?p=7294#comment-1797560
http://esr.ibiblio.org/?p=7294#comment-1797743
http://esr.ibiblio.org/?p=7711&cpage=1#comment-1911853

They also argue (similar to a point I made to @keean) that type system complexity can kill the economics of a language (this complexity budget has to be minimized):

http://esr.ibiblio.org/?p=7724#comment-1912884

And everyone agrees OOP (virtual inheritance and subtyping) are anti-patterns:

http://esr.ibiblio.org/?p=7724#comment-1912640

Comments about where Rust has overdrawn its complexity budget:

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-345494394
Original Date: Nov 18, 2017, 11:58 PM CST


I wrote:

We need optional GC and reference counting. The latter excels at deterministic, prompt finalization (which is required for non-main memory finalization) but has the trade-offs quoted below:

[…]

Edit Nov. 18, 2017 Distinguish between non-deterministic reference counting and deterministic RAII.

Re-reading the prior linked comments…

The first realization is that reference counting is non-deterministic and is inferior to GC in every facet (for the applicable use cases of non-deterministic resource management and thus the inherent lack of deterministic finalization) except for interop with non-GC objects as explained below (but potentially at the cost of loss of compaction and loss of automatic cyclical references memory leak detection).

Afaics, a useful facility to add in addition to GC is the zero-cost1 abstraction of deterministic RAII block scope allocated/deallocated on the stack, because it (probably improves performance and) adds determinism of finalization (aka destructors) and interacts with the optimization of low-level performance as explained below.

To achieve such zero-cost resource management abstractions requires typing/tracking the lifetimes of borrows of references so the compiler can prove a resource has no live borrow when it goes out of block scope. I had proposed the compile-time enforced “is stored” annotation on function inputs for when an input will be (even transitively) borrowed beyond the lifetime of the function application. That is a separate concern from the issue of tracking borrows (a la Rust’s total ordering which assumes multithreaded preemption everywhere, or my proposed partial ordering in conjunction with a single-threaded event model) for the purpose of preventing race conditions on concurrent access (which afair we discussed in depth in the Concurrency thread). This granular typing of the context of side-effects seems more straightforward than some grand concept of typing everything as a (cordoned context of imperative) side-effect??2

One of the key advantages/requirements of GC is the ability to move objects so that memory can be periodically compacted (with the compaction being amortized over many implicit/delayed deallocation events which would otherwise be explicit/immediate/non-deterministic-latency with reference counting), which enables allocation to be as low-cost as incrementing a pointer to the end of allocated objects in a contiguous memory address space.

As an aside, for highly optimized low-level code to run at maximum performance, it must be possible to freeze the position of objects in the virtual memory address space, else the objects must be copied. The former has a concurrency livelock and deadlock issue w.r.t. the said compaction.

For RAII managed data structures that contain pointers to data (i.e. unboxed) which is not RAII managed at the current or containing block scope (i.e. for GC pointers, since pointers to data RAII managed in a contained block scope would exhibit use-after-free errors), these pointers have to be mutable by the compaction engine and have to be recorded in the list of references which the GC must trace. To have full interoperability between GC instances and RAII (stack allocated data structure) instances without copying data incurs either a performance penalty of reloading pointers on each access (in case they’ve been moved by the compactor) or the aforementioned concurrency livelock and deadlock issue w.r.t. the said compaction (which is an onerous, undesirable can-of-worms).

Thus, I’m leaning towards stack allocated data structures not interoperating with GC instances, except via copying, because (other than deterministic finalization) performance is their primary raison d'être. And interop with GC would (as GC does) also encourage boxed data structures, which are known to thrash caches and degrade performance. Additionally, I think it should be allowed to have GC instances which are unboxed data structures (e.g. transpiled to ArrayBuffer in JavaScript) so they can be most efficiently copied (i.e. no transformation overhead). One might ponder whether to also have reference counted allocation on the heap, so that these immovable objects would not need to be copied when accessed by low-level performant algorithms (and note that the reason ASM.js didn’t allow very small ArrayBuffers is because there’s no presumption that what was compiled to ASM.js did bounds checking, although that’s not a hindrance because our compiler could do static bounds checking in many cases and insert dynamic checks in the others), but the loss of compaction and the lack of automatic cyclical references detection are onerous. Any attempt to overcome these onerous trade-offs with “stop the world” of the code that is operating on non-GC instances would suffer from “a concurrency livelock and deadlock issue” as well (because just stopping the threads is not sufficient, as the state can be invalid when the threads are restarted; instead they must be stopped at points where the current thread state has no such dependencies).

However, is memory compaction even necessary on 64-bit virtual address spaces wherein the hardware can map the virtual address pages to compacted, available physical pages? Well, I guess yes, unless most objects are much larger than the ~4KB page size.

However, afaics perhaps the compaction issue could be significantly mitigated by allocating same size objects from a contiguous virtual address space significantly larger than ~4KB in a reference counting algorithm. Thus allocations would be first applied to deallocated slots in the address space, if any, per the standard reference counting allocation and deallocation algorithm, which employs a linked-list of LIFO free slots.3
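
A minimal sketch of that per-size-class scheme under assumed names (Pool, Slot): one contiguous region reserved up front for each object size, a bump pointer for fresh slots, and freed slots pushed onto a LIFO free list threaded through the slots themselves, so allocation prefers previously deallocated slots exactly as described.

```cpp
#include <cstddef>
#include <cstdlib>

class Pool {
    // A freed slot is reinterpreted as a link in the LIFO free list
    // (slot_size is assumed to be at least sizeof(void*)).
    union Slot { Slot* next_free; };

    char*  region_;              // contiguous reservation for this size class
    size_t slot_size_;
    size_t capacity_;            // number of slots in the region
    size_t bumped_    = 0;       // fresh slots handed out so far
    Slot*  free_list_ = nullptr; // LIFO list of deallocated slots

public:
    Pool(size_t slot_size, size_t capacity)
        : region_(static_cast<char*>(std::malloc(slot_size * capacity))),
          slot_size_(slot_size), capacity_(capacity) {}
    ~Pool() { std::free(region_); }

    void* allocate() {
        if (free_list_) {                        // reuse a freed slot first
            Slot* s = free_list_;
            free_list_ = s->next_free;
            return s;
        }
        if (bumped_ == capacity_) return nullptr;    // region exhausted
        return region_ + slot_size_ * bumped_++;     // else bump-allocate
    }

    void deallocate(void* p) {                   // push onto the LIFO free list
        Slot* s = static_cast<Slot*>(p);
        s->next_free = free_list_;
        free_list_ = s;
    }
};
```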

If compaction is required, then I contemplate whether unsafe low-level code which manually employs malloc and free should still be allowed for cases where there’s no other way to achieve performance with dynamic memory allocation.

1 Zero runtime cost because, due to the compile-time determinism, we know at compile-time when the deallocation will occur.

2 How does Oleg’s insight conflict with @keean’s prior stance (in our discussions) against AST macros? I vaguely remember that he was against macros because they obfuscated debugging and the type system.

Is this the paper? http://okmij.org/ftp/Computation/having-effect.html#HOPE

I’m unable to readily grok Oleg’s work because he communicates in type theory and Haskell (in neither of which am I highly fluent). Afaics he explains with details/examples instead of conceptually and generatively from ground level concepts. And I do not have enough time to apply arduous effort. So he limits his audience to those who are fluent in every facet of the prior art he builds on and/or who can justify the time and effort.

Essentially all side-effects distill to the generative essence of shared state. If state is not shared, then there are no interactions of effects between those processes.

Afaics, a Monad controls the threading of access to an orthogonal context across function application. That orthogonal context can be state in the State monad. Correct? That is why Monads can’t generally compose because they cordon orthogonal contexts.

But what is the goal to achieve? If we model higher-level concepts as imperative building blocks with contexts, we can do unlimited paradigms, but not only will our code become impossible to read because of the overhead of the necessary syntax and type system to allow for such generality (which was essentially my complaint against trying to model everything, e.g. even subtyping, as typeclasses as you were proposing before), but we also have an explosion of issues with the interaction of too many paradigms. Afaics, this isn’t headed towards a simple language.

There will not be a perfect language. Type theory will never be complete and consistent. Design is driven by goals.

Thoughts?

3 Richard Jones. Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-345981613
Original Date: Nov 21, 2017, 4:20 AM CST


If I may :)

For me the main problem I would have with garbage collectors is that they are harder to debug and test.

The nightmare I have is that you will test the functionality on a simple program, it works, and you keep adding functions and race conditions, and two months later you realize there is a memory leak or memory corruption somewhere, and you're in for days or weeks of debugging to trace the problem.

With reference counting, it's very easy to trace problems, because you know where the freeing must happen; the point where the last reference to an object is released can be tracked easily, which makes it very easy to track problems, whether they are memory corruption due to memory being released prematurely, or memory leaks.

With a garbage collector that will parse the whole tree every now and then, it will traverse thousands of pointers allocated over a few minutes of running, and it can become much harder to track where the problem is if it frees memory that it shouldn't, or if it doesn't free memory that it should.

Especially if you go for multi-layered scopes, with closures and asynchronous code execution, it can become very hard to make sure the GC will catch everything right. And bugs will generally appear when the program starts to become more complex, rather than in easy vanilla cases when the program is small and simple.

And if you need to have a system of plugins, i.e. code that is loaded at runtime and can manipulate references itself, it can also become impossible to track the reference tree deterministically at compile time, and the host program might have no clue at all of what the plugin is doing with the references passed to it. And it's the same problem with distributed applications.

For me now, when I want to add a new feature, especially for memory management, my first concern is not only 'how to make it work', but also how easy it will be to detect problems, and to track what caused the problem.

With reference counting, it's easy to make a breakpoint when a reference gets freed, and have the debugger pause everything and inspect all the context at the moment where it happens.

With GC, the analysis is delayed, and all the context of execution that changed the state of the object is lost, so it makes it very hard to debug.

For me, to make GC programming easy, it needs something like the Android SDK, where the whole application is structured at a macro level with activities, intents, special classes for asynchronous/background execution, making it so that the lifetimes of all objects are predictable due to the structuring of the application, declaration of resources in XML, etc.

https://developer.android.com/guide/components/activities/process-lifecycle.html

They have a whole lot of classes to build the application on, to make resource management and object lifecycles more deterministic.

It imposes a certain structure on all application classes and components to make memory management fit cases that the GC can easily deal with.

Making a general purpose GC deal with a complex language with multi-layered scopes and asynchronous execution, without giving any hint of the pattern that the object is likely to follow by inheriting some class and structuring the application with built-in classes with predictable behavior, seems very hard.

Reference counting is much easier to deal with for the general case.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346013299
Original Date: Nov 21, 2017, 6:34 AM CST


My preference is to manage as much as possible from the stack as you can. There are some simple tricks that can make this work. If all allocation is RAII and you have procedures not functions then to return some data you simply pass a collection (set/map/array etc) into the procedure.

Edit: where this gets complicated is you then end up with two kinds of pointers: "owning pointers" which must form a Directed-Acyclic-Graph, and non-owning (or reference) pointers (and links between data-structures). The simple solution is to make all 'references' weak-references, so that you have to null-test them when dereferencing to check if the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning-pointers in the DAG. No GC, no reference counting, and a simple non-zero check on dereference. Of course 'nullable' pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust's lifetimes come from.
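
A rough C++ rendering of the scheme sketched above (illustrative only, not the proposed language): data is returned by filling a caller-owned collection rather than by returning an allocation, owning pointers free everything when the owner leaves scope, and cross-links are weak references that must be null-tested on use. std::weak_ptr is used here purely to get the "has it gone away" test; in the scheme described, that check would be implemented without reference counting.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Widget {
    std::string name;
    std::weak_ptr<Widget> peer;   // non-owning cross-link between structures
};

// "Procedure" style: the caller owns `out`; no ownership escapes upward.
void load_widgets(std::vector<std::shared_ptr<Widget>>& out) {
    out.push_back(std::make_shared<Widget>(Widget{"a", {}}));
    out.push_back(std::make_shared<Widget>(Widget{"b", {}}));
    out[0]->peer = out[1];        // weak link: does not keep "b" alive
}

int main() {
    std::vector<std::shared_ptr<Widget>> widgets;   // owner lives in this scope
    load_widgets(widgets);
    widgets.pop_back();           // the owner drops "b"; it is freed right here

    // Every use of a weak reference is null-tested first:
    if (auto p = widgets[0]->peer.lock()) {
        // "b" is still alive, safe to use *p
    } else {
        // "b" is gone; the dangling case is detected instead of crashing
    }
}                                 // remaining widgets freed when the owning
                                  // vector goes out of scope
```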

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346107470
Original Date: Nov 21, 2017, 11:51 AM CST


@NodixBlockchain (aka IadixDev @ BCT) wrote:

If I may :)

I can’t speak for @keean (and this is his project area but this is my thread), but personally I’d prefer “please don’t” (or post once for every 10 posts by others in this thread, so as to keep the S/N ratio of the thread higher), i.e. that you would comment in your thread “Saluations! :)” instead (or at least not in this/my thread), unless you have something new+important+accurate+well-researched to point out which you have not written already in the past.

If you quote from this thread, and post in your thread, I will probably read (presuming I have time) and perhaps even reply there. I’m trying to avoid this thread becoming another 300+ posts that I can’t wade through when I want to refresh my own memory of what my (and @keean’s) analyses was/is.

@keean and I have been doing backtalking on LinkedIn to reduce clutter in these issues thread. You’re welcome to add me there.

With reference counting, it's very easy to trace problems, because you know where the freeing must happen; the point where the last reference to an object is released can be tracked easily, which makes it very easy to track problems, whether they are memory corruption due to memory being released prematurely, or memory leaks.

I’m wondering whether that is a superstitious, anecdotal belief (i.e. aliasing error) or an assertion you actually studied, measured, and verified to be the case with in depth effort for comparing reference counting and tracing garbage collection. Because inherently reference counting doesn’t track the directed graph of memory allocation (i.e. all we have are reference counts and no back pointers to the references that comprise those counts); thus, there’s no feasible way to see which data structures have leaked because they’re no longer reachable from the untraced roots of the said graph in a reference counting GC system— for example due to cyclical references. Whereas, in theory a tracing garbage collector could provide such a graph for debugging.

With reference counting, it's easy to make a breakpoint when a reference gets freed, and have the debugger pause everything and inspect all the context at the moment where it happens.

With GC, the analysis is delayed, and all the context of execution that changed the state of the object is lost, so it makes it very hard to debug.

You can’t see which/where are all the references to the shared object that had its reference count decremented, because it’s the implicit nature of non-deterministic memory allocation to not know which pointers would be the candidates, and the reference counting scheme provides no back pointers from the said object to said sharing pointers (as aforementioned).

I agree it wouldn’t normally be possible to set a runtime breakpoint on the change in number of references to a specific object with a tracing GC, because the assignment of pointers doesn’t touch the referenced object (although write-guards for generational GC might get you as close as the virtual page but that might not be helpful). Thus, there’s no way to know which assignment to set a breakpoint on for testing the address of the said shared object. Yet comparably, the more optimized variants of reference counting also have the same debugging problem because they don’t even update the said common object (as explained below) because they keep the reference count in the pointers (and additionally there is the Weighted Reference Counting). In both cases, a sophisticated debugger and language could in theory be designed to set such conditional breakpoints on all pointer assignments (conditionally breaking only if the said shared object address is encountered), if such a facility is helpful for debugging memory leaks. And the tracing GC would have more information about the memory allocation graph.
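
A rough sketch (hypothetical names) of the kind of instrumentation described above: if the counted-pointer wrapper itself compares every assignment against a watched object address and traps on a match, the "conditional breakpoint on all pointer assignments" can be had without special debugger support.

```cpp
#include <csignal>

void* g_watched_object = nullptr;    // set from a test harness or the debugger

template <typename T>
class TracedRc {
    T* p_ = nullptr;

    static void maybe_break(T* p) {
        if (p && p == g_watched_object)
            std::raise(SIGTRAP);     // POSIX: drop into the attached debugger
    }

public:
    // Every (re)assignment through the wrapper is checked against the watch.
    TracedRc& operator=(T* p) {
        maybe_break(p);
        p_ = p;                      // refcount bookkeeping elided in this sketch
        return *this;
    }
    T* get() const { return p_; }
};
```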

And if you need to have a system of plugins, i.e. code that is loaded at runtime and can manipulate references itself, it can also become impossible to track the reference tree deterministically at compile time

It seems you’re presuming that reference counting models deterministic memory allocation, but that is not generally the case and (per my reply to @keean below), if that were generally the case then in theory it could be replaced by zero-runtime-cost compile-time deterministic allocation mgmt (albeit with tsuris of a complex compile-time typing system). Afaics, you continue to repeat this belief system of yours (presumably confirmation biased by your presumed preference for your low-level framework you discussed in your thread “Saluations! :)” and per my comments to @keean about C/C++ below) without refuting the points that have been made to you numerous times previously. I’ll recap and augment… (and hoping you’ll keep the not-so-thoroughly-contemplated-studied noise to a minimum in this/my thread please).

https://developer.android.com/guide/components/activities/process-lifecycle.html

They have a whole lot of classes to build the application on, to make resource management and object lifecycles more deterministic.

I don’t see how the process lifecycle has anything to do with the generalized problem that in general memory allocation is non-deterministic.

It imposes a certain structure on all application classes and components to make memory management fit cases that the GC can easily deal with.

Programming paradigms for avoiding memory leaks are useful regardless of the memory allocation scheme chosen.

I previously mentioned the following disadvantages for reference counting to you (either in the Concurrency thread and/or the other thread you created). And note the first one below is a form of memory leak inherent in reference counting.

Reference counting (aka ARC, which technically is a form of “direct”1 garbage collection) can’t garbage collect cyclical references. And according to an expert book I’m reading there’s no known means of weak pointers or other paradigm which can reliably solve the problem of cyclical references in all scenarios.2 This makes sense because reference counting (RC) doesn’t contemplate the entire directed graph of objects holistically. Thus only “indirect, tracing”1 garbage collection which deals with the entire directed graph will reliably detect and garbage collect cyclical references. However, note that probabilistic, localized tracing combined with RC decrements is comparable to mark-sweep (MS)3 for (even apparently asymptotic) throughput computational complexity costs, yet with an upper bound (“6 ms”) on pause times due to locality of search. Yet presumably pause times are eliminated with a concurrent MS (and as cited below, RC has less throughput than BG-MS).
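
A minimal illustration of the cycle limitation described above, using C++ shared_ptr as a stand-in for reference counting: once two nodes point at each other, each holds the other's count at one forever, so neither is freed after the external references go away, whereas a tracing collector would reclaim both because they are unreachable from the roots.

```cpp
#include <memory>

struct GraphNode {
    std::shared_ptr<GraphNode> other;   // strong (counted) edge
};

int main() {
    auto a = std::make_shared<GraphNode>();
    auto b = std::make_shared<GraphNode>();
    a->other = b;   // b's count: 2
    b->other = a;   // a's count: 2
    a.reset();      // a's count drops to 1, still held by b->other
    b.reset();      // b's count drops to 1, still held by a->other
    // Both counts are stuck at 1: the pair is unreachable yet never freed.
}
```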

According to this expert book4, reference counting has significantly worse throughput (except compared to the pathological asymptotic huge heap case for tracing GC which can be avoided15, and perhaps multi-threaded/multi-core/parallelism but I think I need to study that more) because of the significant additional overhead for every pointer assignment. This overhead as a percentage is worse for highly optimized, low-level languages (e.g. C/C++) compared to interpreted languages which have high overhead anyway. Note that a claimed throughput parity for RC with MS5 (and it’s just culling young generation overhead similar to prior art16) is not parity with generational GC combined with MS (BG-MS).6 Additionally the adjustment of the reference counts on each pointer assignment thrashes the cache7, further reducing performance, except for optimizations which generally have to be combined with a tracing GC anyway (and they diminish the afaics mostly irrelevant claimed deterministic finality benefit).8 Additionally for multithreaded code, there is additional overhead due to contention over the race condition when updating reference counts, although there are potential amortization optimizations which are also potentially superior for parallelized and/or distributed paradigms9 (but they diminish the afaics mostly irrelevant claimed deterministic finality benefit). Additionally memory allocation management on the heap is non-deterministic, so the determinism of the immediacy of deallocation for reference counting is afaics more or less irrelevant. Additionally reference counting (especially when executing destructors) can cause a domino cascade of latency (aka “stop the world”) and the amortization optimizations10 again diminish the afaics mostly irrelevant claimed deterministic finality benefit.

However, in my next post in this thread, I will propose a variant of deferred reference counting11 in conjunction with a clever generational GC scheme (apparently similar to prior art16), for avoiding the pathological asymptotic case of tracing GC. See my further comments about this in my reply to @keean below.

Especially if you go for multi-layered scopes, with closures and asynchronous code execution, it can become very hard to make sure the GC will catch everything right. And bugs will generally appear when the program starts to become more complex, rather than in easy vanilla cases when the program is small and simple.

In general, more complexity in the code will increase the probability of semantic memory leaks, even semantic leaks of the form that automatic garbage collection can’t fix because the error is semantic. Reference counting (a form of automated GC) will also suffer these semantic memory leaks. The issue of complexity of semantics is more or less not peculiar to the type of automatic GC scheme chosen. Thus I don’t view this as a valid complaint against tracing GC in favor of reference counting.

1 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §1 Introduction, pg. 2.

2 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.5 Cyclic reference counting, pg. 56 – 62, §3.6 Issues to consider: Cyclic data structures, pg. 71.

3 D. F. Bacon and V. T. Rajan (2001). Concurrent cycle collection in reference counted systems. In J. L. Knudsen, editor, Proceedings of 15th European Conference on Object-Oriented Programming, ECOOP 2001, Budapest, Hungary, June 18-22, volume 2072 of Lecture Notes in Computer Science, pp. 207–235. Springer-Verlag, 2001.

4 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm, pg. 19 – 25, §3 Reference Counting, pp 43 – 74.

5 R. Shahriyar, S. M. Blackburn, X. Yang, and K. S. McKinley (2012), "Taking Off the Gloves with Reference Counting Immix," in OOPSLA ‘13: Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, 2013. Originally found on Steve Blackburn’s website.

6 Stephen Blackburn; Kathryn McKinley (2003). "Ulterior Reference Counting: Fast Garbage Collection without a Long Wait". Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications. OOPSLA 2003, Table 3 on pg. 9.

7 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §2.1 The Reference Counting Algorithm: The strengths and weakness of reference counting, pg. 22.

8 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.3 Limited-field reference counts, pp. 50 – 55, §3.3 Limited-field reference counts: One-bit reference counts, pg. 52, §3.6 Issues to consider: Space for reference counts & Locality of references, pg. 71.

9 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §8.4 Concurrent Reference Counting, pp. 200 – 201, §3.7 Notes, pg. 74, §4.1 Comparisons with reference counting, pg. 77.

10 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.1 Non-recursive freeing, pg. 44.

11 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.2 Deferred reference counting: The Deutsch-Bobrow algorithm, pg. 46.


@keean wrote:

My preference is to manage as much as possible from the stack as you can. There are some simple tricks that can make this work. If all allocation is RAII and you have procedures not functions then to return some data you simply pass a collection (set/map/array etc) into the procedure.

This is conceptually broadly analogous/related to deferred reference counting12 schemes which rely on compile-time deterministic tracing of the deallocation, such as via linear (aka uniqueness) types.

Although I was obviously also contemplating similarly to you in my prior posts in this thread (after you had reminded me about RAII in our previous discussions and my recollection of programming in C++ in the late 1990s), after reading a book on garbage collection and thinking more about the issues, I’m going to further argue against the compile-time deterministic memory mgmt language design strategy in my upcoming detailed post, because I believe it’s not as generally flexible, generally sound, nor highly comprehensible solution.

These stack memory mgmt paradigms you’re referring to, have narrow applicability because they don’t handle the general case of non-deterministic heap management.

Afaics, it’s a special case methodology that leads to a very complex set of corner cases, such as for C/C++ which tries to pile on a bunch of different sorts of compile-time optimizations and keep as much memory mgmt on the stack as possible, but leads to a brittle clusterfuck of a language with a 1500 page manual that virtually no one understands entirely (not even the C++ creator Bjarne Stroustrup nor the STL creator Alexander Stepanov, as they both admitted).

And those stack paradigms don’t marry well holistically with a bolt-on tracing garbage collection appendage13 (i.e. one not designed holistically with the language), although I have not yet read a more recent research paper on combining tracing GC with Rust.14 To even attempt to apply this compile-time deterministic memory management holistically requires a complex typing tsuris akin to Rust’s total ordering of borrow lifetimes with lifetime annotation parameters for transitivity of borrows from inputs to outputs, and of course the programmer must violate the invariants and create unsafe code in order to handle non-deterministic (i.e. runtime randomized) memory mgmt, which defeats the entire point as then the total order is no longer checked and the unsafety can leak back into the code the compiler thinks is safe.

I apply the 80/20 or 90/10 Pareto principle to prioritization. Although all that special case tsuris might be able to obtain an extra 10% of performance, it has the cost of 90% more effort on debugging, readability of the code, maintenance of the code, complexity of the code, etc.. When you really need that 10% boost (or in the case where the more generalized GC paradigm is much slower for some reason), then go write a module in C++ or Rust. I see no need to invest my effort in trying to recreate (or improve upon) those languages, because there are millions of man-hours already in those ecosystems. Our only prayer of creating something useful in this decade with our resources, is to 90/10% prioritize on clean abstractions that are elegant and generalized and 90% performant with 10% of the effort and complexity.

EDIT: on the coding for readability point, “Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.”

Additionally, perhaps my GC design ideas (to be outlined in my next post) might possibly be compelling (if I haven’t made some errors in my thought process, which is entirely possible given I need to eat dark chocolate to even ephemerally coax my brain out of the 105 IQ brain fog gutter and function non-lethargically, presumably due to an ongoing TB infection). Note however, that the prior art for GC is apparently vast and seemingly exhaustive in terms of the low hanging fruit of obvious, simple variants16 (i.e. fertile ground presumably remains only in the realm of the esoteric, high complexity, and/or abstruse). Thus, it’s unlikely I discovered something (after a day of reading and contemplation) that hadn’t been considered previously in the prior art, although there appears to be potential prior art similar to my ideas16 that appeared later than the book I read. Note I independently arrived at my ideas (after about a day of contemplating the 1996 book) and have not yet read that 2003 research paper16 to compare.

Total ordering compile-time paradigms (including attempting to type check everything) are afaics brittle paradigms that break because the programmer must necessarily break out of them because the entropy of the universe is not bounded. I commented on this previously numerous times in these issues threads. The non-determinism (i.e. unbounded randomness and entropy) of the universe (i.e. the real world in I/O) can’t be statically modeled.

My thought is we need to be circumspect about the benefits versus trade-offs of every paradigm and feature we choose in a programming language. And the target use cases (and acumen/preferences of the target programmers demographics) of the programming language will also factor into the decision process.

For example, Haskell programmers favor brevity and mathematical expression with trade-offs such as hidden (2nd, 3rd, and 4th order) details in the form of implicit typing (which the mathematically minded model in their mind without reading them explicitly in the code), which drastically reduces the demographics of programmers who can read the code and participate. Some snobbishly claim this is a good filter to keep the less intelligent programmers away. Haskell has other trade-offs (necessary to enable the mathematical modelling) which we’ve discussed elsewhere and which are outside the scope of this concise summary, including but not limited to (and my descriptions may not be entirely accurate given my limited Haskell fluency) the debugging non-determinism tsuris of lazy evaluation (although the requirement for lazy evaluation is disputed), the bottom type populating all types, a total ordering (or algebra) on the implementation of each data type for each typeclass interface, a total ordering on the sequential (monadic) threading of I/O processing in the code (requiring unsafe code to fully accommodate the non-determinism of I/O), etc..

Edit: where this gets complicated is you then end up with two kinds of pointers: "owning pointers" which must form a Directed-Acyclic-Graph, and non-owning (or reference) pointers (and links between data-structures). The simple solution is to make all 'references' weak-references, so that you have to null-test them when dereferencing to check if the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning-pointers in the DAG. No GC, no reference counting, and a simple non-zero check on dereference. Of course 'nullable' pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust's lifetimes come from.

Human managed weak references are an incomplete and human-error prone strategy. Also, as you noted, you then leak the non-determinism from deallocation into how to handle error conditions of null pointers, leaking the non-determinism into program logic! That is horrible. Whereas, a provably sound weak pointer algorithm has exponential asymptotics and thus is no better or worse than tracing GC mark-sweep.2

12 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §3.2 Deferred reference counting, pg. 45.

13 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, §9 Garbage Collection for C, pg. 227.

14 Y. Lin, S. M. Blackburn, A. L. Hosking, and M. Norrish (2016), "Rust as a Language for High Performance GC Implementation," in Proceedings of the Sixteenth ACM SIGPLAN International Symposium on Memory Management, ISMM ‘16, Santa Barbara, CA, June 13, 2016, 2016.

15 Richard Jones (1996). Garbage Collection Algorithms For Automatic Dynamic Memory Management, Preface: The audience, pg. xxiv, Preface: The Bibliography and the World-Wide Web, pg. xxv.

16 Stephen Blackburn; Kathryn McKinley (2003). "Ulterior Reference Counting: Fast Garbage Collection without a Long Wait". Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications. OOPSLA 2003. I originally found this on Wikipedia’s page for Reference Counting.


Kindle version of the book is available on Amazon for less than $20 if you don’t want to turn your monitor sideways to read the “free” (copyright violation theft) pdf linked above.

EDIT: a more concise overview resource:

@NodixBlockchain wrote:

https://openresearch-repository.anu.edu.au/bitstream/1885/9053/1/02WholeThesis_Garner.pdf

This paper covers it all :D

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346114141
Original Date: Nov 21, 2017, 12:14 PM CST


The claims made about GC efficiency in those kinds of books never turn out to be true in reality. Look at how JavaScript is slow yet emscripten can get within a factor of 2 of native 'C', yet both run on the same VM. The difference is that asm.js and WebASM are strongly typed and manually manage the memory. In theory the JIT can make well typed JS code as fast as strongly typed code, so well typed TypeScript should be as fast as the compiled asm.js/WebASM, but it's not. The remaining difference is mostly down to memory management in my opinion.

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346116255
Original Date: Nov 21, 2017, 12:21 PM CST


Edit: where this gets complicated is you then end up with two kinds of pointers: "owning pointers" which must form a Directed-Acyclic-Graph, and non-owning (or reference) pointers (and links between data-structures). The simple solution is to make all 'references' weak-references, so that you have to null-test them when dereferencing to check if the resource they point to has gone away. Then we simply delete things when they go out of scope, removing all resources pointed to by owning-pointers in the DAG. No GC, no reference counting, and a simple non-zero check on dereference. Of course 'nullable' pointers are not considered a good thing, so we can then look for static techniques where we do not allow references to outlive the object they reference. This is where Rust's lifetimes come from.

I wonder if this solution would work if the object instance referenced can change during the lifetime of the reference.

Because it would mean you might need to delete the referenced object before the reference goes out of scope, in case a new object is assigned to it during its lifetime.

Original Author: @barrkel
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346119442
Original Date: Nov 21, 2017, 12:33 PM CST


You can see the efficiency of most GCs really easily: just look at % time in GC. For programs designed with generational GC in mind, it won't be above 10% and will more likely be < 2%. GC isn't the culprit for Javascript running slower than webasm. Types are part of it, but it's more about abstraction level. Higher abstraction means the runtime needs to perform more work to get to machine code, and will be using generalisations (aka baked in assumptions about trade-offs) to get there - and on a JIT time budget too. Whereas webasm is designed to be efficiently and unambiguously converted to machine code, so there will be fewer trade-offs, and the compiler targeting webasm can spend much more time analysing the trade-offs involved in reducing abstraction level.

I don't think there's any theory that JS would ever be as fast as webasm under any reasonable set of assumptions, because the time budgets need to be taken into consideration. JIT means deadlines.

And that's without getting into the costs of semantic fidelity. You can't simply pretend JS's prototype inheritance doesn't exist given typescript source; all the work needs to be done anyhow, just in case, since the web is a distributed system and thus non-deterministic.



Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346122168
Original Date: Nov 21, 2017, 12:43 PM CST


TypeScript can force you to stick to classes for objects, allowing the JIT to efficiently optimise these. Really for JS where there are no allocations going on it is as fast as asm.js. GC hides all sorts of costs, like the extra level of pointer indirection.

Probably a major factor is that GC hides the cost of allocation and deallocation, so the programmer does not realise they are thrashing object creation.

What is the fastest garbage collected language?

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346126086
Original Date: Nov 21, 2017, 12:56 PM CST


GC is even worse with multiple threads, because it has to stop all the threads to do the collection. This leads to GC pauses, and bad user interface jank, plus programs crashing due to fragmentation...

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346126913
Original Date: Nov 21, 2017, 12:59 PM CST


The cache issue for me, I don't think it's a problem, as the pointer and reference count are supposed to always be in the same cache line

The reference count needs to be in the object referenced, not the pointer, so it won't be in the same cache line as the pointer. When you have multiple pointers to the same object they all need to increment/decrement the same counter.
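
A minimal sketch of that point (illustrative names): with an intrusive count, the count lives in the referenced object itself, so every copy or drop of any pointer to it writes to the object's cache line rather than to the pointer's.

```cpp
// Intrusive count: the count is a field of the referenced object itself.
struct RcObject {
    int refcount = 1;    // shared by every pointer to this object
    // ... payload ...
};

struct Ref {
    RcObject* p;
    explicit Ref(RcObject* o) : p(o) {}
    Ref(const Ref& o) : p(o.p) { ++p->refcount; }     // writes the object's line
    ~Ref() { if (--p->refcount == 0) delete p; }      // writes the object's line
    Ref& operator=(const Ref&) = delete;              // omitted in this sketch
};
```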

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346128500
Original Date: Nov 21, 2017, 1:05 PM CST


See: https://softwareengineering.stackexchange.com/questions/203745/demonstration-of-garbage-collection-being-faster-than-manual-memory-management

The key fact for me is that managed memory usage is always significantly worse than manually managed. Yes GC can be fast providing you have twice the RAM available. Typically on a multi-tasking machine this means less RAM available for other processes. This results in real slowdown of things like the disk-caching layer because it's losing pages to the application.

The end result is that the user experience of using a system with GC is significantly worse than using a system with manual allocation - providing the software does not crash due to bad memory management :-)

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346131548
Original Date: Nov 21, 2017, 1:15 PM CST


It cannot be with the pointer, because when you create a new pointer to the array you need to increment the reference count.

Original Author: @barrkel
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346132506
Original Date: Nov 21, 2017, 1:19 PM CST


The argument in favour of GC isn't speed of execution, it's speed of development at very low costs - an extra 2 to 10% or so. Speed of development in memory safe languages is significantly higher; and if you don't like GC, GC isn't the only way to get to memory safety, but it is the easier way.

Reference counting is GC. Arguably, arena allocation, or any other scheme more sophisticated than paired alloc & free is GC. But those schemes are usually chosen not because they perform better (though they often do - free() isn't "free") - but because developer productivity is higher. And none of those schemes are what make something like JS slower than something like webasm.

Yes, GC scales because it trades space for time. If you're running on a small embedded device, it can be a poor tradeoff; if you're trying to maximize compute throughput across multiple machines, it can cost real dollars. But these are extremes, and most people aren't on the extremes.

I mostly just butted in because accusing GC for JS / Typescript being slower than webasm seemed a bit outlandish. GC overhead is directly measurable; it's trivially disproved for anyone inclined to measure.

I'm gonna unsubscribe from this thread now. I got linked in by an @ reference.



Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346135291
Original Date: Nov 21, 2017, 1:29 PM CST


I mostly just butted in because accusing GC for JS / Typescript being slower than webasm seemed a bit outlandish. GC overhead is directly measurable; it's trivially disproved for anyone inclined to measure.

Actually the difference in performance between JS and WebASM is only about 10-20% anyway, so that is in the 2%-10% region you claimed for the GC, so for some algorithms it could account for 100% of the difference...

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346138363
Original Date: Nov 21, 2017, 1:40 PM CST


Interestingly I am about to port a bunch of asm.js to typescript, as the memory allocation for asm.js / webasm does not work well with the main JS garbage collector, and allocating the heap for the webasm is causing fragmentation of the JS heap. In the end it turns out that the better memory usage of keeping all the memory managed by the GC is better for the application's stability than the small performance gain of having the code in asm.js.

Following this argument to its logical conclusion, the better memory usage of manually managed memory for the whole application would be better overall.

Edit: it's probably worth pointing out that this application uses a lot of memory, and has some periods when it is performing rapid allocations. This leads to memory usage for a single tab of about 1GB (even though memory buffers are 16MB, so with manual allocation it would get nowhere near that level). Frequently when usage gets above 1GB the web-browser crashes, even though most of that memory is probably free and just waiting to be collected.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346186551
Original Date: Nov 21, 2017, 4:47 PM CST


@NodixBlockchain wrote:

What make me think that cyclic dependency are not a problem with the way i handle it, is that i have a case of cyclic dependency like this.

This is off-topic. Explicit, unchecked (i.e. “unsafe”) manual mgmt is not what we are interested in here. The topic was about automatic (or compiler checked deterministic) mgmt. As you know, both @keean and I are significantly into compiler-assisted checking.

I understand you want to participate. But you really have not wrapped your mind around all the concepts, and so you should be more circumspect and lurk more. I said if you want to post to your thread, then I will try to participate occasionally.

Please be respectful and recognize your limitations.

I'm aware of the potential downside, but for me they remain benign for the use case i have for the moment.

After briefly learning about it, I’m no longer interested in your low-level framework. Your monotheistic intent on pushing your low-level preferences diverges the focus of my thread, because you don’t have an open-mind attempting to be objective about research results. Your low-level framework focus is irrelevant to the topic of this thread. Please talk about that in your thread which you entitled “Saluations! :)”. This/my thread is not for trying to convince yourself and/or the world that your low-level framework paradigm is great. Your thread is for that. I wouldn’t ban you even if this were my repository, but I believe curation (not absolute censorship) is valuable. Trust me, if you write anything in your thread which I happen to read, that I think is important enough to be in this thread, I will quote it over in this thread. Also I presented a guideline given that I find your posts have so many errors and low informational value, that you could post after every 10 posts by others in the thread (so as to force you to focus yourself more and keep the S/N ratio here reasonable).

I considered the low-level implications as even more evident by the post which follows this one. And realized that your preferences are a dinosaur. See my next post.

I am discussing here general purpose, high-level programming language design that employs compiler checking and/or automated runtime GC to facilitate lower costs of development and guard against human error. With some thought given to performance optimization and perhaps some C-like low-level capabilities, as long as the result is clean, simple, and elegant and not explicit, manual, bug-prone human mgmt of memory.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346199127
Original Date: Nov 21, 2017, 5:53 PM CST


@keean wrote:

The difference is that asm.js and WebASM are strongly typed and manually manage the memory. In theory the JIT can make well typed JS code as fast as strongly typed code, so well typed TypeScript should be as fast as the compiled asm.js/WebASM, but it's not. The remaining difference is mostly down to memory management in my opinion.

TypeScript transpiles to JavaScript (ES6), not ASM.js nor WASM.

(As you know, TypeScript aims to be round-trip compatible with and a typed superset of JavaScript— meaning it should pass through ES6 code more or less unchanged.)

Also, C code which is compiled to ASM.js/WASM employs different abstractions, such as not boxing every damn thing (e.g. every element of every array).

The VM can’t always statically type objects, so method calls and type coercions can require dynamic overhead.

Also I believe JS may be passing all function arguments on the heap (via the prototype chain of objects which form the closures we previously discussed). The throughput and locality (i.e. contiguity for minimizing cache and virtual paging thrashing) are similar, because allocating contiguously on the stack versus in the generational heap costs only an increment of the end-of-area pointer per generational allocation, as compared to one SP adjustment per function call. However, the volume of objects that accumulates before a collection (i.e. before the end-of-area pointer is reduced; presumably all function arguments could be allocated as one data structure, thus equivalent cost) is much greater than for stack-passed arguments, and thus perhaps causes additional thrashing.

C code rarely employs hashmap lookup, yet JS literally encourages it since all object properties and non-numeric array indices employ it.

There are many more details and we would need to study all the reasons:

https://news.ycombinator.com/item?id=7943303
https://www.html5rocks.com/en/tutorials/speed/v8/
https://blog.sessionstack.com/how-javascript-works-inside-the-v8-engine-5-tips-on-how-to-write-optimized-code-ac089e62b12e

This thread is in part my attempt to get the best of low-level ASM.js/WASM married to a better GC scheme that is more compatible with (some of) those lower-level abstractions!

[…] GC hides all sorts of costs, like the extra level of pointer indirection.

Probably a major factor is that GC hides the cost of allocation and deallocation, so the programmer does not realise they are thrashing object creation.

Agreed, e.g. they do not realize they’re boxing everything, forcing an extra pointer indirection for every read/write of an array element.
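To make the boxing point concrete, here is a minimal TypeScript sketch (illustrative only, not code from any project discussed here). The boxed version allocates one heap object per element and per result, so every field read chases a pointer; the unboxed version packs the same data into one contiguous Float64Array, roughly the layout that C compiled to ASM.js/WASM would use.

```typescript
// Illustrative sketch: boxed vs unboxed storage of 2-D points.
interface Point { x: number; y: number; }

function centroidBoxed(pts: Point[]): Point {
  let sx = 0, sy = 0;
  for (const p of pts) { sx += p.x; sy += p.y; }      // pointer chase per element
  return { x: sx / pts.length, y: sy / pts.length };  // yet another allocation
}

function centroidUnboxed(coords: Float64Array): [number, number] {
  let sx = 0, sy = 0;
  for (let i = 0; i < coords.length; i += 2) {        // contiguous reads, no boxing
    sx += coords[i];
    sy += coords[i + 1];
  }
  const n = coords.length / 2;
  return [sx / n, sy / n];
}
```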

Actually the difference in performance between JS and WebASM[ASM.js] is only about 10-20% anyway, so that is in the 2%-10% region you claimed for the GC, so for some algorithms it could account for 100% of the difference...

Huh? JS is several times slower than C, and ASM.js is AFAIR purportedly within about 40% of native C speed:

https://julialang.org/benchmarks/
https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=node&lang2=gpp

Perhaps you’re referring to JS highly optimized to avoid features which impair aforelinked V8’s optimization?

Yes the choice is between static or runtime checking. The best is to use the stack, so there is no need for checking, but that does not work for all scenarios.

Generational GC is nearly as efficient as the stack (as I explained), and I am going to propose an algorithm that I think might make it even more efficient.
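As a hedged sketch (hypothetical names, not an actual collector implementation) of why nursery allocation approaches stack-allocation cost: allocating in the young generation is a bounds check plus a bump of the end-of-area pointer, just like pushing a stack frame; the two schemes differ only in how the space is reclaimed.

```typescript
// Hypothetical bump-pointer nursery allocator.
class Nursery {
  private top = 0;                         // the "end of area" bump pointer
  constructor(private readonly sizeBytes: number) {}

  alloc(bytes: number): number | null {
    if (this.top + bytes > this.sizeBytes) return null; // full: trigger a minor GC
    const addr = this.top;
    this.top += bytes;                     // the entire cost of an allocation
    return addr;
  }

  // A minor collection evacuates the few survivors to the old generation and
  // then resets the bump pointer, reclaiming all dead young objects at once.
  resetAfterMinorGC(): void {
    this.top = 0;
  }
}
```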

What he fails to realise is that 2-10% is only true when you have enough memory.

It is true that the asymptotic computational complexity (i.e. performance) of tracing GC is pathological both as live memory usage approaches/exceeds the bound of physical memory and as total memory consumption increases for computers with more GBs (or TBs) of DRAM. That is why I cited that research for footnote16 in my prior post—which btw my independently derived idea seems to be roughly analogous to—for removing the pathological asymptotics by leveraging an optimized form of RC for the older generations. I will write my detailed post about this soon.

And that is why I allude in my response to @barrkel to the time vs. space trade-off approaching parity between automated GC and manual/stack/compile-time deterministic paradigms for memory management. The reason that the deterministic paradigms you espouse will not exceed the space vs. time trade-offs for the throughput (nor maximum latency pauses) of state-of-the-art automated GC in general is because I repeat that memory allocation mgmt is inherently non-deterministic (thus requires runtime management)!

Also there is research1 (which I haven’t read yet) which seems to indicate that as the number of (perhaps also non-deterministic) factors (e.g. power and energy management issues) that need to be managed increases, then automatic algorithms may become the most plausible (reasonable cost, time-to-market, stability, maintenance) for most applications with hand-tuning being further, further relegated to the esoteric that demand extreme tuning (e.g. the NASA space shuttle).

Yes GC can be fast providing you have twice the RAM available.

Even manual memory mgmt will have virtual memory paging out to disk as the physical memory is exhausted.

That half memory deal is afaik due to the “semi-space” of a copy collector that is not generational. Afaics, there are better alternatives now. See footnote16 in my prior post. I need to study this more thoroughly though.

Of course manual (both unchecked and compiler checked) mgmt can be a bit faster or correct some pathological cases (as well create potentially other ones ;), but at great cost as I wrote in my prior post as follows:

I apply the 80/20 or 90/10 Pareto principle to prioritization. Although all that special case tsuris might be able to obtain an extra 10% of performance, it has the cost 90% more effort on debugging, readability of the code, maintenance of the code, complexity of the code, etc.. When you really need that 10% boost (or in the case where the more generalized GC paradigm is much slower for some reason), then go write a module in C++ or Rust. I see no need to invest my effort in trying to recreate (or improve upon) those languages, because there are millions of man-hours already in those ecosystems. Our only prayer of creating something useful in this decade with our resources, is to 90/10% prioritize on clean abstractions that are elegant and generalized and 90% performant with 10% of the effort and complexity.

And you alluded to the great cost in terms of bugs and/or tsuris:

using a system with manual allocation - providing the software does not crash due to bad memory management :-)

In my prior post, I had alluded to how attempting to have the compiler check memory mgmt is forcing it into a deterministic, total order, which will necessarily cause the programmer to violate the compiler checks and leads to bugs (unless perhaps that compile-time paradigm is somehow limited to only deterministic instances, but from my study thus far it appears that the cross-pollution/interopt between young and older generation allocation makes non-deterministic, runtime automated algorithm interop with a deterministic young generation have worse characteristics than a runtime, automated generational non-deterministic collector for the younger generation):

To even attempt to apply these compile-time deterministic memory management holistically requires a complex typing tsuris akin to Rust’s total ordering of borrow lifetimes with lifetime annotation parameters for transitivity of borrows from inputs to outputs, and of course the programmer must violate the invariants and create unsafe code in order to handle non-deterministic (i.e. runtime randomized) memory mgmt, which defeats the entire point as then the total order is no longer checked and the unsafety can leak back into the code the compiler thinks is safe.

[…]

Total ordering compile-time paradigms (including attempting to type check everything) are afaics brittle paradigms that break because the programmer must necessarily break out of them because the entropy of the universe is not bounded. I commented on this previously numerous times in these issues threads. The non-determinism (i.e. unbounded randomness and entropy) of the universe (i.e. the real world in I/O) can’t be statically modeled.

1 T. Cao, S. M. Blackburn, T. Gao, and K. S. McKinley, "The Yin and Yang of Power and Performance for Asymmetric Hardware and Managed Software," in ISCA ‘12: The 39th International Symposium on Computer Architecture, 2012. Originally found on Steve Blackburn’s website. Also T. Cao’s master’s thesis prior art.


@barrkel wrote:

The argument in favour of GC isn't speed of execution, it's speed of
development at very low costs - an extra 2 to 10% or so. Speed of
development in memory safe languages is significantly higher

+1.

Cited:

Hertz and Berger [2005] explore the tradeoff between automatic and manual memory management in an artificial setting, using a memory trace from a previous run of the benchmark to determine when objects become unreachable and inserting explicit calls to free objects at the moment they become unreachable. This analysis, while interesting in itself, ignores the enormous programming effort and overhead (as in talloc above) required to track the ownership of objects and determine when they are no longer used. The most frequently cited study is Rovner [1985], who found that Mesa programmers spent 40% of their time implementing memory management procedures and finding errors related to explicit storage management.

Also it is about consistency of memory mgmt, i.e. avoiding the pitfalls of the average programmer or rushed development team.

Yes, GC scales because it trades space for time. If you're running on a
small embedded device, it can be a poor tradeoff; if you're trying to
maximize compute throughput across multiple machines, it can cost real
dollars. But these are extremes, and most people aren't on the extremes.

Well yes, but as I mentioned above in my reply to @keean, and I presume you’re also alluding to, that the differences in space vs. time tradeoffs between automatic and manual memory mgmt are getting closer to low double digit (or even single digit) percentages (i.e. “extremes” of optimization) and thus heavily favoring automatic GC for most use cases. Note I’m not conflating this metric with the 2 - 10% figure for throughput differences that you asserted.

I mostly just butted in because accusing GC for JS / Typescript being
slower than webasm seemed a bit outlandish. GC overhead is directly
measurable; it's trivially disproved for anyone inclined to measure.

I'm gonna unsubscribe from this thread now.

Note though that the metric you propose (although significantly representative given all other factors equivalent) doesn’t account for all impacts of the GC scheme chosen, such as cache and virtual paging locality (contiguity) to minimize thrashing which can significantly impact performance. Note however, that deterministic compile-time (manual explicit or for stack argument conventions and such thus assisted partially implicit) memory mgmt doesn’t necessarily achieve a panacea for this ancillary factor and could actually degrade it because generational and copy GC increase locality.

I appreciate your assistance in helping to explain. And I agree that is an important clarification to emphasize. Please feel free to participate here as often or infrequently as you want.

Btw, @keean is an expert in other areas such as C/C++, Haskell, Prolog, type theory, and library algebras. Afaik, he has only been developing seriously with JS/TypeScript for a relatively much shorter period of time. So that might explain it. Or it could also be rushing and trying to answer late in the evening after a long day of managing his company.

@barrkel wrote:

Types are part of it, but it's more about
abstraction level. Higher abstraction means the runtime needs to perform
more work to get to machine code, and will be using generalisations (aka
baked in assumptions about trade-offs) to get there - and on a JIT time
budget too. Whereas webasm is designed to be efficiently and unambiguously
converted to machine code, so there will be fewer trade-offs, and the
compiler targeting webasm can spend much more time analysing the trade-offs
involved in reducing abstraction level.

[…]

And that's without getting into the costs of semantic fidelity. You can't
simply pretend JS's prototype inheritance doesn't exist given typescript
source; all the work needs to be done anyhow, just in case, since the web
is a distributed system and thus non-deterministic.

+1.

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346301603
Original Date: Nov 22, 2017, 4:01 AM CST


I’m wondering whether that is a superstitious, anecdotal belief (i.e. aliasing error) or an assertion you actually studied, measured, and verified to be the case with in depth effort for comparing reference counting and tracing garbage collection. Because inherently reference counting doesn’t track the directed graph of memory allocation (i.e. all we have are reference counts and no back pointers to the references that comprise those counts); thus, there’s no feasible way to see which data structures have leaked because they’re no longer reachable from the untraced roots of the said graph in a reference counting GC system— for example due to cyclical references. Whereas, in theory a tracing garbage collector could provide such a graph for debugging.

If you consider a reference as an object with a delete operator rather than only a pointer with a free, the destructor can release references to children before releasing the reference.

It's not anecdotal; I have a fully working application server with its own memory management and object-oriented script language, which deals with thousands of objects, and I never spent more than 1h tracking memory leaks in the whole development cycle because I have a clear methodology to track them.

The script system is actually close to JavaScript, with a tree of objects on several layers, so I could make a generational GC very easily, but there are too many corner cases with multiple threads and plugins for me to really consider it over simple reference counting, which works for every single case possible with a very small code base.

I've been doing tons of experiments with different models of memory management.

And I also have a lot of experience with GC languages such as AS3, the Android SDK and JavaScript, and there are always many instances where I wish there was manual memory management, because the garbage collector won't really release the memory when it 'should', and it's very often macro-managed, with the GC only releasing things once in a while, which is far from optimal if there is a need for a lot of dynamic object creation inside a component that needs to run for a long time allocating a lot of new objects.

And it's not even multi-threaded, and there are already these kinds of problems recurring in every GC language out there.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346319285
Original Date: Nov 22, 2017, 5:09 AM CST


I’m trying to preempt another 1000+ post thread. We already had extensive discussions of your ideas, framework, perspective in the Concurrency and your “Saluations! :)” threads. We don’t need to repeat that again. I’m not trying to be dictatorial or unappreciative of discussion. I’m very interested right now in automated GC, for the reasons I’m explaining. I want to see if we can make it work optimally with the lower-level abstractions and resolve the pathological asymptotics issue.

@NodixBlockchain wrote:

If you consider a reference as an object with a delete operator rather than only a pointer with a free, the destructor can release references to children before releasing the reference.

I already told you that explicitly asking the programmer to remember to break cycles is off-topic (i.e. the destructor will not be automatically fired if there is a cyclic reference preventing the parent most object's reference count from going to 0):

This is off-topic. Explicit, unchecked (i.e. “unsafe”) manual mgmt is not what we are interested in here. The topic was about automatic (or compiler checked deterministic) mgmt. As you know, both @keean and I are significantly into compiler-assisted checking.

Although you’ve deleted (or moved to your thread?) those numerous prior off-topic posts I asked you to (thank you), you want to repeat the same points of what I perceive to be off-topic noise w.r.t. to the focus I want in this thread. I understand you think you have something important to say, but you apparently did not even pay attention or grok that I had already stated that it (i.e. the same point you’re repeating again) was/is off-topic.

@NodixBlockchain wrote:

but there are too many corner cases with multiple threads

We already discussed in the prior threads that (your and Rust’s paradigm of allowing/presuming) willy-nilly multithreading everywhere is not the paradigm I’m interested in. I’m looking at the single-threaded event model. I will be discussing GC and multithreading in this thread.

@NodixBlockchain wrote:

It's not anecdotal; I have a fully working application server with its own memory management and object-oriented script language, which deals with thousands of objects, and I never spent more than 1h tracking memory leaks in the whole development cycle because I have a clear methodology to track them.

How many times am I going to have to repeat myself that I am not interested in discussing further the anecdotal claims of your low-level framework. You’re apparently overlooking the more holistic point that:

  • We’re (or at least I’m) not interested in your manual “unsafe” (i.e. neither compiler checked nor runtime enforced) paradigm. Been there, done that in the 1980s and early 1990s. In the late 1990s I adopted simplistic reference counting with C++ for CoolPage. Now I’m advancing further. IMO, you’re a dinosaur.

  • My reasons for not being interested are detailed in my prior response to @keean and @barrkel.

    Your preferences and experience with your framework are not indicative of the development costs and other trade-offs for the community-at-large with such a paradigm, i.e. your arguments are anecdotal. When you have a large community and a lot of research/surveys/metrics amongst many general purpose applications refuting the exposition I have made based on more extensive sources, then we can discuss yours. No doubt that if you’re personally very dedicated and expert in hand-tuning with your framework specifically designed for the blockchain application you’re building, then you can probably get outstanding metrics as an anecdotal example application, but that isn’t provably (and IMO unlikely to be) relevant in the larger scheme of things w.r.t. to general applications and general developer productivity, maintenance, stability, bugginess, etc.. Again please refer to my prior response to @keean and @barrkel for my reasoning on why manual, human-error prone management of resource factors is a computer science cul-de-sac. Whether my perspective is correct or not, afaics it’s the current focus of this thread. Even the owner of this repository @keean, who expressed some preference for some compile-time deterministic memory mgmt, afaics doesn’t want to allow cyclic-reference memory leaks that require human intervention, as you’re apparently proposing.

    Additionally having your own methodology for your own handcrafted framework is not a valid comparison to what is possible but has not yet been so handcrafted for tracing GC (so as to be able to compare apples-to-apples for a diverse ecosystem of users and use cases), as I had already explained in a prior response as follows:

    In both cases, a sophisticated debugger and language could in theory be designed to set such conditional breakpoints on all pointer assignments (conditionally breaking only if the said shared object address is encountered), if such a facility is helpful for debugging memory leaks. And the tracing GC would have more information about the memory allocation graph.

If you’re interested in your paradigm in spite of what I perceive to be the convincing exposition in my prior posts, then please feel free of course to discuss in a different thread. I may occasionally read there. If you make your best arguments there (not in this thread) and if I see anything I agree with, then I will quote it over here (and invite/entice you to discuss here). Please do not pollute my thread (at least respect the one post for every 10 others request, or 6 if all the intervening posts are lengthy and infrequent) with what I currently perceive to be off-topic discussion. I do not want an interactive chat style discussion in this thread. We can chat back-and-forth realtime in private or another thread. I wanted focused, carefully thought out discussion in this thread that is easy to re-read and condensed enough for others to follow. This is my thread wherein I am trying to sort out what language paradigms I need, it is a very serious issue I am in a rush to complete, and I do not want the distraction of what I perceive to be a boisterous dinosaur (repetitively arguing the same points over and over) who is entirely incorrect.

Essentially you’re trying to route me away from extensive sources to your pet project. That is very myopic.

You’re apparently a prolific and very productive programmer (unlike my pitiful self reduced to rubble because still apparently suffering from a perhaps resistant strain of Tuberculosis). You and I aren’t able to collaborate much because you and I are ostensibly headed in different directions w.r.t. programming paradigms, while maintaining very low overhead on throughput.

@NodixBlockchain wrote:

And I also have a lot of experience with GC languages such as AS3, the Android SDK and JavaScript, and there are always many instances where I wish there was manual memory management, because the garbage collector won't really release the memory when it 'should', and it's very often macro-managed, with the GC only releasing things once in a while, which is far from optimal if there is a need for a lot of dynamic object creation inside a component that needs to run for a long time allocating a lot of new objects.

It “should” only do so in a balance of throughput, asymptotics, pause latency, and (time & space) locality. If you’re preferring priorities of time locality (immediacy), then not only are you attempting the impossible of defying the non-determinism of memory mgmt (i.e. immediacy is irrelevant when the period of allocation is non-deterministic), you’re also not optimizing the entire equation for memory mgmt. This sort of myopia might not manifest as measured problems in your handcrafted project due to the confirmation bias of handcrafted design decisions, but may in more diverse and general ecosystem of use cases and developers. Your exposition has the hallmarks of an amateur lacking academic study.

You’re referring to automated GC implementations from a decade or more ago. Even Google’s JavaScript V8’s GC is only incremental, generational with a standard pathological asymptotics mark-sweep for the older generation.

I wrote already in this thread that new research has drastically improved this issue:

What he fails to realise is that 2-10% is only true when you have enough memory.

It is true that the asymptotic computational complexity (i.e. performance) of tracing GC is pathological both as live memory usage approaches/exceeds the bound of physical memory and as total memory consumption increases for computers with more GBs (or TBs) of DRAM. That is why I cited that research for footnote16 in my prior post—which btw my independently derived idea seems to be roughly analogous to—for removing the pathological asymptotics by leveraging an optimized form of RC for the older generations. I will write my detailed post about this soon.

And that is why I allude in my response to @barrkel to the time vs. space trade-off approaching parity between automated GC and manual/stack/compile-time deterministic paradigms for memory management. The reason that the deterministic paradigms you espouse will not exceed the space vs. time trade-offs for the throughput (nor maximum latency pauses) of state-of-the-art automated GC in general is because I repeat that memory allocation mgmt is inherently non-deterministic (thus requires runtime management)!

You make me repeat myself because you don’t assimilate all the details I write. Presumably you’re so glued to your mental model of programming that you don’t notice subtleties that paradigm-shift the presumptions.


@keean wrote:

The reference count needs to be in the object referenced, not the pointer, so it won't be in the same cache line as the pointer. When you have multiple pointers to the same object they all need to increment/decrement the same counter.

Agreed, but note in my prior post with the many footnotes about GC, I cited some research (Weighted Reference Counting) that places the count in the pointers, so that copying a reference only touches the pointer’s own weight and the shared count only has to be decremented on deallocation. I also cited research about employing reference counting with only a single bit stored in the pointer, in a reference-counting generational scheme. Also, in an earlier post I suggested pulling all the counts closer together in contiguous locality to minimize cache-line and virtual-page faults. There’s also the concept of buffering the increments and decrements to amortize them.
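For illustration, here is a minimal TypeScript sketch of the Weighted Reference Counting idea as I understand it (hypothetical class names, simplified; the weight-underflow case is only flagged, not handled): copying a reference splits the weight stored in the reference itself without touching the shared object, and the shared total is decremented only when a reference is dropped.

```typescript
// Hypothetical sketch of weighted reference counting.
class WrcObject<T> {
  totalWeight = 0;                         // invariant: sum of all live reference weights
  constructor(public value: T) {}
}

class WrcRef<T> {
  private constructor(readonly obj: WrcObject<T>, private weight: number) {}

  static create<U>(value: U, initialWeight = 1 << 16): WrcRef<U> {
    const obj = new WrcObject(value);
    obj.totalWeight = initialWeight;
    return new WrcRef(obj, initialWeight);
  }

  clone(): WrcRef<T> {
    // Copying touches only this reference's weight, never the shared object,
    // so no shared cache line is dirtied on pointer copies.
    if (this.weight === 1) {
      // A real scheme handles underflow (e.g. indirection nodes, or adding
      // weight to both sides); omitted here for brevity.
      throw new Error("weight underflow not handled in this sketch");
    }
    const half = this.weight >> 1;
    this.weight -= half;
    return new WrcRef(this.obj, half);
  }

  drop(): void {
    // The shared counter is touched only when a reference is destroyed.
    this.obj.totalWeight -= this.weight;
    this.weight = 0;
    if (this.obj.totalWeight === 0) {
      // last reference gone: the object can be reclaimed/finalized here
    }
  }
}
```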

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346323307
Original Date: Nov 22, 2017, 5:27 AM CST


The thing is, my paradigm is the basis for a high-level language, which is fully working with JS/JSON and can generate binary data for WebGL apps or other kinds of dynamic applications.

Even Node.js or a JavaScript VM uses this kind of code internally. You can't have any kind of high level language without this kind of code underlying it.

And for the moment it gives more features than JavaScript, as it works in a multi-threaded environment and memory can be freed safely when it's not used directly.

The script language can already process HTML data, including form variables and query variables, and interact 100% with a JS app, and I think it's still cleaner than PHP.

for example https://github.com/NodixBlockchain/nodix/blob/master/export/web/nodix.site#L555

It can parse POST variables, including files, and generate JS variables and code with live variables from the node, etc.

The low-level mechanism is only to be seen as a minimalistic VM to run the script language, which either handles network events or generates HTML pages with dynamic data.

Any kind of advanced high level language will need this sort of code to manage memory, references, multi threads etc

The lexer/VM can be very simple because the low-level code already has advanced features to deal with objects, memory, scope and evaluation of object data.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346336143
Original Date: Nov 22, 2017, 6:27 AM CST


@NodixBlockchain, you’re repeating the same points I already discussed with you in the Concurrency and your “Saluations! :)” threads. I refuse to repeat my explanations again to you as to why what you think is applicable, is not applicable to what I am doing. It’s not my job to repeat myself over and over, just because you think I didn’t already refute you before in those other threads.

Please stop. Please.

You can't have any kind of high level language without this kind of code underlying it.

Any kind of advanced high level language will need this sort of code to manage memory, references, multi threads etc

Incorrect. And I do not want to explain why again.

And for the moment it gives more features than JavaScript, as it works in a multi-threaded environment

Your paradigm for multithreading is not applicable to the direction I’m headed, as I will repeat what I wrote before quoted as follows:

but there are too many corner cases with multiple threads

We already discussed in the prior threads that (your and Rust’s paradigm of allowing/presuming) willy-nilly multithreading everywhere is not the paradigm I’m interested in. I’m looking at the single-threaded event model. I will be discussing GC and multithreading in this thread.

I’m not trying to be a jerk. I have clear reasons for the focus that I’m pursuing. I hope you also understand my available time is very limited (and no that is not a valid reason that I should employ your framework).

Sometimes I ponder if the trolls at BCT paid you to come and disrupt me. I’m having enough difficulty as it is making forward progress, and I need to be taken off on repetitive tangents like I need another hole in my head in addition to the one I already have from a hammer.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346367257
Original Date: Nov 22, 2017, 8:34 AM CST


@keean wrote:

Interestingly I am about to port a bunch of asm.js to TypeScript, as the memory allocation for asm.js / webasm does not work well with the main JS garbage collector, and allocating the heap for the webasm is causing fragmentation of the JS heap. In the end it turns out that the better memory usage of keeping all the memory managed by the GC is better for application stability than the small performance gain of having the code in asm.js.

Edit: it's probably worth pointing out that this application uses a lot of memory, and has some periods when it is performing rapid allocations. This leads to memory usage for a single tab of about 1GB (even though memory buffers are 16MB, so with manual allocation it would get nowhere near that level). Frequently when usage gets above 1GB the web-browser crashes, even though most of that memory is probably free and just waiting to be collected.

Agreed afaik you appear to be making a valid and important point, which dovetails with my initial peek into analyses about GC for proper integration with the low-level coding.

The problem you’re experiencing perhaps has something to do with the fact that most browsers are limited to presuming a least common denominator of a 32-bit virtual address space and this gets very complicated as to how they split this up between processes. They also have to protect against writing outside of bounds efficiently. I dug a little bit into the issues (c.f. the last 2 links in quoted text below) but did not completely climb down the rabbit hole.

@shelby3 wrote:

For this last quoted issue, I am contemplating that employing the unused bits (c.f. also) of 64-bit pointers (64-bit also provides a huge virtual memory space, and c.f. also) would enable storing an index into a contiguous array which has the reference counts as array elements, so they would be in contiguous pages.

I had mentioned to you in private discussion my idea about storing older generation objects in same sized objects memory pages in order to minimize fragmentation with a 64-bit virtual address space. This would rely on less frequent compaction pauses for old generation objects, and would reuse old generation slots instead of waiting to compact them (i.e. not primarily a copy collector).
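Purely as a hypothetical illustration of that tagged-pointer idea (BigInt is used here only to model a 64-bit word; the 48/16 bit split is an assumption based on current hardware typically using 48-bit virtual addresses):

```typescript
// Hypothetical packing of a reference-count index into the high bits of a
// 64-bit pointer, so all counts can live in one dense, contiguous array.
const ADDR_BITS = 48n;
const ADDR_MASK = (1n << ADDR_BITS) - 1n;

function packPtr(addr: bigint, rcIndex: bigint): bigint {
  return (rcIndex << ADDR_BITS) | (addr & ADDR_MASK);
}
function unpackAddr(tagged: bigint): bigint {
  return tagged & ADDR_MASK;                 // mask off metadata before dereference
}
function unpackRcIndex(tagged: bigint): number {
  return Number(tagged >> ADDR_BITS);
}

// Incrementing a count touches a dense array instead of a header scattered
// somewhere in the heap, improving cache-line and page locality.
const refCounts = new Uint32Array(1 << 16);
const p = packPtr(0x7fffdeadbeefn, 42n);
refCounts[unpackRcIndex(p)]++;
```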

Also note that very large or small objects are typically handled differently by GC algorithms, so there may be an issue there w.r.t. to the 16MB objects you’re allocating.

I agree that the memory allocation seems to have corner cases around the interaction of the GC in JS and the allocation of the ArrayBuffer for ASM.js/WASM. Also of course objects have to be copied from one to the other; they’re not interoperable.

However, as I pointed out in prior posts, just because automated GC has a corner issue with a particular scenario, doesn’t enable the presumption that manual memory mgmt is a panacea in terms of the holistic equation of memory mgmt optimization. You gain some control in one area and lose something in other axis of the optimization space (e.g. more human-error prone, less throughput, or worse asymptotics, etc).

@shelby3 wrote:

It “should” only do so in a balance of throughput, asymptotics, pause latency, and (time & space) locality. If you’re preferring priorities of time locality (immediacy), then not only are you attempting the impossible of defying the non-determinism of memory mgmt (i.e. immediacy is irrelevant when the period of allocation is non-deterministic), you’re also not optimizing the entire equation for memory mgmt. This sort of myopia might not manifest as measured problems in your handcrafted project due to the confirmation bias of handcrafted design decisions, but may in more diverse and general ecosystem of use cases and developers. Your exposition has the hallmarks of an amateur lacking academic study.

@shelby3 wrote:

Note though that the metric you propose (although significantly representative given all other factors equivalent) doesn’t account for all impacts of the GC scheme chosen, such as cache and virtual paging locality (contiguity) to minimize thrashing which can significantly impact performance. Note however, that deterministic compile-time (manual explicit or for stack argument conventions and such thus assisted partially implicit) memory mgmt doesn’t necessarily achieve a panacea for this ancillary factor and could actually degrade it because generational and copy GC increase locality.

I’m contemplating whether I can design my own programming language which models what I want, will transpile tolerably to JS/TypeScript (or something else) easily enough to allow me to use it now, and then rewrite the VM and compiler later to the optimum. This is what I’m analysing now and the reason I’m doing research on GC.

@keean wrote:

Following this argument to its logical conclusion, the better memory usage of manually managed memory for the whole application would be better overall.

How do you arrive at this conclusion? Have the posts I have made since you wrote that, changed your mind?

Afaics, the optimum solution is to better design the GC and VM with the low-level coding in mind. JavaScript was designed by Brendan Eich in 10 days. I presume that since then backward compatibility demands have prevented any holistic overhaul.

Also I’m prepared to deprecate 32-bit OSes as a viable target. I’m future directed because anyway, we’re not going to get something popularized within a couple of years at best. Afaik, even mobile is moving towards 64-bit rapidly. Microsoft Windoze may be a bit resistant, but 64-bit Home is available for those who care when they buy a machine. And anyone still running that spyware without dual booting or running Linux in a VirtualBox is really behind the curve.

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346379971
Original Date: Nov 22, 2017, 9:15 AM CST


Sometimes I ponder if the trolls at BCT paid you to come and disrupt me.

Just have to comment on this, you're paranoiac dude lol I don't have anything to do with anyone on BCT lol

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346390097
Original Date: Nov 22, 2017, 9:47 AM CST


How do you arrive at this conclusion? Have the posts I have made since you wrote that, changed your mind?

Because I am good enough to get the manual memory management right, and it will use a lot less memory, which will result in less fragmentation and fewer crashes, and therefore a better user experience.

I am still convinced that static analysis and region based memory management is a sweet spot for high performance languages.

I think GC is useful for simple coding, but there are some big problems:

  1. it encourages object thrashing
  2. it suffers from fragmentation
  3. it causes user-interface jank due to the 'stop-the-world' nature of GC.

If you have a system with all three of the above problems, there is nothing you can do as a programmer within the language to solve the problem.

You can however solve problems (2) and (3) if you can avoid (1) by limiting object creation. If you can force pre-allocation of resources, and then use effects to restrict new object creation, so that the compiler will prevent you using idioms that thrash the GC then it could be acceptable. The problem is that you need all the library APIs to respect this too, and provide a way for the user to pass the result buffer into the function/procedure. As we know if you give people multiple ways to solve a problem they will probably pick the wrong way.
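A minimal sketch of the "pass the result buffer in" idiom being described (illustrative only, not an API from this project): the allocating style creates garbage on every call, while the buffer-passing style lets the caller pre-allocate once and reuse.

```typescript
// Allocating style: one fresh array per call, thrashing the young generation.
function scaledAlloc(v: Float64Array, k: number): Float64Array {
  const out = new Float64Array(v.length);
  for (let i = 0; i < v.length; i++) out[i] = v[i] * k;
  return out;
}

// Buffer-passing style: the caller supplies `out`, so the hot loop allocates nothing.
function scaledInto(v: Float64Array, k: number, out: Float64Array): void {
  for (let i = 0; i < v.length; i++) out[i] = v[i] * k;
}

// Usage: pre-allocate once, reuse across every iteration of a hot loop.
const frames: Float64Array[] = [new Float64Array(1024), new Float64Array(1024)];
const scratch = new Float64Array(1024);
for (const frame of frames) scaledInto(frame, 0.5, scratch);
```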

This is why I am interested in a language that forces people to do things the right way, so that then library APIs will always be written the correct way. Of course this might not make the language very popular, but I think it is an interesting approach to explore.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346426095
Original Date: Nov 22, 2017, 11:49 AM CST


@NodixBlockchain wrote:

Sometimes I ponder if the trolls at BCT paid you to come and disrupt me.

Just have to comment on this, you're paranoiac dude lol I don't have anything to do with anyone on BCT lol

It’s half-joke and half observing that the trolling of BCT seems to follow me over here. However, it must be stated that you’re not insincere. You truly believe (or believed) your work is applicable to what I want (or to the balanced presentation for other readers/participants here). I just don’t know how much verbiage I have to put out in order to get you to understand my side. Seems like perhaps you understand now. Again I am very much willing to read anything you write in the thread you created on @keean’s repository, if I have time. But I hope you will understand my point in this thread. I suggest re-reading the thread, as I made so many edits to my posts over the past 8+ hours, including my prior responses to you in this thread.

Again thank you for cooperating on my request. That was very honorable. I’ll try to be mutually respectful in your thread.


@keean wrote:

Because I am good enough to get the manual memory management right, and it will use a lot less memory, which will result in less fragmentation and fewer crashes, and therefore a better user experience.

That appears to ignore the arguments @barrkel and I have made about the 80/20 rule of nature. Those who ignore that rule perpetually fail in business and life. Of course you can do anything with infinite programming man-hour resources (assuming it doesn’t end up in a maintenance clusterfucked hairball), but that is not a valid argument.

That appears to be the hubris of anecdotal experience not backed by a sound analysis of the factors involved: which I will revisit below since apparently my prior writings in this thread are being ignored.

No one should dictate to you which programming language you should use. Likewise, you can’t dictate to the rest of the world what it prefers in terms of trade-offs in costs and outcomes for the choice of paradigms in programming languages. Java, JavaScript, Python, PHP, and Haskell are very popular programming languages which all rely on automated GC (tracing in Java, JavaScript, and Haskell; reference counting with cycle collection in Python and PHP), and apparently not even the state-of-the-art GC:

@shelby3 wrote:

Even Google’s JavaScript V8’s GC is only incremental, generational with a standard pathological asymptotics mark-sweep for the older generation.

C/C++ are also popular, but afaik they do not match the aggregate market share of the others. As well, C/C++ these days are typically limited to programs which require low-level control.

My quest in @keean’s repository has been to get the advantage of GC and perhaps some other aspects of FP, such as first-class functions and closures, in a simple language that can also do some low-level things such as unboxed data structures. The aim is to further erode the use cases where C/C++ would be chosen, and to relegate that clusterfucked C++ language tsuris (and perhaps Rust also, but the jury is still out on that) more and more to the extreme fringe of use cases. For example, it is ridiculous that I must employ a cumbersome Node.js Buffer library to build an unboxed data structure.
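For illustration of the kind of hand-packing currently required, here is a sketch using the standard DataView API over an ArrayBuffer (the record layout itself is hypothetical); this is the boilerplate a language with native unboxed structs would eliminate.

```typescript
// Hand-packed unboxed record: { id: uint32, price: float64 } over a raw buffer.
const RECORD_SIZE = 12;                        // 4 bytes id + 8 bytes price

function writeRecord(view: DataView, index: number, id: number, price: number): void {
  const base = index * RECORD_SIZE;
  view.setUint32(base, id, true);              // little-endian
  view.setFloat64(base + 4, price, true);
}

function readPrice(view: DataView, index: number): number {
  return view.getFloat64(index * RECORD_SIZE + 4, true);
}

const buf = new ArrayBuffer(RECORD_SIZE * 100);
const records = new DataView(buf);
writeRecord(records, 0, 42, 9.99);
console.log(readPrice(records, 0));            // 9.99
```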

Also I think you may be underestimating the trade-offs (at many different levels such as development time, maintenance, worst case failure modes, scalability, etc) of avoiding the assistance of runtime algorithms. Essentially since non-deterministic resource mgmt is an invariant of programming, you’ll end up writing a runtime GC algorithm to handle it, so I can’t see how your stance makes any sense whatsoever:

@shelby3 wrote:

Yes the choice is between static or runtime checking. The best is to use the stack, so there is no need for checking, but that does not work for all scenarios.

Generational GC is nearly as efficient as the stack (as I explained), and I am going to propose an algorithm that I think might make it even more efficient.

What he fails to realise is that 2-10% is only true when you have enough memory.

It is true that the asymptotic computational complexity (i.e. performance) of tracing GC is pathological both as live memory usage approaches/exceeds the bound of physical memory and as total memory consumption increases for computers with more GBs (or TBs) of DRAM. That is why I cited that research for footnote16 in my prior post—which btw my independently derived idea seems to be roughly analogous to—for removing the pathological asymptotics by leveraging an optimized form of RC for the older generations. I will write my detailed post about this soon.

And that is why I allude in my response to @barrkel to the time vs. space trade-off approaching parity between automated GC and manual/stack/compile-time deterministic paradigms for memory management. The reason that the deterministic paradigms you espouse will not exceed the space vs. time trade-offs for the throughput (nor maximum latency pauses) of state-of-the-art automated GC in general is because I repeat that memory allocation mgmt is inherently non-deterministic (thus requires runtime management)!

Also there is research1 (which I haven’t read yet) which seems to indicate that as the number of (perhaps also non-deterministic) factors (e.g. power and energy management issues) that need to be managed increases, then automatic algorithms may become the most plausible (reasonable cost, time-to-market, stability, maintenance) for most applications with hand-tuning being further, further relegated to the esoteric that demand extreme tuning (e.g. the NASA space shuttle).

[…]

1 T. Cao, S. M. Blackburn, T. Gao, and K. S. McKinley, "The Yin and Yang of Power and Performance for Asymmetric Hardware and Managed Software," in ISCA ‘12: The 39th International Symposium on Computer Architecture, 2012. Originally found on Steve Blackburn’s website. Also T. Cao’s master’s thesis prior art.

@barrkel wrote:

The argument in favour of GC isn't speed of execution, it's speed of
development at very low costs - an extra 2 to 10% or so. Speed of
development in memory safe languages is significantly higher

+1.

Also it is about consistency of memory mgmt, i.e. avoiding the pitfalls of the average programmer or rushed development team.

Yes, GC scales because it trades space for time. If you're running on a
small embedded device, it can be a poor tradeoff; if you're trying to
maximize compute throughput across multiple machines, it can cost real
dollars. But these are extremes, and most people aren't on the extremes.

Well yes, but as I mentioned above in my reply to @keean, and I presume you’re also alluding to, that the differences in space vs. time tradeoffs between automatic and manual memory mgmt are getting closer to low double digit (or even single digit) percentages (i.e. “extremes” of optimization) and thus heavily favoring automatic GC for most use cases. Note I’m not conflating this metric with the 2 - 10% figure for throughput differences that you asserted.

In my prior post, I had alluded to how attempting to have the compiler check memory mgmt is forcing it into a deterministic, total order, which will necessarily cause the programmer to violate the compiler checks and leads to bugs (unless perhaps that compile-time paradigm is somehow limited to only deterministic instances, but from my study thus far it appears that the cross-pollution/interopt between young and older generation allocation makes non-deterministic, runtime automated algorithm interop with a deterministic young generation have worse characteristics than a runtime, automated generational non-deterministic collector for the younger generation):

To even attempt to apply these compile-time deterministic memory management holistically requires a complex typing tsuris akin to Rust’s total ordering of borrow lifetimes with lifetime annotation parameters for transitivity of borrows from inputs to outputs, and of course the programmer must violate the invariants and create unsafe code in order to handle non-deterministic (i.e. runtime randomized) memory mgmt, which defeats the entire point as then the total order is no longer checked and the unsafety can leak back into the code the compiler thinks is safe.

[…]

Total ordering compile-time paradigms (including attempting to type check everything) are afaics brittle paradigms that break because the programmer must necessarily break out of them because the entropy of the universe is not bounded. I commented on this previously numerous times in these issues threads. The non-determinism (i.e. unbounded randomness and entropy) of the universe (i.e. the real world in I/O) can’t be statically modeled.


@keean wrote:

I am still convinced that static analysis and region based memory management is a sweet spot for high performance languages.

For the highest quality software (i.e. mission critical stuff) perhaps yes (although I think perhaps algorithmically managed non-determinism will become more reliable than handcrafted), but I see already from my study of the research that gradually (even suddenly) state-of-the-art GC will continuously improve and cannibalize most of it until you’re relegated to programming languages that most people have no interest in.1 You always knew I was interested in mainstream programming languages. So if you remain glued to that stance then we're likely headed in different directions. I thought I was going to be supporting static memory allocation analysis for Lucid, until I realized that generational allocation can be as performant as compile-time stack based for the frequent young objects (and I have a new algorithm in mind to make it more so by eliminating most of the copying, i.e. “sudden” may be about to happen in a few hours).

Btw, I’m known for writing prophetic predictions. There’s some hubris back at ya. :kissing_heart:

I think GC is useful for simple coding, but there are some big problems:

it encourages object thrashing
it suffers from fragmentation
it causes user-interface jank due to the 'stop-the-world' nature of GC.

This has all been solved as of today. I will be writing down the details shortly.

This is why I am interested in a language that forces people to do things the right way, so that then library APIs will always be written the correct way. Of course this might not make the language very popular, but I think it is an interesting approach to explore.

You mean @keean’s way. I will prove to you that is not the right way w.r.t. to this issue, although I do believe you have many important insights into programming.

1 However, I think it may be correct to allow some form of manual control over memory allocation for programmers to work around corner case issues with the automated GC.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346441236
Original Date: Nov 22, 2017, 12:48 PM CST


If we are targeting JavaScript, this is irrelevant, as we have to live with the web-browsers GC. Mozilla's work with their Rust browser has shown you get better performance by bringing the DOM into the JavaScript GC, rather than having it reference counted.

I think if you have GC the best model is to have everything managed by the GC, otherwise there seem to be more corner cases. For example, mixing JavaScript and WebAsm seems to be bad for memory performance.

So either we compile to web-assembly and manage the memory ourselves, or we should use the web-browsers GC. If we have any interaction with GC-managed systems we are probably better off using the web-browsers GC.

However when compiling to native, it would be nice to have a static analysis so you don't need a runtime for the binary. I would really like it if a program to add two numbers consisted of a couple of loads, an add, and a store, and nothing else.

As stated above, the benefit of GC is easy development, and freedom from some errors. Unfortunately people don't seem to realise you can still leak memory with GC, it eliminates some simple error cases, but you can still write programs that leak memory like a sieve until they crash.

In my experience it is much harder to debug memory leaks with GC than it is with manual allocation, because it is invisible to the programmer. So GC makes some easy coding easier, and some hard coding harder, but the easy cases are common and the hard ones rarer.
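To illustrate the kind of leak meant here (a made-up example, not from any real code base): the objects are garbage in the programmer's mind but remain reachable, so a tracing GC can never reclaim them and the heap simply grows.

```typescript
// A classic GC-language leak: an ever-growing listener list keeps closures
// (and whatever they capture) reachable forever.
type Listener = (data: string) => void;

const listeners: Listener[] = [];                  // module-level, lives forever

function openPanel(): void {
  const bigState = new Array(1_000_000).fill(0);   // captured by the closure below
  listeners.push(data => { bigState[0] = data.length; });
  // The panel is later "closed", but nothing ever removes its listener, so
  // bigState stays reachable from the listeners array: a leak under GC.
}

for (let i = 0; i < 100; i++) openPanel();         // live set grows monotonically
```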

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346452322
Original Date: Nov 22, 2017, 1:33 PM CST


@keean wrote:

If we are targeting JavaScript, this is irrelevant, as we have to live with the web-browsers GC.

I’m considering abandoning JS because of its crappy GC which doesn’t interop with low-level programming and appears to have poor asymptotics and some corner cases around fragmentation. Working on the detailed analyses now.

Originally I was advocating targeting JS because I thought it could become the mainstream VM, but there appears to be severe fundamental flaws, which is ostensibly why WASM has appeared on the scene. Java’s VM is also flawed, such as the lack of native support for unsigned data types.

I think if you have GC the best model is to have everything managed by the GC, otherwise there seem to be more corner cases. For example, mixing JavaScript and WebAsm seems to be bad for memory performance.

That seems to agree with what I’ve learned thus far.

So either we compile to web-assembly and manage the memory ourselves

This is the direction I’m more strongly considering now based on the aforementioned realization. But I would want to use an automated GC running on top of either WASM or perhaps native.

If we have any interaction with GC-managed systems we are probably better off using the web-browsers GC.

No. JS’s GC sucks (doesn’t interop with low-level and isn’t even a state-of-the-art design). Must be replaced.

Web browsers are dinosaurs being disintermediated by apps any way. Opening the door to replace JS with WASM as the universal VM, unless WASM is poorly designed as well, in which something else might rise up. WASM apparently doesn’t support 64-bit virtual address space yet.

However when compiling to native, it would be nice to have a static analysis so you don't need a runtime for the binary.

That has no relevance in the mass market use cases I’m targeting. I’m not targeting for example the most stringent, realtime, embedded market that Rust/C++ is.

As stated above, the benefit of GC is easy development, and freedom from some errors.

And avoiding maintenance hairballs, and many other things I have already explained. You’re understating the massive, pervasive benefits.

Unfortunately people don't seem to realise you can still leak memory with GC, it eliminates some simple error cases, but you can still write programs that leak memory like a sieve until they crash.

Irrelevant. “Programming can have bugs” is not an argument.

GC addresses a fundamental and massive problem in programming, notwithstanding that it does not eliminate all programming bugs.

In my experience it is much harder to debug memory leaks with GC than it is with manual allocation, because it is invisible to the programmer.

Create better analysis tools then. Refer to the response I already made to @NodixBlockchain on that issue. I don’t think your assumption will remain true. The state-of-the-art GC algorithm employs RC for the older generation objects anyway. Everything you can analyse with manual mgmt can also be analysed here; we just need the right tools.

It’s a confirmation bias to find excuses to argue for an inferior programming paradigm.

EDIT: remember we have a complexity budget. I would rather expend that static typing budget on more useful typing than wasting it on what a state-of-the-art GC can handle well. TypeScript demonstrates that a modicum of static typing can be very popular. Simplification is nearly always a win for popularity.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346458297
Original Date: Nov 22, 2017, 1:59 PM CST


I think you missed my point: I was suggesting exactly the same program could be compiled for a GC target or a statically managed target. There is a clear subset where this is easy to do (stack-based RAII), and if you change target and something cannot be solved without GC then you would get a compile-time error.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346482439
Original Date: Nov 22, 2017, 3:49 PM CST


@keean wrote:

GC is even worse with multiple threads, because it has to stop all the threads to do the collection. This leads to GC pauses and bad user-interface jank, plus programs crashing due to fragmentation...

Quick brain dump… (very sleepy, 6am, no sleep yet; actually I was lying in bed about to sleep and realized I had the answer for this)

It’s understandable why you might think this, but in the context of the new BG-RC design which that 2003 paper describes (to which my independently conceived ideas were very similar, and to which I have some additional innovations to propose), I realize that afaics multithreading can be accommodated very efficiently if we adopt the significant advantages of JavaScript’s single-threaded event-queue model, i.e. each thread is more or less isolated except for read-only shared object accesses (not the copying/messages forced for WebWorkers intercommunication). (Which is why I was admonishing @NodixBlockchain that his multithreading-everywhere did not apply.)

JS’s current BG-MS garbage collector has huge asymptotic pause times for the MS collection of old-generation objects, which is why it must not allow even read-only shared references between old generations.

But BG-RC achieves very low pause times because most of the old generation is RC (due to very low mutation rates), and it is precisely these read-only shared objects that have very low contention, because they are rarely collected and are collected incrementally as they die.

The BG (generational young generation) portion runs separately in each thread and the non-RC portion of the old generation (which is a small minority of the old generation for highly mutated objects) is separate for each thread.

So all threads stop-the-world (STW) only for very low pause times and with very low contention as they coordinate their minimal, coalesced updates from each thread’s BG to the shared RC space.

It's all about clever amortization and segregation with a hybrid algorithm: generational tracing for the young generation and RC (with a modicum of MS for the highly mutated objects) for the old.
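
To make that concrete, here is a minimal Rust sketch of the kind of coalescing I have in mind; it is only an illustration of the bookkeeping, not the 2003 BG-RC algorithm itself, and all the names (ObjId, PENDING_RC, flush) are hypothetical:

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical identity of an object in the shared, RC-managed mature space.
type ObjId = u64;

// Each mutator thread buffers its reference-count deltas locally instead of
// touching the shared counts on every pointer write.
thread_local! {
    static PENDING_RC: RefCell<HashMap<ObjId, i64>> = RefCell::new(HashMap::new());
}

fn note_increment(obj: ObjId) {
    PENDING_RC.with(|p| *p.borrow_mut().entry(obj).or_insert(0) += 1);
}

fn note_decrement(obj: ObjId) {
    PENDING_RC.with(|p| *p.borrow_mut().entry(obj).or_insert(0) -= 1);
}

// At the brief coordination point (the short STW window described above),
// each thread flushes its coalesced deltas into the shared counts.
fn flush(shared: &Arc<Mutex<HashMap<ObjId, i64>>>) {
    PENDING_RC.with(|p| {
        let mut counts = shared.lock().unwrap();
        for (obj, delta) in p.borrow_mut().drain() {
            *counts.entry(obj).or_insert(0) += delta;
            // A real collector would reclaim `obj` here when its count reaches
            // zero and no young-generation reference to it remains.
        }
    });
}

fn main() {
    let shared = Arc::new(Mutex::new(HashMap::new()));
    note_increment(42);
    note_increment(42);
    note_decrement(7);
    flush(&shared);
    println!("{:?}", shared.lock().unwrap());
}
```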

BG-RC changes everything. I have no idea why Java 9’s new G1 collector is apparently not BG-RC. Perhaps constraints of the JVM (HotSpot et al) preclude it.

I will explain all of this in more detail soon. Need to get some sleep.

JS’s BG-MS is analogous to the old parallel GC for Java, although JS’s adds some incremental/concurrent improvement.

Anyway, I am targeting a model on which we can base a language syntax and semantics. I will probably transpile to JS initially for an MVP prototype to get rolling with. I will not be coding a BG-RC garbage collector immediately. Perfecting a GC implementation is a massive undertaking in itself.

I wrote:

@NodixBlockchain wrote:

It need to be multithread, and as lock less as possible.

I’m going to repeat what I explained to you multiple times in the Concurrency thread. Multithreaded contention is a can-of-worms in numerous facets, such as requiring a deterministic total ordering on borrowing (because non-deterministic locking will always have corner-case livelock bugs which are impossible to statically analyse, because you can’t enumerate a priori the unbounded non-determinism in the I/O environment!), and it even makes the GC less robust and slower.

Instead I’m aiming for a separate heap for each thread, and copying objects between threads. The alternative would be allowing the reference-counted heap section (or an RC/MS hybrid such as in the BG-RC design) to be multithreaded, so objects could be shared by exclusive borrowing of references.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346489276
Original Date: Nov 22, 2017, 4:24 PM CST


The results are never as good in reality. See https://goo.gl/images/vi6pce: as you can see, BG-RC performed worse than BG-MS, and that in turn was worse than simple mark-sweep. For general use nothing beats the mark-sweep garbage collector, but it still has all the problems we discussed. You can make the worst cases for GC less bad, but that always makes the common cases less good. There's no such thing as a free lunch, as they say.

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346510644
Original Date: Nov 22, 2017, 6:50 PM CST


So GC makes some easy coding easier, and some hard coding harder, but the easy cases are common and the hard ones rarer.
The problem for me is that even if there is one hard case in the whole program, that's enough to kill the whole thing.

In a way it's similar to real-life security: there are far fewer terrorists than normal people, but if there were only normal people, there wouldn't be a need for security at all.

For me it's the same with memory: if there were only simple cases, there would be no need for decades of research in memory management and complex GC. But all the thinking goes into solving the hard cases; even if they are rare, there will always be some in any program, and they are the ones that will cause all the headaches.


Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346516777
Original Date: Nov 22, 2017, 7:54 PM CST


@keean wrote:

The results are never as good in reality. See https://goo.gl/images/vi6pce: as you can see, BG-RC performed worse than BG-MS, and that in turn was worse than simple mark-sweep.

Why are you rushing to make statements which are not substantiated by metrics?1 That slideshow doesn’t conclude that BG-MS outperforms BG-RC on throughput nor on maximum pause times.

All it says is that when heap size declines, BG-RC performance degrades faster than BG-MS. I presume the results are consistent with the graphs in the 2003 paper.

Thinking about it more, probably the relative degradation is due to the fact that the MS part of BG-MS must process the entire old generation for every mark-sweep, thus it processes less as the heap is smaller. Whereas, the computational complexity of the RC part of BG-RC is independent of the size of the heap. If I’m correct in my interpretation, then the relative degradation is simply the effect of the horrible asymptotics of the MS of BG-MS.

But of course there is no way to violate the laws of physics: keeping the maximum pause times very small even when the heap is almost full will of course incur more throughput overhead for the concurrently running cycle collection.

The improvements in automated GC are about improving the distribution of the amortization of costs to hold maximum pause times down for all scenarios, while designing for efficiency that keeps overall throughput overhead reasonable.

Thus the real slam dunk against your thought process is, as I postulated upthread (although I do not know how to measure it to prove it, because how does one analyse “manual memory mgmt” within the context of maintenance hairball outcomes, manpower costs, costs of human errors during the process of fixing all the bugs, etc), that even “manual memory mgmt” will not be able to do significantly better (on average and in general) on the time vs. space tradeoff:

@shelby3 wrote:

@barrkel wrote:

Yes, GC scales because it trades space for time. If you're running on a
small embedded device, it can be a poor tradeoff; if you're trying to
maximize compute throughput across multiple machines, it can cost real
dollars. But these are extremes, and most people aren't on the extremes.

Well yes, but as I mentioned above in my reply to @keean (and I presume you’re also alluding to this), the differences in space vs. time tradeoffs between automatic and manual memory mgmt are getting down to low double-digit (or even single-digit) percentages (i.e. the “extremes” of optimization), thus heavily favoring automatic GC for most use cases. Note I’m not conflating this metric with the 2 - 10% figure for throughput differences that you asserted.

No one can invalidate the (valid) laws of physics by employing computer science. When the heap is full, it is full.

Of course with infinite available development resources (and presuming the complexity doesn’t get mired in a maintenance hairball), manual coding of every detail can be perfect, i.e. digging canals with spoons is also plausible. But that is not an argument against using tools (e.g. a backhoe or automated GC), because it lacks any economic (i.e. cost-analysis) context. The hyperbolic analogy: why not argue for designing your own customized integrated-circuit hardware for each program you create, in order to optimize the hardware for each specific use case? IOW, manual perfection requires huge economies-of-scale.

1 You also made such unsubstantiated claims upthread, such as the ostentatious claim that GC was responsible for virtually all the slowdown of JS relative to near-native ASM.js/WASM and implying that JS was within 20% of native, although you clarified that maybe what you meant was that GC was encouraging other design habits/choices/abstractions in JS that promote inefficient programming. Religious design preferences and superstition shouldn’t be employed to degrade our discussions here. Clear metrics should be cited and not misrepresented, which I think was the generative essence of @barrkel’s point. If you’re intending to just make non-objective, anecdotal arguments, perhaps you can make it clear you know you’re doing so.


@NodixBlockchain wrote:

But all the thinking goes into solving the hard cases; even if they are rare, there will always be some in any program, and they are the ones that will cause all the headaches.

But there is nothing perfect, not even manual memory mgmt. At least a reproducible automated algorithm that includes all the wisdom and profiling of a huge investment in perfecting automated GC, is going to get us as close as possible to optimum in general and on average. Of course there will always be corner cases in any programming system.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346547009
Original Date: Nov 23, 2017, 1:33 AM CST


Another perspective, which matches my experience of fighting the garbage collector: http://prog21.dadgum.com/134.html

Also I re-read the paper, and I did get those performance metrics backwards. If it's as good as the paper says, I wonder why nobody is using it.

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346594564
Original Date: Nov 23, 2017, 5:32 AM CST


But there is nothing perfect, not even manual memory mgmt. At least a reproducible automated algorithm that includes all the wisdom and profiling of a huge investment in perfecting automated GC, is going to get us as close as possible to optimum in general and on average. Of course there will always be corner cases in any programming system.

Not to do a Linus here, but in a way there is no 'gray area': there is no easy case, hard case, and corner case, there are just cases. If the GC can't deal with all the cases properly then it's broken, period.

And this article gets quite close to what I see with most GC languages. If the program is small and simple it will work fine, but once you need to handle loads of data, with big lists of potentially complex references, it still needs some sort of assisted memory management to get things right.

https://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector

It's not like you can go all out allocating millions of objects with references without a care in the world about how to optimize memory so that short-lived objects can easily be detected as such, and expect that the GC will always function in an optimal manner without some form of manual optimization.

In a way GC just makes the manual optimization that will always be needed somehow more tricky and more complex. But it's not like you can completely forget about memory allocation and rely only on the GC for everything without any care for how it will work; some form of manual management of object lifetimes, and of how they are represented to the GC, is still needed.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346872669
Original Date: Nov 24, 2017, 11:24 AM CST


@keean wrote:

http://prog21.dadgum.com/134.html

Writing a concurrent garbage collector to handle gigabytes is a difficult engineering feat, but any student project GC will tear through a 100K heap fast enough to be worthy of a "soft real-time" label. While it should be obvious that keeping data sizes down is the first step in reducing garbage collection issues

I don’t understand how a generational garbage collector is going to have a problem with the volume of data allocations in young generations, unless it exceeds what any form of memory mgmt could handle with non-deterministic allocation order.

That guy is blowing hot air out of his ass (or at least not making clear that he’s comparing apples-to-oranges).

There are going to be cases where one will have to try to devise a deterministic allocation scheme in order to get the performance they want. When deterministic is feasible, then yes higher performance than a non-deterministic algorithm is possible. But compare apples-to-apples. Don’t compare deterministic to non-deterministic allocation, because they’re not the same code base, algorithm, and problem set.

The reason I propose to not provide facilities in the compiler for static linear-typing and region analysis (a la Rust), is because (so far) I think the use cases are exceptional and in that case the programmer should devise some very specific/customized deterministic allocation solution for his needs. And because for a programming language design we have to stay within a complexity budget. However, I am open to changing to my mind and see the posts that follow this one.


@NodixBlockchain wrote:

https://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector

How many times am I going to have to cite that the BG-RC of the 2003 research paper eliminates the long collection pauses because it doesn’t have to mark-sweep (MS) the entire old generation? You’re citing a BG-MS type collector, which is inferior. (That you do not even realize it is yet another in an endless series of examples of you not paying attention and being clueless about the topics of discussion.)

I asked you politely numerous times to please STFU with your useless noise IN MY THREAD. You’re being disrespectful to my thread. I asked you to wait at least 6 long posts by others between your posts. And 10 if any of the others are short.

If the GC can't deal with all the cases properly then it's broken period.

Nothing, not even manual mgmt with finite programmer man-hours (and hairball rigor-mortis maintenance limits), can deal with all possible cases. You’re talking hare-brained BS as usual. I will not be respectful to you when you are writing nonsense and also disrespecting my reasonable request.

The entire point is the GC can do about as well (on average) as manual memory mgmt can do with the same (or less) programmer and computer resources.

Even “manual memory mgmt” will have corner cases and break. No one has infinite programmer resources. Broken sometimes remains broken, because there are not enough resources and/or the maintenance hairball has become too unwieldy.

If you want to expend all that extra effort on “manual memory mgmt” for an abstruse case, then go ahead. No one is stopping you. In any case, please stop shitting on my thread. I hope this is the last time I have to ask you.

It's not like you can go all out allocating millions of objects with references without a care in the world about how to optimize memory so that short-lived objects can easily be detected as such, and expect that the GC will always function in an optimal manner without some form of manual optimization.

Whether you use a GC or “manual memory mgmt”, you still need to think about tuning your code.

But the point is there is no such thing as “manual memory mgmt”. You end up writing a customized automated collector, because manual memory management can’t exist given non-determinism of memory allocation, because the programmer can’t actually click a button on the computer every time a program is running when memory needs to be deallocated.

What you mean by “manual memory mgmt” is something like reference counting and then attempting to avoid cyclical references by carefully designing your program to use weak pointers. Yet that is an automatic GC scheme (albeit a very inefficient one that can’t collect cyclical references), and you rely on humans not erring, so human error will litter the code with memory leaks via unavoidable instances of cyclical references (in the general case). If you prefer that mgmt paradigm, then feel free to use it. But I do not.
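
To make the weak-pointer discipline concrete, here is a small Rust illustration (names hypothetical): the back-pointer must be declared Weak, or the parent and child keep each other alive forever, which is exactly the class of leak being described.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    parent: RefCell<Weak<Node>>,      // weak back-pointer: does not keep the parent alive
    children: RefCell<Vec<Rc<Node>>>, // strong forward pointers
}

fn main() {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)),
        children: RefCell::new(Vec::new()),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));

    // Had `parent` been declared Rc<Node> instead of Weak<Node>, this pair
    // would form a cycle that plain reference counting never reclaims.
    println!("parent strong count = {}", Rc::strong_count(&parent)); // 1
    println!("child strong count  = {}", Rc::strong_count(&child));  // 2
}
```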

I am repeating what I had already written in my prior posts in this thread, yet you do not assimilate the information, thus I am forced to repeat myself over and over again.

PLEASE LEAVE MY THREAD. This is the 3rd time I’ve asked you. I thought you were going to respect my request, but you just can’t contain your exuberance. And I do not care what your opinion is of me, so do not waste your time telling me. JUST LEAVE PLEASE.

You (probably unintentionally) wasted an enormous amount of my time on BCT (albeit mea culpa that I baited the trolls over there, and you weren’t intentionally trolling me there nor here) and continue to do so here. There may be nuggets of interaction with you which have been fruitful, but the signal-to-noise ratio is very low. I told you that I’m willing to endure that low S/N ratio, but not in this thread, which I intend to keep concise.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346916421
Original Date: Nov 24, 2017, 9:40 PM CST


I wrote:

I don’t understand how a generational garbage collector is going to have a problem with the volume of data allocations in young generations, unless it exceeds what any form of memory mgmt could handle with non-deterministic allocation order.

[…]

There are going to be cases where one will have to try to devise a deterministic allocation scheme in order to get the performance they want. When deterministic is feasible, then yes higher performance than a non-deterministic algorithm is possible. But compare apples-to-apples. Don’t compare deterministic to non-deterministic allocation, because they’re not the same code base, algorithm, and problem set.

The reason I propose to not provide facilities in the compiler for static linear-typing and region analysis (a la Rust), is because I think the use cases are exceptional and in that case the programmer should devise some very specific/customized deterministic allocation solution for his needs. And because for a programming language design we have to stay within a complexity budget.

Examples of deterministic allocation that would unnecessarily tax the automated non-deterministic GC are: assignment of a new instance exclusively to a block-scoped pointer (aka a pointer stored locally on the stack) in a loop, or, as in @keean’s image buffering example, allocating a new 16 MB buffer each time his code needs one:

p = new Object

So in the first case, the young generation allocator is slammed with a new allocation for each iteration of the loop, and in the latter case the old generation allocator accumulates 16 MB allocations which have to be collected (and with a MS collector that is going to be very costly).

Both of those can be structured as deterministic allocation strategies and made much more efficient. In the first case, the static analysis of the compiler could potentially determine that the newly allocated object is only ever referenced (beyond a single iteration of the loop) by the one pointer it is assigned to, and thus the allocated space for the object can be reused (i.e. the newly allocated object can replace the previously allocated one in the same memory location). For the latter case, assuming the 2003-invented BG-RC allocator is employed (which apparently no languages yet employ), the programmer can always reuse the same pointer for the buffer, so the compiler can check that the reference count on the previously allocated 16 MB object/buffer is going to zero when the newly allocated object/buffer is assigned, and thus the compiler+allocator can allocate the new object/buffer in the same memory location as the one being freed. Afaik, @keean wasn’t recycling his 16 MB buffers instead of reallocating them because of some flaw in the standard APIs of the WWW. Inferior GC collectors (e.g. BG-MS) and inferior APIs on top of them combine to contribute to the maligning of automated GC, but afaics critics aren’t conceptualizing this issue properly. I keep repeating that the demarcation in improvement over automated GC always lies in what can be deterministically deallocated.
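
A minimal Rust sketch of the latter, buffer-reuse case (the 16 MB figure comes from the discussion above; render_into is a hypothetical stand-in for the real work):

```rust
const BUF_LEN: usize = 16 * 1024 * 1024; // 16 MB, per the example above

// Naive: a fresh large allocation every iteration, which the old-generation
// collector must later reclaim (very costly with an MS-style collector).
fn process_frames_naive(frames: usize) {
    for _ in 0..frames {
        let mut buf = vec![0u8; BUF_LEN];
        render_into(&mut buf);
    }
}

// Deterministic alternative: allocate once and reuse the same buffer in place,
// which is the kind of recycling the text argues the allocator (or API) should
// make natural.
fn process_frames_reused(frames: usize) {
    let mut buf = vec![0u8; BUF_LEN];
    for _ in 0..frames {
        buf.fill(0); // reset instead of reallocate
        render_into(&mut buf);
    }
}

fn render_into(_buf: &mut [u8]) {
    // stand-in for the real image-buffer work
}

fn main() {
    process_frames_naive(2);
    process_frames_reused(2);
}
```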

The problem with deterministic deallocation is that either it requires trusting the human to correctly analyse which deallocations are deterministic (and human errors can arise), or (afaik, so far) the state-of-the-art is the tsuris of adding the necessary annotations and compiler enforcement of certain invariants in order to achieve some provably compiler-checked deterministic cases. And the problem is the compiler will not be able to prove all possible deterministic cases, regardless of how much complexity and tsuris we add to the programming language (a la Rust).

I would like to investigate why others have concluded that the compiler can’t do the static analysis of deterministic optimization opportunities w.r.t. memory allocation without assistance from the programmer. I remember @keean had mentioned to me in the past discussion in the Concurrency thread that in general, for the compiler to perform all optimization is an NP-hard search problem, and he was also concerned about compiler optimizations changing between compiler versions, introducing new bugs into code. However, we may have structure here that reduces the search for many deterministic optimizations to a suitable computational class. Remove the tsuris entirely. I believe humans build tools to make humans more efficient, e.g. backhoes instead of spoons for digging. The automated GC algorithms are runtime tools for non-deterministic resource allocation mgmt. We also need automated algorithms for the static-time analysis. Compilers already do automated static-time analysis for many very localized optimizations such as register allocation.

Let’s actually improve the programming language state-of-the-art and do something important!

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346948187
Original Date: Nov 25, 2017, 9:43 AM CST


So Rust has lifetimes which apply both to mutable memory safety (which also applies to multithreaded data races) and to escape-analysis optimization, which prevents use-after-free while enabling deterministic resource (not just memory) deallocation (thus proving safety for stack-based memory allocation). And non-deterministic resource deallocation in Rust is handled by explicit reference counting, which is an explicit pointer type, thus complicating the type system with 3 different types of pointers (complexity that @keean and I agreed is not desirable).

All the non-deterministic younger-generation deallocation is inefficiently pushed into explicit reference counting. So an improvement might be to combine Rust’s lifetimes with the aforementioned BG-RC automated garbage collector invented in 2003, possibly to gain the efficiency of fast generational collection of the young objects which escape deterministic lifetimes (if any significant quantity still do) and potentially eliminate the need to statically declare the type of each pointer. The compiler could, I presume, detect from lifetime analysis which pointers can be deterministically allocated on the stack. Pointers that escape lifetime analysis could thus be delegated to the generational GC, which for older-generation objects employs efficient RC plus cyclical-reference deallocation with low pause times.
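
For illustration (in Rust syntax only, standing in for the proposed analysis; Point, norm, and make_point are hypothetical): the first value provably never escapes its function and could be stack-allocated with no GC involvement, while the second escapes and would be delegated to the generational/RC machinery in the scheme described above.

```rust
struct Point {
    x: f64,
    y: f64,
}

// `p` never escapes this function: lifetime/escape analysis can prove it dies
// here, so it can live on the stack (or be reused in place) with no GC cost.
fn norm() -> f64 {
    let p = Point { x: 3.0, y: 4.0 };
    (p.x * p.x + p.y * p.y).sqrt()
}

// The returned value escapes the function, so under the proposed scheme it
// would be handed to the young-generation / RC collector instead.
fn make_point(x: f64, y: f64) -> Box<Point> {
    Box::new(Point { x, y })
}

fn main() {
    println!("{}", norm());
    let p = make_point(1.0, 2.0);
    println!("{}", p.x + p.y);
}
```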

My prior post in this thread gave an example of the sort of deterministic optimization I hoped the compiler could do, but I realize such optimizations require escape analysis and the compiler would still need to require annotation of lifetime parameters because otherwise:

Tangentially, how can the Rust lifetime parameter be declared for a function input which is also an output where the input state has a different lifetime than the output state? I presume this is not currently supported by Rust. Because afaics, non-reference counted references in Rust are always immutable (only the object they point to can be mutated). So I presume Rust forces the use of reference counted pointer for mutable pointers.

P.S. all cited references which are only links, are archived

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346959635
Original Date: Nov 25, 2017, 1:04 PM CST


@keean wrote:

Ada has statically safe pointers (called access types) without all the complex lifetimes of Rust. This is the kind of model I was thinking about.

So access typed pointers are presumably deallocated when the block scope in which the type was defined is exited (because external scopes can’t access pointers of that type).

I wonder if that is less flexible and less optimized than generalized escape analysis. The contained block scopes are lumped together, instead of fine-grained escape analysis tracing lifetimes into and out of the scopes. The genre of redundant allocations I alluded to may not be optimized in as many cases. Note it wouldn’t be any worse in an example where Rust’s lifetime analysis fails to be optimal.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-346993895
Original Date: Nov 26, 2017, 3:01 AM CST


Upthread while studying GC, I essentially independently thought of the design for BG-RC before discovering the 2003 prior art for it. I had thought of some potential improvements to the prior art. The two ideas of mine which have survived thus far both pertain to a generational collector:

  • The option of promoting younger-generation objects to the mature space instantly when the write barrier detects their assignment to a mature object’s pointer, instead of deferring them to a young-generation copy-collection phase. This I suppose eliminates the overhead for tracking which objects to promote at the copy-collection phase, and I would think simplifies integration with an RC-collected mature space (when promotion is to one). Perhaps the reason this isn’t done is to retain economies-of-scale and locality optimizations. EDIT: duh, as I realized downthread, this isn't possible because not all of the pointers to the same object can be updated instantly.

  • Generational copy-collectors have a trade-off between duration of period between promotion to mature space (which impacts how many objects leak to mature collection when they could have been collected more efficiently with a period longer than their allocation lifespan), the frequency of copy-collection of the young generation (for space and locality efficiency), and the amount of duplicate copying due to copy-collecting numerous times before promotion in order to increase said frequency.

    Instead of copy-collecting the entire younger generation monolithically and tagging each object with the number of times it has been copy-collected (with promotion occurring when said count reaches a threshold), I propose to divide the young generation into separate copy-collected regions, one for each copy-collection count required for said threshold. Then perform each copy-collection only twice on each region: once for its first count and again on the last count before promotion. After each copy-collection phase, promote the surviving objects in the oldest region and bump the count of each region. Afaics, this eliminates most of the duplicate copying while maintaining the high frequency for the youngest objects (those that die within the lifespan period of one count). A rough bookkeeping sketch follows this list.
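
Here is that bookkeeping-only sketch, written in Rust purely for illustration. It models my understanding of the region rotation (no actual tracing or copying; Region, compact, promote_to_mature and the threshold are hypothetical stand-ins):

```rust
use std::collections::VecDeque;

const THRESHOLD: usize = 4; // collection counts before promotion (hypothetical)

#[derive(Default)]
struct Region {
    age: usize,          // how many collection phases this region has survived
    live_objects: usize, // stand-in for the region's surviving objects
}

struct YoungGen {
    regions: VecDeque<Region>, // front = youngest, back = oldest
}

impl YoungGen {
    fn collection_phase(&mut self) {
        // Copy-collect only the youngest region (its first count)...
        if let Some(youngest) = self.regions.front_mut() {
            compact(youngest);
        }
        // ...and the oldest region (its last count before promotion); regions
        // in between are left untouched, avoiding the duplicate copying of a
        // monolithic young-generation collection.
        let promote = self.regions.back().map_or(false, |r| r.age + 1 >= THRESHOLD);
        if promote {
            let mut oldest = self.regions.pop_back().unwrap();
            compact(&mut oldest);
            promote_to_mature(oldest.live_objects);
        }
        // Bump every region's count and open a fresh region for new allocation.
        for r in &mut self.regions {
            r.age += 1;
        }
        self.regions.push_front(Region::default());
    }
}

fn compact(_r: &mut Region) { /* stand-in for copy-collecting one region */ }
fn promote_to_mature(_survivors: usize) { /* stand-in for promotion to the RC space */ }

fn main() {
    let mut young = YoungGen { regions: VecDeque::new() };
    for _ in 0..THRESHOLD + 2 {
        young.collection_phase();
    }
    println!("regions tracked: {}", young.regions.len());
}
```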

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-347577770
Original Date: Nov 28, 2017, 10:21 AM CST


@keean wrote:

"All objects die young" directly leads to heap fragmentation especially if you also have "everything is an object".

We can make things better by having value semantics, and storing values directly on the stack (providing they are under a certain size to prevent stack overflows), and by having object pools that re-use objects, and mutating object in-place.

As I have developed my understanding in this thread, I have come to the conclusion that the language I want needs:

  • Static analysis of lifetimes with lifetime-parameter annotations, but without the unnecessary tsuris of Rust’s exclusivity of mutable borrowing. Lifetimes which do not escape static/compile-time analysis can be managed deterministically, with small enough objects on the stack and larger ones on the heap. This probably eliminates the need for generational GC. Ada’s access types are much less powerful.

  • The lifetime analysis will aid an optimizing compiler in automatically mutating objects in place, because the compiler can detect which lifetimes don’t overlap and can reuse space. There would need to be some other manual facility for reusing object space for objects that escape compile-time lifetime analysis.

    Lifetime analysis can be applied to non-memory resources also a la Rust.

    Escape analysis may also apply to my controversial idea to have the compiler automatically detect implicit concurrency dependencies, so programmers don’t always have to explicitly structure concurrent code around Promises.

    In addition to zero-overhead performance, these are some of the reasons I have favored the slight tsuris of lifetime annotations over the ease of a generational GC.

  • Reference counting (with Bacon’s trial deletion method of avoiding cyclical-reference memory leaks) could be employed for all objects which escape static lifetime analysis.

    Future work: not sure if we’d need some (generational and/or) mark-sweep heap for objects that would otherwise have their reference counts modified too frequently, and whether the runtime could automatically and dynamically manage which objects to put in the (BG-)MS heap or whether the programmer would need to manually indicate which objects need this optimization.

  • Free-list (linked-list of free space) memory allocators fragment the memory space too much and exhibit poor locality, hurting cache and memory-paging performance. Allocating in blocks which are multiples of the virtual-memory page size will enable less fragmentation, because the virtual fragmentation can be mapped to contiguous physical memory; this is especially viable with a 64-bit address space, so that virtual address space isn’t depleted by fragmentation. The optimum solution for small objects appears to be bump-pointer style monotonic allocation within blocks that are multiples of the page size, wherein none of the freed space in the block is reused and the entire block is not freed until every object in the block is freed (cf. “§2.1.3.5 Mark Region”). That enables fast allocation by bumping a pointer and fast deallocation by decrementing a counter. Fragmentation induced by delayed freeing of blocks is in virtual memory space, thus mostly irrelevant given a 64-bit address space. The MMTk segregated free-space allocator seems to employ some variant of this strategy. A minimal sketch of this block bookkeeping follows this list.
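
Here is that minimal sketch in Rust, purely illustrative: a fixed-size block with bump allocation and a per-block live count (alignment, large objects, and the policy for handing out fresh blocks are all simplified assumptions):

```rust
const BLOCK_SIZE: usize = 64 * 1024; // assume a multiple of the OS page size

struct Block {
    memory: Box<[u8; BLOCK_SIZE]>,
    bump: usize, // offset of the next free byte (monotonic, never reused)
    live: usize, // number of objects in this block not yet freed
}

impl Block {
    fn new() -> Self {
        Block { memory: Box::new([0u8; BLOCK_SIZE]), bump: 0, live: 0 }
    }

    // Fast path: allocation is just a pointer bump within the block
    // (alignment is ignored here for brevity).
    fn alloc(&mut self, size: usize) -> Option<*mut u8> {
        if self.bump + size > BLOCK_SIZE {
            return None; // block exhausted: the caller grabs a fresh block
        }
        let ptr = unsafe { self.memory.as_mut_ptr().add(self.bump) };
        self.bump += size;
        self.live += 1;
        Some(ptr)
    }

    // Freeing an object only decrements the counter; its space is not reused.
    // When the counter reaches zero the whole block can be returned at once.
    fn free_one(&mut self) -> bool {
        self.live -= 1;
        self.live == 0 // true: the entire block is now reclaimable
    }
}

fn main() {
    let mut block = Block::new();
    let first = block.alloc(128).expect("fits in a fresh block");
    let _second = block.alloc(64).expect("fits in a fresh block");
    println!("first object at {:p}", first);
    block.free_one();
    println!("block reclaimable: {}", block.free_one());
}
```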


I had been averse to the tsuris and noise of Rust’s lifetime parameter annotations. But perhaps this can be mitigated significantly, not by having a different focus for the rules on eliding the annotations, but simply by a less obtrusive syntax for the annotations.

Specifically I’m contemplating Unicode superscripts for annotations that apply to the lifetime of the type, and subscripts for annotations that apply to the fields of the type (if any), with a way of indicating to skip/elide annotation of a field. I’m intending to fully embrace Unicode characters in the syntax of Lucid. It’s time to move forward and stop making excuses about keyboards. Any decent programmer should have a $100 keyboard which is programmable with custom keys and keystrokes. And IDEs should provide such facilities to easily type the necessary Unicode characters.

So compared to following Rust example:

fn f<'a, 'b, 'c>(x: &'a Foo<'c>, y: &'b Foo<'c>) -> &'a Foo<'c>

Lucid could have something more elegant such as:

f(x, y)                                 : (Foo¹₃, Foo²₃) → Foo¹₃

In the above Lucid example, I elided the &. Rust requires the programmer to explicitly declare which references are (passed as) pointers and which are (passed by) value. Ditto in data structures: explicitly declaring whether the name represents a pointer to the type or the value of the type in-place (not quite exactly the same concept as unboxed).

I contemplate that I prefer the default to be pass-by-reference for function arguments and in-place for fields of data structures, except that primitive types would always be passed by value. Thus function calls would not be littered with superfluous & annotations on arguments, as is the case for Rust. For declaring pass-by-value for function arguments the annotation would be *, and the annotation for a pointer in data structures would be &.


Everything about the above design could be transpiled to native JavaScript and its GC (albeit run slower than optimum). I think even pointers into data structures and arrays can be supported, so that a C-like syntax can be transpiled. I’m not even getting into typeclasses above as that is not my highest near-term priority. Can be added later.

Thus the lifetime annotations and analysis features could be delayed if the goal is to get rolling asap with a syntax and JavaScript transpiler for Lucid, so that code which would otherwise be written in, for example, TypeScript and C/Emscripten would not have to be rewritten later in Lucid. More optimal compiled targets such as WebAssembly could be developed later, leveraging the full power of the design laid out above, including more optimal compilation of the C-like code. One of my test use cases would be to translate my Blake C code to Lucid.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-349153820
Original Date: Dec 4, 2017, 6:27 PM CST


Today I added an edit to my significant motivations for creating a new programming language (Lucid), to reflect my direction in this thread. Then someone reminded me to go read ESR’s recent blogs. That recent edit at the aforementioned linked comment corresponds nearly precisely with some recent blog posts by Eric S. Raymond:

ESR wrote:

My last post (The long goodbye to C) elicited a comment from a C++ expert I was friends with long ago, recommending C++ as the language to replace C. Which ain’t gonna happen; if that were a viable future, Go and Rust would never have been conceived.

[…]

My problem with the [C++] language, starkly revealed by that adventure, is that it piles complexity on complexity upon chrome upon gingerbread in an attempt to address problems that cannot actually be solved because the foundational abstractions are leaky. It’s all very well to say “well, don’t do that” about things like bare pointers, and for small-scale single-developer projects (like my eqn upgrade) it is realistic to expect the discipline can be enforced.

Not so on projects with larger scale or multiple devs at varying skill levels (the case I normally deal with). With probability asymptotically approaching one over time and increasing LOC, someone is inadvertently going to poke through one of the leaks. At which point you have a bug which, because of over-layers of gnarly complexity such as STL, is much more difficult to characterize and fix than the equivalent defect in C.

[…]

There were two directions a successor language to C might have gone. One was to do what C++ did – accept C’s leaky abstractions, bare pointers and all, for backward compatibility, than try to build a state-of-the-art language on top of them. The other would have been to attack C’s problems at their root – fix the leaky abstractions. That would break backward compatibility, but it would foreclose the class of problems that dominate C/C++ defects.

The first serious attempt at the second path was Java in 1995. It wasn’t a bad try, but the choice to build it over a j-code interpreter made it unsuitable for systems programming. That left a huge hole in the options for systems programming that wouldn’t be properly addressed for another 15 years, until Rust and Go.

[…]

This is in many ways a bad situation. It was hard to really see this because of the lack of viable alternatives, but C/C++ has not scaled well. Most of us take for granted the escalating rate of defects and security compromises in infrastructure software without really thinking about how much of that is due to really fundamental language problems like buffer-overrun vulnerabilities.

So, why did it take so long to address that? It was 37 years from C (1972) to Go (2009)

[…]

Over time, there’s been a gradual shift from languages that require manual memory management to languages with automatic memory management and garbage collection (GC). This shift corresponds to the Moore’s Law effect of decreasing hardware costs making programmer time relatively more expensive. But there are at least two other relevant dimensions.

One is distance from the bare metal. Inefficiency low in the software stack (kernels and service code) ripples multiplicatively up the stack. Thus, we see machine-centric languages down low and programmer-centric languages higher up, most often in user-facing software that only has to respond at human speed (time scale 0.1 sec).

Another is project scale. Every language also has an expected rate of induced defects per thousand lines of code due to programmers tripping over leaks and flaws in its abstractions. This rate runs higher in machine-centric languages, much lower in programmer-centric ones with GC. As project scale goes up, therefore, languages with GC become more and more important as a strategy against unacceptable defect rates.

When we view language deployments along these three dimensions, the observed pattern today – C down below, an increasing gallimaufry of languages with GC above – almost makes sense. Almost. But there is something else going on. C is stickier than it ought to be, and used way further up the stack than actually makes sense.

[…]

Not so the new wave of contending systems-programming languages, though. Rust and Go are both explicitly responses to increasing project scale. Where scripting languages got started as an effective way to write small programs and gradually scaled up, Rust and Go were positioned from the start as ways to reduce defect rates in really large projects. Like, Google’s search service and Facebook’s real-time-chat multiplexer.

I think this is the answer to the “why not sooner” question. Rust and Go aren’t actually late at all, they’re relatively prompt responses to a cost driver that was underweighted until recently.

OK, so much for theory. What predictions does this one generate? What does it tell us about what comes after C?

Here’s the big one. The largest trend driving development towards GC languages haven’t reversed, and there’s no reason to expect it will. Therefore: eventually we will have GC techniques with low enough latency overhead to be usable in kernels and low-level firmware, and those will ship in language implementations. Those are the languages that will truly end C’s long reign.

There are broad hints in the working papers from the Go development group that they’re headed in this direction – references to academic work on concurrent garbage collectors that never have stop-the-world pauses. If Go itself doesn’t pick up this option, other language designers will. But I think they will – the business case for Google to push them there is obvious (can you say “Android development”?).

Well before we get to GC that good, I’m putting my bet on Go to replace C anywhere that the GC it has now is affordable – which means not just applications but most systems work outside of kernels and embedded. The reason is simple: there is no path out of C’s defect rates with lower transition costs.

ESR is conflating and/or oversimplifying a bit here though. Buffer overruns and some other forms of programmer errors aren’t related to GC (aka memory allocation mgmt), but rather to static or runtime checking.

He’s really intending to drive at the general concept of the sweet spot in balancing programmer productivity, lower defect rates, easy adoption, and adequate performance for the use case (including large projects and some systems programming), as he explained further in his follow-up blog:

ESR wrote:

An increase in praxeological understanding of technology can reframe it, leading to tremendous increases in human productivity and satisfaction, not so much because of changes in our tools but because of changes in the way we grasp them.

In this, the third of my unplanned series of posts about the twilight of C and the huge changes coming as we actually begin to see forward into a new era of systems programming, I’m going to try to cash that general insight out into some more specific and generative ideas about the design of computer languages, why they succeed, and why they fail.

In my last post I noted that every computer language is an embodiment of a relative-value claim, an assertion about the optimal tradeoff between spending machine resources and spending programmer time, all of this in a context where the cost of computing power steadily falls over time while programmer-time costs remain relatively stable or may even rise. I also highlighted the additional role of transition costs in pinning old tradeoff assertions into place. I described what language designers do as seeking a new optimum for present and near-future conditions.

Now I’m going to focus on that last concept. A language designer has lots of possible moves in language-design space from where the state of the art is now. What kind of type system? GC or manual allocation? What mix of imperative, functional, or OO approaches? But in praxeological terms his choice is, I think, usually much simpler: attack a near problem or a far problem?

“Near” and “far” are measured along the curves of falling hardware costs, rising software complexity, and increasing transition costs from existing languages. A near problem is one the designer can see right in front of him; a far problem is a set of conditions that can be seen coming but won’t necessarily arrive for some time. A near solution can be deployed immediately, to great practical effect, but may age badly as conditions change. A far solution is a bold bet that may smother under the weight of its own overhead before its future arrives, or never be adopted at all because moving to it is too expensive.

[…]

The failure modes of near designs are uglier. The best outcome to hope for is a graceful death and transition to a newer design. If they hang on (most likely to happen when transition costs out are high) features often get piled on them to keep them relevant, increasing complexity until they become teetering piles of cruft. Yes, C++, I’m looking at you. You too, Javascript.

[…]

That said, there is a grand strategy expressed in the Go design that I think is right. To understand it, we need to review what the near problem for a C replacement is. As I noted in the prequels, it is rising defect rates as systems projects scale up – and specifically memory-management bugs because that category so dominates crash bugs and security exploits.

[…]

Of course, the proboscid in the room when I say that is Rust. Which is, in fact, positioning itself as the better far bet. I’ve explained in previous installments why I don’t think it’s really ready to compete yet.

[…]

Where Rust will be in five years is a different question, of course. My advice to the Rust community, if they care, is to pay some serious attention to the transition-cost problem. My personal experience says the C to Rust energy barrier is nasty. Code-lifting tools like Corrode won’t solve it if all they do is map C to unsafe Rust, and if there were an easy way to automate ownership/lifetime annotations they wouldn’t be needed at all – the compiler would just do that for you. I don’t know what a solution would look like, here, but I think they better find one.

Well, in this thread we discovered that Go’s design strategy may be sub-optimal, and that perhaps some static checking similar to Rust’s lifetimes could be unbundled from 99% of the tsuris (which is Rust’s total order on mutable borrowing), providing more optimal static memory allocation at very low cost and enabling automatic reference counting (RC) with automatic circular-reference probing to provide a more performant result with less hassle. Mark-sweep GC (concurrent or otherwise) suffers from requiring a tradeoff between low pause times, performance, asymptotic degradation as the total memory allocated increases, and accelerated degradation as the amount of free physical memory is reduced. Runtime GC is never as performant as static GC.

One key insight of mine (and others) is that the total order aspect of exclusive mutable borrowing is probably an Achilles heel of Rust. And the requirement for exclusive mutability throughout all the code (or otherwise also losing lifetime tracking by littering the code with unsafe annotations) is not the only paradigm for preventing logic errors due to shared mutability, especially in synchronous single-threaded code (i.e. separated threads/channels, which btw ESR pointed out is Go’s strategy). Although asynchronous callbacks (Promise) in single-threaded code create effectively out-of-ordered access, they’re not preemptive race conditions requiring mutexes, and at least they are demarcated (because they’re not preemptive). Whereas Rust’s total order on mutability presumes (even preemptive multitasking via) multithreading races everywhere, anytime. Exclusive mutability isn’t even always the desired or even plausible algorithm for multithreading (nor out-of-order access), such as when mutating a shared resource; although it does provably avoid runtime deadlocks and livelocks (although, as @keean and I had discussed somewhere, locking errors due to program interaction with unbounded instances of external locking can’t be statically proven to not exist!). Instead, minimizing the shared mutables between threads (or between out-of-ordered accesses) is typically the desired design strategy.1

Go also lacks typeclasses, afaik can’t even guard against null pointer usage, and has other significant faults.

1 Note out-of-ordered access (not necessarily multithreading data races) may employ logic, data structures, and algorithms which are sound with any order of access (which in addition to others such as queues, may include immutable data structures or exclusive mutable borrowing). Otherwise, the problem though is that any code which waits after calling an asynchronous function (e.g. that returns a Promise) is out-of-order after that wait w.r.t. to other code which preempted itself by waiting for callback, i.e. the out-of-ordering of callbacks “infects” all parents in the function call hierarchy. Thus, it might in some cases be better to have synchronous single-threaded zones which prevent out-of-order access because they’re never waiting on more than one callback in the zone (instead a different thread zone is running while waiting on a callback); and said zones mustn’t share (with each other) references to mutables. Such synchronous single-threaded zones for shared mutable access are probably most applicable to replicated tasks that run round-robin in parallel on a thread pool (perhaps with some limited asynchronous capability to launch multiple parallel, isolated sub-tasks that don’t share mutables while stalling the synchronous parent zone entirely until all the asynchronous are completed). Obviously UI code couldn’t be placed in such a synchronous zone and would instead need to employ the aforementioned logic, data structures, and algorithms which are sound with any order of access.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-349427473
Original Date: Dec 5, 2017, 2:16 PM CST


@shelby3 wrote:

And the requirement for exclusive mutability throughout all the code (or otherwise also losing lifetime tracking by littering the code with unsafe annotations) is not the only paradigm for preventing logic errors due to shared mutability especially in synchronous single-threaded code […] Exclusive mutability isn’t even always the desired or even plausible algorithm for multithreading (nor out-of-order access) such as when mutating a shared resource, although […] Instead minimizing the shared mutables between threads (or between out-of-ordered access) is typically the desired design strategy.1

Rust’s total order exclusive mutability borrowing restriction (although enabling pointer aliasing optimizations) can unnecessarily prevent localized reasoning and thus afaics is not the one-size-fits-all solution that perhaps some naive Rust proponents may think it is. The cost of the pointer aliasing optimization can be quite onerous. The new lifetime checker in C++ also admits such “Known Issues”.

For example, in the single-threaded case of shared writable access, even in the asynchronous case where code allows out-of-order preemption by waiting on more than one callback, there are compile-time deterministic cases where there isn’t a critical section, as there would be for multithreading (multithreading preempts at any non-deterministic instant). An example of such a case is reading the value, computing a new value, and writing the result back before providing an opportunity for another callback to run in the same thread. If for example single-threaded code has a critical section on an externally shared mutable, it could opt to ensure it doesn’t provide an opportunity for another callback to run within that critical section. Thus the soundness can in some cases be subject only to local reasoning, and doesn’t always require the tsuris+inflexibility of a total (aka potentially the entire program or unbounded universe) order on mutable borrowing.
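
A small Rust sketch of that read-compute-write case (Rust standing in for the single-threaded event-queue model under discussion; BALANCE and deposit are hypothetical names): because nothing in the critical section yields back to the event loop, no other callback can interleave, and no lock or global exclusive-borrow discipline is needed, only local reasoning.

```rust
use std::cell::RefCell;

thread_local! {
    // Mutable state shared by many callbacks running on the same thread.
    static BALANCE: RefCell<i64> = RefCell::new(0);
}

fn deposit(amount: i64) {
    BALANCE.with(|b| {
        let current = *b.borrow();      // read
        let updated = current + amount; // compute
        *b.borrow_mut() = updated;      // write back without ever yielding to
                                        // the event loop, so no other callback
                                        // can observe the intermediate state
    });
}

fn main() {
    deposit(5);
    deposit(7);
    BALANCE.with(|b| assert_eq!(*b.borrow(), 12));
    println!("ok");
}
```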

Naive Rust proponents may imply that exclusive mutable borrowing is even necessary in single-threaded code to prevent data races such as use-after-free, which occur when the code obscures that it erroneously modifies some invariant which the programmer or static lifetime checker assumed, such as iterator invalidation or invalidation of the presumed lifetime.

In the latter linked example, Rust’s lifetime analysis presumes the exclusivity of mutable borrows (i.e. the soundness of either requires the other being enforced), yet instead a compiler could disallow returning a lifetime tracked reference that is itself (i.e. not what it references) mutable, so the programmer would need to either make said reference immutable, or have fn get_text return an immutable doubly-indirected pointer (i.e. an immutable reference to the mutable reference)††, or replace fn get_text with a function that does some operation on the referenced text. Alternatively to allow the said reference to be mutable yet still avert a total order on exclusive mutable borrowing, for the synchronous single-threaded case that protects the critical section (i.e. no out-of-order callbacks can run within it and no shared multithreaded access), the exclusive mutability borrowing could be compiler enforced but would only be necessary to be enforced within the said critical section which in that example is the lifetime of the my_text reference (and not the entire lifetime of the editor).

Which are thus provably sound because the invariants are enforced by the compiler, unlike C’s restrict annotation which is reliant on the programmer’s judgement.

Specifically which is mutable during the lifetime (i.e. scope) of the instance of the object (e.g. editor in this case) with which it shares its lifetime.

†† The issue of whether a null pointer could be assigned to the said mutable reference is an orthogonal issue. Presumably the language has an Option or Nullable type, which guards against attempting to read or write from the null memory location.
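
For concreteness, here is a small Rust reconstruction in the style of the linked get_text example (not the exact code from the article; Editor, get_text, and my_text are stand-in names) showing the conflict being discussed:

```rust
struct Editor {
    text: String,
}

impl Editor {
    // Returning a reference tied to &self leaves `editor` immutably borrowed
    // for as long as the returned reference is alive.
    fn get_text(&self) -> &String {
        &self.text
    }
}

fn main() {
    let mut editor = Editor { text: String::from("hello") };

    {
        let my_text = editor.get_text();
        println!("{}", my_text);
        // Uncommenting the next line is rejected by the borrow checker: a
        // mutation here could reallocate the String and invalidate `my_text`,
        // which is the use-after-free hazard under discussion.
        // editor.text.push_str(", world");
    } // `my_text` ends here, releasing the tracked borrow

    editor.text.push_str(", world"); // fine once the tracked reference is gone
    println!("{}", editor.text);
}
```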

For the former linked example, iterator invalidation is sometimes not a problem. For example, most iterators need to check bounds on each previous or next action, so even removing items from the collection being iterated wouldn’t cause an out-of-bounds access, except in the case that the iterator accessor is invalidated analogously to the aforementioned fn get_text example for a lifetime analyser (i.e. elements aren’t RC objects) or because the reference to an element stored-in-place was returned by the said accessor. So for the aforementioned problematic cases, the solutions (which avoid Rust’s total ordering on exclusive mutable borrowing) would be the same as in the aforementioned paradigm. Note iterator invalidation can also be a semantic error w.r.t. some algorithmic assumptions of invariance, yet that’s sort of irrelevant because in general, lifetime and borrow checking will not prevent all classes of semantic errors.

Another example where Rust prevents localized reasoning is where the lifetime analysis is unable to infer invariants that are obvious to the programmer. Note even if Rust were to deduce such invariants, they would still need to be declared as annotations; otherwise, changing code would require recompiling the universe of all the callers of the method buy_car because the semantics of the lifetime analysis could have changed (whereas annotating forces the programmer to change the annotation if he really wants that invariant to change, which btw is the main reason lifetime parameter annotations must normally be explicit):

@shelby3 wrote:


@shelby3 wrote:

If you are coming from languages like Java or Objective-C you might be wondering why do you have to deal with lifetime in Rust. There are very good reasons.

Readers should note that “use after free” errors are not the only reason lifetimes are required. They’re also required so that the borrow checker can assure exclusive writable access, so that multithreaded preemption of critical sections doesn’t have to be guarded with locks. Locks create deadlock and livelock errors at runtime. And contention around locking reduces performance. See Sean Parent’s Concurrency video on YouTube about the cost of locks and how they don’t scale.

However, I think the total order on borrowing for Rust was a design mistake. I’m working on an alternative (Lucid) which I think is more flexible and less tsuris.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-350958453
Original Date: Dec 12, 2017, 12:33 AM CST


@shelby3 wrote:

[…] such as iterator invalidation or invalidation of the presumed lifetime.

Another example where Rust prevents localized reasoning is where the lifetime analysis is unable to infer invariants that are obvious to the programmer.

The above cases, in addition to demonstrating that Rust can’t (at least not elegantly) model all possible optimized algorithms statically (no static typing ever can!), also exhibit that Rust's lifetimes and exclusive mutable references are interdependent, because Rust has to ensure that all the heap allocations (e.g. String) are statically tracked and deallocated at the correct stack frame expiration. References to stack-allocated objects don’t need to remember to deallocate the objects assigned to them (because stack-allocated objects are deallocated in bulk by the stack frame expiration) and thus only lifetimes need to be enforced. Whereas, when assigning a different heap-allocated object to a reference that already references a heap-allocated object, the former object must be deallocated at that instant, because the eventual stack frame destructor of the said reference can only remember to deallocate the last object it’s referencing:

If you are coming from languages like Java or Objective-C you might be wondering why do you have to deal with lifetime in Rust. There are very good reasons. In Java the use after free problem is solved through the use of Garbage Collection. GC will not destroy the String as long as any variable points to it. Objective-C will employ reference counting to solve this problem. In our example there will be two references – one by the text member variable and the other by my_txt. The String will be destroyed only when the reference count goes to 0. In other words when both text and my_txt stops existing.

GC requires a heavy runtime and may incur awkward pauses. Reference counting has extra overhead and simply not reliable in all cases. Rust’s lifetime provides an elegant solution with zero runtime overhead.

Thus exclusive mutable references are necessary (combined with lifetimes) for static tracking of the allocation of heap allocated objects. Yet for the static tracking of stack allocated objects, only lifetimes are required.
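
To make the reassignment case concrete, here is a minimal Rust sketch (the `Noisy` wrapper is my own illustrative type, used only to make the drop observable): reassigning the binding drops the previously owned heap allocation at that instant, which is exactly the bookkeeping that Rust’s lifetimes and exclusive mutable references exist to make statically sound.

```rust
// Illustrative only: `Noisy` wraps a heap-allocated String and prints when
// it is deallocated, so the timing of the deallocation is visible.
struct Noisy(String);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping: {}", self.0);
    }
}

fn main() {
    let mut text = Noisy(String::from("first heap allocation"));
    // Reassignment drops the previously owned heap value at this instant,
    // not at the end of the stack frame:
    text = Noisy(String::from("second heap allocation"));
    println!("now holding: {}", text.0);
} // the second value is dropped here, when the stack frame expires
```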

Rust presumably considers the extra burden/inflexibility/tsuris/non-optimality of maintaining a total order on exclusive mutable references to be a freebie, because Rust presumes that constraint is required anyway since instantaneous preemption is assumed everywhere (i.e. multithreading everywhere). However, I’ve proposed we consider single-threaded models and thread pools.

It occurs to me that the generational portion of an automated GC is essentially as runtime performant (or perhaps even more so due to bump pointer allocation of the generational allocator) as Rust’s static heap allocation lifetimes, excepting the need for a BG-MS or BG-RC garbage collector to have a write-barrier to detect when assigning to an older generation (for MS) or RC (reference counted) object. Rust’s statically deterministic deallocation frees the space of the objects asap (i.e. sooner than the periodic generational collector), but not necessarily sooner to the usable memory pool due to fragmentation issues; and if there are no destructors on those objects then the additional immediacy is irrelevant. Remember most of the generational objects die before the collection cycle, so they have no overhead when freed, less fragmentation, and lower overhead to allocate.

So it occurs to me that with a single-threaded and thread pool model wherein we don’t always need exclusive mutability everywhere, the model that may be more runtime performant, more efficient, more elegant, less hassle, and give the programmer more control, would be to have the programmer statically declare which references will live a long time; these would be statically known to be RC. Thus, the performance-reducing runtime write-barrier would no longer be required. Other heap allocated objects stay in the generational GC until all non-RC references to each of them disappear, or until the object is assigned to an RC reference (which promotes the object to the RC collector). I think programmers know quite well which objects will live a long time or not, even if they don’t know exactly how long. Also RC references could be employed for any heap allocated objects that have a destructor, e.g. file handle objects that have to close the file. Remember there are algorithms that exist now for solving the circular referencing leaks that used to plague RC. Also the BG-RC collector showed that very low pause times were possible, with advantages compared to BG-MS (and even concurrent collectors). Essentially I’m proposing BG-RC but with a static (instead of runtime) promotion from the younger generation to the older RC generation, i.e. I propose to raise the performance to compete with Rust while discarding the tsuris and algorithmic inflexibility of Rust.

So this means that non-RC references might be pointing to an object which has been promoted to RC. Such objects can’t actually be deleted by RC decrements until all non-RC references to the object are no longer reachable, i.e. promoted objects have to be flagged, and the flag not removed until the tracing of the generational collector no longer finds any reference to the object.
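
Since the proposed syntax doesn’t exist yet, here is a rough Rust analogy (my own illustration, not the proposal’s actual notation) of the intended split: the programmer declares the long-lived objects at the type level (Rust’s `Rc` standing in for the proposed RC type), while everything else is treated as short-lived nursery data that dies young and is reclaimed in bulk.

```rust
use std::rc::Rc;

// `Rc` here is only an analogy for the proposed statically declared RC type.
struct Session {
    user: Rc<String>, // declared long-lived: reference counted, shared heap
}

fn handle_request(session: &Session) {
    // Short-lived working data: in the proposal this would be nursery
    // (bump-pointer) allocated and reclaimed in bulk, never promoted,
    // so no runtime write-barrier is needed on reference assignments.
    let scratch = format!("processing request for {}", session.user);
    println!("{}", scratch);
} // `scratch` dies young; only the RC'd `user` outlives individual requests

fn main() {
    let session = Session {
        user: Rc::new("alice".to_string()),
    };
    handle_request(&session);
    handle_request(&session);
}
```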

I will want to write some more thoughts I have on this such as other optimizations for low-level code in conjunction with this design, optimizing functional programming/closures, and more thoughts on how to deal with shared mutability and asynchronicity. Perhaps this combination of design choices I have proposed, may be superior to both Go and Rust for a mainstream programming language.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-350962848
Original Date: Dec 12, 2017, 1:01 AM CST


I think it's interesting, but I feel you have underestimated the importance of memory space (rather than time). With manual memory management most time is spent on reliability and performance, with GC I spend as much time on reducing memory usage and performance. Most GC does not seem to reclaim on failed allocation as far as I can tell. Software should be robust, and to me there is only one really robust way to do things - make sure you have the memory available before you start an operation, and reserve the space necessary to suspend the computation before starting. A reasonable compromise might be to do a GC on failed allocation, and a de-fragment on double fail, and suspend the thread on triple fail. Reference counting on the stack might be worth it for robustness (maybe that's why Apple likes it).

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-350967494
Original Date: Dec 12, 2017, 1:27 AM CST


@keean wrote:

but I feel you have underestimated the importance of memory space (rather than time)

Virtual memory is practically cost-free on a 64-bit address space (but intra-page fragmentation and even hard disk space are not). I mentioned upthread that I’m not prioritizing for a 32-bit address space. Some trade-offs have to be made in design and I think that’s an acceptable one, given there will still be, for example, Rust for specialty and archaic use cases.

Additionally, I’m contemplating that it's not entirely meaningful for us to compare experience with JavaScript’s concurrent BG-MS collector to my proposal for programmer dictated demarcation between a RC collector and an automated BG generational collector that doesn't promote to RC based on age. You haven’t even tried an integrated BG-RC collector (which mine proposes to improve on) because apparently even BG-RC has not yet been adopted by any language.

Most GC does not seem to reclaim on failed allocation as far as I can tell.

Please share an example? I would like to analyse this.

Software should be robust, and to me there is only one really robust way to do things - make sure you have the memory available before you start an operation

In the general case, this is an impossible requirement. You’re expecting the universe to be bounded. We can design some API to return an error if the memory is not available, but then the caller may fail. If the caller (which may be an external service not knowable at compile-time) has to poll everything it needs downstream before it starts, then it needs a crystal ball to know the future of all its future input. Also it presumes the world can be paused in portions, so that computations of available memory for a future task aren't invalidated before the said task completes.

You’re wanting to enumerate the universe (all of the state machine states) of possible inputs, i.e. you must be designing for Six Sigma quality for a use case analogous to the Space Shuttle. Most programming can’t be done to that level of precision realistically, nor is it even desirable in most programming to waste such effort on it. Programs typically achieve acceptable reliability and performance by having sufficient extra memory to deal with unexpected spikes, preferably physical memory so that virtual memory page swapping to disk doesn’t cause errors due to unexpected timeouts.

Perhaps you need to explain your perspective more to me, perhaps I’m not grokking completely your concern?

A reasonable compromise might be to do a GC on failed allocation, and a de-fragment on double fail, and suspend the thread on triple fail.

There will be no failed allocations in 64-bit virtual memory space except in pathological cases.1 Just a slow down as physical memory is filled up, as it should be.

Also I have already mentioned in this thread ways to significantly improve upon the fragmentation issue. I think perhaps you are pushing the not-quite-state-of-the-art JavaScript GC beyond its level of competency? And in a 32-bit virtual address space?

Let’s analyse a specific example of yours so we can determine what is going on.

Reference counting on the stack might be worth it for robustness (maybe that's why Apple likes it).

Could you unpack for me what you’re thinking here? Why you mention the stack?

1 Pathological cases probably require some way to opt-out of the automated GC and do some manual memory management and/or dictate the use of RC. We should analyse some pathological examples to gain more detailed understanding of the issues.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352243394
Original Date: Dec 17, 2017, 3:41 AM CST


I was contemplating whether there's any justifiable need for supporting stack allocation and lifetimes in addition to the proposal I made for a statically demarcated variant of BG-RC.

W.r.t. purely stack allocated objects (i.e. no heap allocation), lifetime checking is only required when references (aka pointers) to stack allocated objects would be allowed. Note that would then require another distinct static type for references to objects allocated on the stack, versus those that will be deallocated by the runtime GC (of which there are two types, to demarcate those which employ RC or not). So I'd much prefer we didn't need static allocation on the stack. A K.I.S.S., coherent design is preferred. Good design should maximize the benefits with the least number of concepts/complexity.

Afaics, the main performance boost gained from static stack allocation as compared to copy compacting, bump pointer allocation GC (for short-lived objects, with, in my proposal, the long-lived objects statically typed as RC to avoid the write-barrier performance cost), is that the objects on the stack can be accessed via offsets from the single stack pointer register (e.g. SP on Intel) and they are typically clustered in memory address space to aid cache coherency performance. The deallocation of objects on the stack by changing the value of the SP isn't a significant performance boost compared to not copy compacting untraced objects in the runtime GC. Runtime GC tracing is amortized over many function/procedure calls, many/most of which typically leave nothing to trace nor copy compact, i.e. the tracing and copy collection are avoided and thus have no significant performance cost.1

But that performance benefit for stack allocation can be mostly eliminated, because those pass-by-value arguments of a function/procedure which would have been passed on the stack could instead be passed in a contiguous region of the bump pointer allocated heap of the GC, and then the function can be provided a pointer to said region, so that all said objects can be accessed through said single pointer with offsets. For pass-by-reference arguments, the reference will be stored in the said contiguous region and the object will be stored somewhere in the heap, which is again analogous (also in performance) to passing the reference on the stack for stack allocation. IOW, bump pointer allocation with tracing, copy compacting deallocation is akin to amortized stack rewinding. We utilize one additional register of the CPU since we still need to keep the SP separately pointing to the call stack. CPU registers aren't as scarce a resource these days.
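
As a minimal sketch of what bump pointer allocation means here (my own illustration; a real nursery would hand out typed object slots and trace/copy compact instead of merely resetting), allocation is a bounds check plus an offset increment, and bulk reclamation is a single assignment, which is why it competes with pushing and popping a stack frame:

```rust
// Toy bump-pointer nursery: allocate by advancing an offset into a
// preallocated block; "free" everything at once by resetting the offset.
struct Nursery {
    buf: Vec<u8>,
    top: usize, // the bump pointer
}

impl Nursery {
    fn new(capacity: usize) -> Self {
        Nursery { buf: vec![0; capacity], top: 0 }
    }

    // Returns the offset of `size` freshly reserved bytes, or None when the
    // nursery is exhausted (a real collector would trace and compact here).
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.top + size > self.buf.len() {
            return None;
        }
        let offset = self.top;
        self.top += size;
        Some(offset)
    }

    // Bulk reclamation is one store, analogous to resetting a stack pointer.
    fn reset(&mut self) {
        self.top = 0;
    }
}

fn main() {
    let mut nursery = Nursery::new(1024);
    let a = nursery.alloc(16).unwrap();
    let b = nursery.alloc(32).unwrap();
    println!("a at offset {}, b at offset {}", a, b);
    nursery.reset(); // everything "deallocated" at once
}
```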

I'm failing to find any significant justification for supporting the tsuris of static (i.e. compile-time) stack allocation and lifetimes. I think this proposal of mine will defeat both Rust and Go as the next mainstream, general purpose programming language.

1 The performance cost for mark-sweep GC tracing is on the long-lived objects that have to be traced over and over again, which makes asymptotic performance horrible (but my proposal entirely avoids this by proposing that long-lived objects must be statically typed as such by the programmer).


I wrote a highly relevant comment in @NodixBlockchain's thread.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352245289
Original Date: Dec 17, 2017, 4:19 AM CST


The main advantages of stack allocation are, fast allocation (just decrement a pointer), fast deallocation (just increment the pointer), no indirection (we can hold pointers directly into the stack), no pausing (no mark/sweep phase), and zero fragmentation.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352257134
Original Date: Dec 17, 2017, 7:50 AM CST


  • fast allocation (just decrement a pointer)

I explained in my prior post that my proposal is the same. My reading is that's what "bump pointer" allocation means in the literature.

  • fast deallocation (just increment the pointer)

I explained in my prior post that the performance is essentially the same in the case that the objects die young, because they are rarely traced (amortized) or copy compacted. The key difference is that with stack deallocation, the timing of the deallocation is more immediate and deterministic. However, if such immediacy is desired in my proposal (e.g. if there's a destructor), then use the RC reference (aka pointer) type.

  • no indirection (we can hold pointers directly into the stack)

I already explained the same can be done in my proposal. But there are many cases even in stack allocation where you're going to be passing-by-reference to a function/procedure, thus you'll still have indirection on the stack allocation.

  • no pausing (no mark/sweep phase)

There's no pause for tracing the nursery because most objects die young. My proposal replaces mark-sweep (MS) for the older objects with reference counting (RC). The programmer dictates which objects live long and thus marks them with a RC type, so that no runtime write-barrier is needed on assigning references (aka pointers), because the types involved in reference assignments are known at compile-time.

  • and zero fragmentation

The tracing, copy compacting nursery doesn't generate any fragmentation either.

Thus I repeat:

I'm failing to find any significant justification for supporting the tsuris of static (i.e. compile-time) stack allocation and lifetimes. I think this proposal of mine will defeat both Rust and Go as the next mainstream, general purpose programming language.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352258187
Original Date: Dec 17, 2017, 8:08 AM CST


The tracing, copy compacting nursery doesn't generate any fragmentation either.

It generates fragmentation, then fixes it by copying. The cost is much higher. Copying is slow compared to just incrementing/decrementing a pointer. You have to double indirect for the heap because the data can move, and you have to pause when moving data between the nursery and other spaces, because you cannot be allowed to access the data mid copy.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352258898
Original Date: Dec 17, 2017, 8:19 AM CST


It generates fragmentation, then fixes it by copying. The cost is much higher. Copying is slow compared to just incrementing/decrementing a pointer.

Incorrect. I already stated that rarely are any of the objects still around when the tracing+copy compacting occurs. The cost is amortized away to negligible. I think this is the 3rd time I have stated this in some of the recent prior posts.

You have to double indirect for the heap because the data can move

Incorrect. My understanding from reading that book about GC is that the pointers are updated when they're traced. And I repeat, this rarely happens for objects in the nursery, so the cost is negligible.

you have to pause when moving data between the nursery and other spaces

Incorrect. Seems you're totally ignoring my proposal even though I have reexplained it more than once. In my proposal, no objects ever get moved from the nursery1 (without the programmer indicating at compile-time for which instances he wants it to happen), and instead the programmer marks which objects are long-lived with a RC type.

because you cannot be allowed to access the data mid copy.

Any copying will induce a pause, but for that thread only, because the nursery will be per thread (whereas the RC objects will have multithreaded access); but since the amount of copying in the nursery will be small, you're looking at microsecond pauses, which is irrelevant (except maybe in some very obscure embedded programming use cases where hand-tuned memory management is required). In my proposal, the programmer will be entirely in control of whether he is overloading the nursery with longer-lived objects which he should have given a RC type instead. The major pauses in MS are usually from reprocessing the same long-lived objects and/or overloading the rate of generation of (too long lived or excessively large) objects which escape the nursery. Afaics, my proposal mitigates those pathological cases.

Seems you have not yet wrapped your mind around my proposal and what is different about it.

1 Except when the programmer assigns a nursery object to a reference in a RC object, but the compiler can display which these are because all RC object instances are indicated at compile-time, so the programmer is in control of the cost.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352262688
Original Date: Dec 17, 2017, 9:14 AM CST


Even short lived objects in the nursery will have different lifetimes, so it will fragment. Before dealing with the other points, explain why there is no fragmentation?

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352263665
Original Date: Dec 17, 2017, 9:29 AM CST


There is one clear situation where having no stack is better, and that is in an asynchronous language where you want a 'cactus' stack.

In these circumstances, allocating a variable frame for a function would be a heap allocation, with a pointer to the previous frame. There will be fragmentation still.

Nurseries work to prevent fragmentation because they get defragmented when copying the long lived objects out. If you mark objects as long-lived, it does not mean that you do not need to defragment the nursery.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352297619
Original Date: Dec 17, 2017, 6:12 PM CST


Nurseries work to prevent fragmentation because they get defragmented when copying the long lived objects out. If you mark objects as long-lived, it does not mean that you do not need to defragment the nursery.

There's negligible defragmentation of objects that die young because they rarely get traced and copy compacted as I already mentioned. And there's no long-lived objects in the nursery in my proposal.

In my proposal, the entire point is RC typed object instances never go into the nursery (and that's why no runtime write-barrier is required, which I've mentioned before). They're initially allocated on the long-lived heap, which employs the methodology I mentioned upthread to mitigate fragmentation of the long-lived heap. So the long-lived heap will have some fragmentation of the virtual address space (but not more than negligible fragmentation of physical memory if designed correctly per my upthread points), which is why I mentioned I'm prioritizing 64-bit virtual address space (but I clarified this doesn't mean I'm advocating allowing memory leaks). And afaics, stack allocation would also not optimize these long-lived objects...

Comparing apples-to-apples, stack deallocation should not typically pertain to long-lived objects (at least afaics that's not the use case we're trying to optimize with stack allocation), because for example if I tried to put the entire UTXO (unspent transaction outputs) set of a blockchain on the stack, the stack would overflow (as well, I think the lifetime analysis would be unwieldy if not also implausible). Thus if we're correctly comparing stack allocation/deallocation to nursery allocation/deallocation, then the latter should also have no long-lived objects in my proposal (unless the programmer is derelict w.r.t. declaring RC types) and thus no fragmentation either. My proposal is not as automatic as BG-RC, because the programmer has to reason correctly about the lifespan of the instances of the objects he creates. But afaics, the performance benefits of this additional control provided by my proposal outweigh the additional cognitive load on the programmer, which I contemplate is significantly less tsuris/annotations/complexity than Rust's lifetime+exclusive mutability; and with a fully automated BG-MS the programmer is, as you pointed out, otherwise fighting the GC in pathological cases which he can't exclude from the workload of the MS collector.


There is one clear situation where having no stack is better, and that is in an asynchronous language where you want a 'cactus' stack.

In these circumstances, allocating a variable frame for a function would be a heap allocation, with a pointer to the previous frame. There will be fragmentation still.

You're referring to, for example, how JavaScript Promises unwind the stack back to the event loop, and the state of the closure for the callback of the .then() is stored on the heap for execution when the event loop triggers it. So having the stack frame on the heap by default is more efficient than copying from the stack to the heap for said closures. I guess 'cactus' stack means the call stack is stored separately from the stack frames, with the latter on the heap.

But note I had proposed that for some types of non-UI code, multiple instances of the code could be running, each on a different thread (from a thread pool), and their stack gets entirely stalled (a green thread concept) when the code is blocked. Thus there's no stack unwinding on blocking, and this also provides exclusive mutability for all nursery objects (which are inaccessible from other threads). This proposal provides asynchronicity via running other instances of the said code while any instances are blocked. For example, this would apply to the code that processes a transaction for a blockchain, wherein multiple transactions could be processed simultaneously. Note that afaics, one of the key problems with Node.js+JavaScript (and apparently Go) is there's no way for these separate instances to access a common heap which would store the UTXO. The alternative of passing messages to a thread which manages the UTXO heap would be much slower, because it would funnel what could have been parallelized access into a single-threaded queue. In my proposal, the RC instances are on a shared heap, so then immutability has to be employed (or some other means of assuring mutability safety). I contemplated that I want to make the UTXO records read-only but the tree/graph mutable, which provides the necessary mutability safety.
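
A small Rust sketch of the shared read-only records idea (the `Utxo` type, keys, and worker count are hypothetical, purely for illustration): because the shared map is never mutated after construction, every pooled thread can read it directly, with no locks and no single-threaded message queue in the middle.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

struct Utxo {
    amount: u64,
}

fn main() {
    let mut utxos = HashMap::new();
    utxos.insert("tx0:0".to_string(), Utxo { amount: 50 });
    utxos.insert("tx1:0".to_string(), Utxo { amount: 25 });
    let shared = Arc::new(utxos); // read-only from here on

    let handles: Vec<_> = (0..4)
        .map(|worker| {
            let view = Arc::clone(&shared);
            thread::spawn(move || {
                // Each worker validates its own transactions against the
                // same immutable records, in parallel and without locks.
                let total: u64 = view.values().map(|u| u.amount).sum();
                println!("worker {} sees total {}", worker, total);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```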

Note I also contemplated that such a non-unwinding (of stack) proposal could also allow for blocking on multiple simultaneous asynchronous tasks which are spawned from the same stack frame, as long as each of those tasks has exclusive mutability access for any objects that can be mutated during the runtime of the task.

For UI code, although not required by my proposal, copying from stack to closures on the heap is a negligible cost. Also I think, where plausible, UI should be more optimally coded to use a stateless, no-callback design wherein asynchronous tasks (e.g. send a cryptocurrency transaction) are run in a separate thread (on a thread pool) and update the main UI thread with messaging when they're done (i.e. Go's CSP channels model). In the cases where there's a series of sequential, blocking tasks that depend on shared state, this could employ the aforementioned stack blocking rather than employing Promise closures on the heap and unwinding the stack. IOW, separating the concerns so that the blocking code isn't in the UI thread. This is also about structurally handling mutability safety without needing Rust's exclusive mutability borrowing tsuris. In short, I think Eric Raymond is correct that Go is closer to the next mainstream programming language than Rust is. Yet, I am proposing what I think can be better than Go. And also I know @keean is interested to get more complete HLL typing (e.g. typeclasses), which Go doesn't have.

We have the chance to make Lucid better than both Rust and Go, by studying what each of them did wrong and right. Of course, it would be most beneficial to have @keean's collaboration, given his extensive experience with programming language design research and actual usage. Most specifically his expertise with typing systems, for example his co-authoring of the OOHaskell research with the venerable Oleg.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352343802
Original Date: Dec 18, 2017, 1:14 AM CST


There is no fragmentation with a stack because there is a total order of object lifetimes. Any object appended to a stack has a lifetime less than or equal to the lifetime of the current top-of-stack object. This is why they don't fragment.

The same is not true of a heap, even one with a nursery. As soon as you place any object on the heap that has a shorter lifetime than the next object you will get fragmentation. Consider:

f => 
   let x = 1
   let y = 2
   let z = 3
   let w = x + y
   // GC runs now
   return w - z

As the GC is asynchronous, it can run at any time. If we consider GC running where indicated, we allocate x, y, z and w on the heap, but we have no references to x or y any more, whereas we still need z and w, so after GC, x and y will be freed. If we imagine the heap as a simple array we have:

[...]
[x, y, z, w, ...]
[(), (), z, w, ...]

You can see the fragmentation starting as we have two empty slots separated from the 'empty space'. If we consider objects may be different sizes, this would make it worse.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352473855
Original Date: Dec 18, 2017, 10:12 AM CST


@keean I already explained why there will be no more than a negligible performance cost of fragmentation for short-lived objects in the nursery of my proposed design (and I already explained why, so just reread the prior posts), which is essentially equivalent to the lack of fragmentation for stack allocation (which I stated is most applicable to short-lived objects).

Your response belies an understanding of what I have written. Obviously I know about fragmentation in the long-lived heap if it's not copy compacted (because upthread I even presented ways to minimize such fragmentation), but that's irrelevant because we're comparing apples-to-apples, meaning stack allocation for short-lived objects and the nursery for short-lived objects. Long-lived objects won't be put on the stack anyway; thus you'll get fragmentation of long-lived objects even with stack allocation for the short-lived objects.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-352497392
Original Date: Dec 18, 2017, 11:31 AM CST


@shelby3 I think you are misunderstanding: if your heap has zero fragmentation, then it must be a stack. A stack is a name for a heap where there is no fragmentation. There is simply no other way to manage a block of memory and have no fragmentation. It's either a stack, or it fragments. As fragmentation accumulates over time, a heap needs de-fragmentation whereas a stack does not. These are facts, and not opinion.

Edit: Generational garbage collectors can defragment the nursery when copying the long-lived objects into the next-generation region.

Some advantages of a stackless design: no arbitrary limit on stack size (no stack overflow, just out-of-memory), and you can run many more threads as each one does not need to reserve a stack. However not having a stack means having an assembly language runtime on every platform. It would be much easier to get portability if the language's compile target was 'C'.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353015132
Original Date: Dec 20, 2017, 3:44 AM CST


@keean, generational GC doesn't defragment what is no longer allocated when the tracing and copy collection runs, which will be the usual case for most that is allocated. Thus, you're incorrect in the sense that defragmentation implies a cost. Yet I'm repeating myself.

Some advantages of a stackless design: no arbitrary limit on stack size (no stack overflow, just out-of-memory), and you can run many more threads as each one does not need to reserve a stack. However not having a stack means having an assembly language runtime on every platform. It would be much easier to get portability if the language's compile target was 'C'.

I specifically wrote upthread that a stack would still be employed for the call stack.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353018068
Original Date: Dec 20, 2017, 3:55 AM CST


@shelby3 wrote:

generational GC doesn't defragment what is no longer allocated when the tracing and copy collection runs, which will be the usual case for most that is allocated. Thus, you're incorrect. Yet I'm repeating myself.

This discusses nursery fragmentation in a generational garbage collector:

http://www.mono-project.com/docs/advanced/garbage-collector/sgen/

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353019681
Original Date: Dec 20, 2017, 4:01 AM CST


You could eliminate fragmentation in the nursery by not allowing pinned objects, and completely cleaning out the nursery on a GC run. This has the disadvantage of copying some short-lived objects into the second generation that should not be there, which will increase the fragmentation of the second generation as these objects are discarded soon after; it also wastes time copying objects that are likely to be discarded soon.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353060199
Original Date: Dec 20, 2017, 7:16 AM CST


@keean if you refuse to understand my proposal, what's the point of repeating myself 5 more times. Please review my prior posts.

There's no 2nd generation in my proposal. Apparently you keep forgetting (or refusing to digest?) salient facts of my proposal that afaics I've already explained numerous times. I think we should take this discussion private into LinkedIn for a while. The discussion is getting repetitive and noisy, due apparently to a lack of effort applied to comprehension. And it will digress into my frustration with your lack of comprehension of what I already wrote. I think I've explained patiently enough already. You're repeating yourself numerous times and still apparently (as best I can surmise) not grokking what I had already written.

You could eliminate fragmentation in the nursery by not allowing pinned objects

How many times have I already stated that RC typed objects will not be placed in the nursery ever.

and completely cleaning out the nursery on a GC run.

How many times do I have to repeat that the nursery will be periodically copy compacted and nothing will ever be promoted out of the nursery? The defragmentation cost is negligible, because the copy collector runs infrequently enough that most allocated objects are already deallocated when the copy collector runs, thus the percentage of objects actually defragmented is negligible. IOW, as I had explained before, the cost of running the copy collector can be conceptualized as amortized over many, many stack frames, with most stack frames having come and gone during each said period. I am sure you understand what the word negligible means. I've repeated this several times in my prior explanations.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353071030
Original Date: Dec 20, 2017, 8:03 AM CST


How many times do I have to repeat that the nursery will be periodically copy compacted and nothing will ever be promoted out of the nursery? The defragmentation cost is negligible, because the copy collector runs infrequently enough that most allocated objects are already deallocated when the copy collector runs, thus the percentage of objects actually defragmented is negligible. IOW, as I had explained before, the cost of running the copy collector can be conceptualized as amortized over many, many stack frames, with most stack frames having come and gone during each said period. I am sure you understand what the word negligible means. I've repeated this several times in my prior explanations.

Which means you are agreeing with me that the nursery will suffer from fragmentation, and that you are periodically de-fragmenting it (which is what a copy compactor does). It appears we have a differing opinion about the cost of this. It's hard to be specific about the cost without implementing and benchmarking.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353153778
Original Date: Dec 20, 2017, 1:05 PM CST


@keean wrote:

Which means you are agreeing with me that the nursery will suffer from fragmentation, and that you are periodically de-fragmenting it (which is what a copy compactor does).

This is not a political contest. This is engineering. I already explained that you're incorrect. I will quote again (this is very frustrating):

@shelby3 wrote:

@keean, generational GC doesn't defragment what is no longer allocated when the tracing and copy collection runs, which will be the usual case for most that is allocated. Thus, you're incorrect in the sense that defragmentation implies a cost. Yet I'm repeating myself.

For a copy collector to be performing non-negligible defragmentation, there must be a non-negligible cost of copy collecting; otherwise the claim of fragmentation is vacuous:

The defragmentation cost is negligible, because the copy collector runs infrequently enough that most allocated objects are already deallocated when the copy collector runs, thus the percentage of objects actually defragmented is negligible. IOW, as I had explained before, the cost of running the copy collector can be conceptualized as amortized over many, many stack frames, with most stack frames having come and gone during each said period. I am sure you understand what the word negligible means.


@keean wrote:

It appears we have a differing opinion about the cost of this. It's hard to be specific about the cost without implementing and benchmarking.

If you wanted to challenge my claim that the cost of the copy collector is negligible, you could have done that very far upthread when I first mentioned it and employed the term 'amortized'. That would have relieved us of a lot of noise, such as you implying I do not know what fragmentation is (geez). The apparent fact is you weren't paying attention, otherwise you could have gone straight to the point of challenging the specific claim.

It's inarguable that the copy collector's added cost will be negligible, because the longer-lived objects of the nursery will get stuck at the compacted end and won't need to be repeatedly compacted. Thus the only ones that can get defragmented are the short-lived ones, but due to the design criterion that the copy collector's period will be much longer than the average life of the short-lived objects, the cost must be negligible. I had already stated upthread that my proposal hinges on the programmer correctly employing RC type references for longer-lived objects so as to not load the copy collector with redundant tracing.

The period of the copy collector is only limited by the amount of virtual address space (and physical memory, if the programmer is overloading with long-lived objects) that can be dedicated to the nursery (of which some percentage will be logically discarded but still allocated at any given moment in time, thus being essentially unavailable virtual address space, but this shouldn't matter in 64-bit).1 And it is limited by the throughput of the copy collector, which might be significantly less due to the additional repeated tracing overhead if too many long-lived objects are allowed into the nursery by the programmer.

I think the potentially plausible retort (which I thought of since I introduced my proposal), is to argue that the programmer will not be able to correctly tag the longer-lived objects with the RC type. I have contemplated use cases wherein I think the programmer can, but that may or may not be generally the case.

1 This statement depends on paging out to disk being a cost-free operation. In terms of performance, if there's DMA on the bus, then paging out to disk runs in parallel to the CPU and thus doesn't incur any performance cost. However, there's a power efficiency cost. And that presumes that logically discarded objects don't occupy the same memory page as objects which are still being accessed. So I guess it's safer to presume physical memory as the limiting factor on the length of the period between traced copy compactions of the nursery.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353156939
Original Date: Dec 20, 2017, 1:17 PM CST


most allocated objects are already deallocated

This seems wrong, if the rate of object allocation is the same as the rate of object deallocation there will always be plenty of objects in the nursery.

The idea that a bunch of objects get allocated then deallocated cleanly without overlapping other allocations is wrong. There will be allocations every time a function is entered, and the scope of the calling function will still be current.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353168180
Original Date: Dec 20, 2017, 2:03 PM CST


It's probably worth pointing out that we are discussing details; I think we are in agreement about the big picture. I am not sure I want to have syntax for indicating long lived objects, I think implementing an efficient state of the art generational GC could be good enough.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353292466
Original Date: Dec 21, 2017, 2:45 AM CST


@shelby3 This is interesting, and matches my experience:

http://wiki.c2.com/?MemoryLeakUsingGarbageCollection

A couple of key quotes:

As time goes, I consider more and more the GarbageCollector as something that turns freed pointer crash into memory leaks. And yes, I'm hunting Java memory leaks more frequently than C/C++ memory leaks, but I'm hunting C/C++ crashes due to freed references much more than NullPointerExceptions in Java.

But the problem you describe certainly has the same effect, so I guess I'll have to widen my definition slightly to "failure to free memory that will no longer be referenced by the running program". It's certainly true that, in the presence of globals or long-lived locals, the explicit act of freeing memory has to be replaced by the explicit act of nulling a reference.

So is GC actually a solution? With manual memory management we have to free references to heap storage (but not all references, because of RAII). With GC we have to null references that are no longer part of the running-set of the program (but not all references, because of simple local cases). Both of these lead to bugs, it's just that with GC you might get away with it because the program is too short lived to run out of memory, but the delay makes it much harder to track down and debug.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353293981
Original Date: Dec 21, 2017, 2:52 AM CST


Reference counting with immutable objects is an interesting point in the design space, as you cannot create circular references with immutable objects, so reference counting becomes a complete solution.
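
A minimal Rust sketch of this point (assuming plain `Rc` with no interior mutability): a node can only reference nodes that already existed when it was constructed, so the reference graph is necessarily acyclic and plain counting reclaims everything.

```rust
use std::rc::Rc;

// Immutable after construction: `next` can never be re-pointed.
struct Node {
    value: i32,
    next: Option<Rc<Node>>,
}

fn main() {
    let a = Rc::new(Node { value: 1, next: None });
    let b = Rc::new(Node { value: 2, next: Some(a.clone()) });
    // There is no way to make `a.next` point back at `b` without something
    // like RefCell (interior mutability), so no cycle can ever form.
    println!("{} -> {}", b.value, b.next.as_ref().unwrap().value);
}
```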

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353297593
Original Date: Dec 21, 2017, 3:08 AM CST


The Python GC is interesting:

http://arctrix.com/nas/python/gc/

They basically sacrificed performance for portability. Of the widely available GCs, JavaScript's is not that bad; it is more that the APIs provided by JS do not allow memory re-use, which leads to fragmentation and worse cache performance.

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353344339
Original Date: Dec 21, 2017, 6:57 AM CST


When watching things from the angle of performance and the time/bother to write programs, GC is better.

When watching things from the angle of memory use and tracking memory bugs, RC is better.

GC will make leaks less obvious as memory use always grows. With manual management or RC, if memory use keeps increasing, it's an obvious sign of a leak.

What makes the problem even more acute is that using a GC implies the programmer doesn't have to know anything about memory management at all, or about what the GC handles well or not, and can encourage writing programs with sloppy memory management, without even realizing it's sloppy or that it triggers bugs in the GC. And many times the programmer can't (or is not supposed to) know how the garbage collection works.

And most of the time, using a stack and proper program structure doesn't cost a lot of time for manual memory management.


Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353346066
Original Date: Dec 21, 2017, 7:06 AM CST


In all methods we have to do something with references: with GC we must still null references to avoid leaking memory; without GC we have to free them. RC has the same reference problem as MS, but it also has the cycle-memory leak problem.

An example may make this clearer. Imagine I have a ring-buffer, which contains references to other data. I can 'clear' the ring-buffer efficiently by setting "start = end". With manual memory management, I leak all the data referred to in the buffer because I did not free the memory pointed to. With GC (RC or MS) I leak the memory because the references still exist in the ring buffer memory which is still in scope, even though I have cleared it by setting (start = end). The reference count in the references is not decremented by moving the ring-buffer start or end pointer, and the Sweeper can still find the references in the cleared ring buffer to follow them and mark the memory as still alive.

It doesn't matter whether there is GC or not, we have to know that when implementing the 'clear' method of the ring-buffer we must deal with making safe every reference in the buffer when moving the start or end pointer. So in both cases we need to understand the underlying memory and write code to safely handle references or we will leak memory.
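
Here is that ring-buffer leak sketched in Rust (my own illustration; `Rc` stands in for any counted or traced reference): the O(1) "fast clear" leaves live handles sitting in the backing array, so neither reference counting nor a tracing collector can reclaim the referenced data until each slot is actually overwritten.

```rust
use std::rc::Rc;

struct RingBuffer {
    slots: Vec<Option<Rc<String>>>,
    start: usize,
    end: usize,
}

impl RingBuffer {
    fn new(capacity: usize) -> Self {
        RingBuffer { slots: vec![None; capacity], start: 0, end: 0 }
    }

    fn push(&mut self, item: Rc<String>) {
        let idx = self.end % self.slots.len();
        self.slots[idx] = Some(item);
        self.end += 1;
    }

    // The "fast clear": O(1), but every Rc still sits in `slots`, keeping
    // the referenced strings alive. This is the leak described above.
    fn fast_clear(&mut self) {
        self.start = self.end;
    }

    // A leak-free clear must also release each reference.
    fn clear(&mut self) {
        for slot in &mut self.slots {
            *slot = None;
        }
        self.start = self.end;
    }
}

fn main() {
    let mut buf = RingBuffer::new(8);
    buf.push(Rc::new("some referenced data".to_string()));
    buf.fast_clear(); // data still reachable through `slots`
    buf.clear();      // now it can actually be reclaimed
}
```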

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353348959
Original Date: Dec 21, 2017, 7:20 AM CST


This sort of thing is the reason why I wanted to avoid all direct pointer manipulation, and to avoid holding a pointer to the inside of a buffer or an object as a strong reference.

If references are always to the start of an object or buffer, and offset/size are kept separately, it makes reference use more obvious.

A list containing references to other objects can be marked as such if the language recognizes a list primitive, and the references to objects in the list are released automatically when there is no reference to the list.

So one only needs to manage the lifetime of the list reference.

If a reference is assigned to the start of the list, it can only be a reference to another buffer, which makes the release of the reference to the buffer (and all references inside of it) obvious.


Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353357587
Original Date: Dec 21, 2017, 8:00 AM CST


It still has the same problem. Consider again the example of a ring-buffer. I need a ring-buffer for the algorithm I want to implement. If the language only provides reference lists, then I base my implementation on reference lists. I have a start and end index into the list to represent the read and write pointers of the circular-buffer. I try and implement a fast-clear by setting the read-index equal to the write-index, and again I have just leaked all the objects that are referenced by the circular-buffer.

Remember container-objects themselves (like a circular-buffer) can be long-lived. For file IO, or network IO the buffers may last the lifetime of the program. When you get down to it managing objects in a collection becomes managing memory. Just because an array, list, map, or set does not obviously have any pointers or references, you can still leak memory by forgetting to delete objects from the container. In the end memory is just an array and a pointer is just an array-index.

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353363372
Original Date: Dec 21, 2017, 8:24 AM CST


Keeping the allocated (and freeable) reference to the list separate from the structure used to access it doesn't solve the leak, but it makes it more obvious that the list has not been freed, regardless of whether the "access pointers" are indexes or direct pointers.

The one tricky thing with C or C++ is that they don't differentiate between pointers that are actual references to an object or list that can (and needs to) be freed, and pointers that are just pointing to data inside of a bigger allocated object or buffer.

Keeping the list reference separate from the access mechanism makes it clear that moving an access pointer inside of it won't impact the number of references to allocated objects.

It doesn't solve the leaks in itself, it just makes them more obvious, and removes the ambiguity between access pointers/indexes and references to the list and objects for the GC.


Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353369275
Original Date: Dec 21, 2017, 8:49 AM CST


In general we can call this 'region-based memory management', or the Arena pattern as Rust calls it. We can have special things called 'regions', and a region is the only thing that can store a pointer/reference. We allocate regions on the stack using RAII, so we are guaranteed that all memory associated with the region is deallocated when that stack frame returns. In this way we cannot leak any memory outside the function declaring the region. However this leads to two things for programs that have no definite lifetime (interactive programs): either the function never returns (so we leak the memory anyway), or we have to handle returning references to a region by promoting the reference to a region higher up the call-stack (which lets that reference escape, hence potentially leaking the memory).

There are three approaches to this: the Ada approach, which strictly does not let a reference to any memory leave the scope in which the memory referred to is declared; the Rust approach, which attaches lifetimes to references and extends the lifetime to the outer scope when returning a reference; and finally the technique I like as a compromise between the two, which is more flexible than Ada but simpler than Rust with no need for lifetimes: when we return a reference, we promote the memory to the outer region and pass the reference into the function instead of returning it (this transformation can be done automatically by the compiler).
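
A minimal region sketch in Rust (my own illustration of the pattern, not a full design): every allocation is owned by the region, the borrow checker keeps references into the region from escaping the declaring stack frame, and the whole region is freed in bulk (RAII) when that frame returns.

```rust
struct Region {
    strings: Vec<String>, // the region owns every allocation made in it
}

impl Region {
    fn new() -> Self {
        Region { strings: Vec::new() }
    }

    // The returned reference cannot outlive the region itself.
    fn alloc(&mut self, s: &str) -> &String {
        self.strings.push(s.to_string());
        self.strings.last().unwrap()
    }
}

fn process() {
    let mut region = Region::new();
    let greeting = region.alloc("hello");
    println!("{}", greeting);
    // Returning `greeting` from `process` would be rejected: it borrows
    // `region`, which dies with this stack frame.
} // region dropped here: everything it allocated is freed together

fn main() {
    process();
}
```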

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353370131
Original Date: Dec 21, 2017, 8:52 AM CST


I think the real solution to this sort of thing boils down to having two levels of reference, like the algorithm of differential reference counting I posted before, with an outer counter and an inner counter, or keeping track of two levels of references.

The ambiguity, I think, comes from the fact that the memory manager or compiler can't know in an obvious manner whether a pointer to an element inside of the list means that the whole list needs to be kept active, or only the element pointed to by the access pointer. Or how many elements in the list need to be considered active, based only on a 'current pointer' accessing inside of the list.

With the two-level reference approach, it's more obvious that if a reference to the list or buffer itself is kept, it means all objects in the list are kept active, while the second level allows keeping an object in memory if there is a 'weaker pointer' to an element of the list. And differentiating explicitly between pointers to the list and pointers to elements inside of it makes it more obvious whether there is still an active reference to an object or not.

If there is no direct reference to the list itself, then when the pointer to an element of the list is incremented, all objects behind it don't have any active references and can be freed. And it could be recognized that the list itself cannot be freed if there is an active reference to at least one element inside of it. So a reference to an element inside of a list would also imply a reference to the list itself, but it would still need to allow objects inside of the list to be freed.


Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353373116
Original Date: Dec 21, 2017, 9:04 AM CST


I don't think that's the problem. I think the problem is that when you have any collection of objects you need to manage that collection yourself. For example, you have an array of employee records: you have to delete the records when you have finished with them. Because the employee records are in an array, the garbage collector does not help you manage them. The GC only knows if the whole array is referenced or not. Real applications rarely have simple flat data-structures; consider a CRM database.

So the problem comes when I have long-lived data structures, where the contents of the structure(s) have a shorter lifetime than the structure itself.

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353378071
Original Date: Dec 21, 2017, 9:24 AM CST


I would like it if there were a 'context free' solution, because solutions based on scope can be tricky in a multi-thread environment, as there can be different active scopes at the same time, and it's not easy to track all the active scopes without a 'stop the world' solution to scan all threads.

An approach based on scope can probably solve a lot of problems though. But I would prefer a solution that can make it obvious when references are active without being solely based on scope analysis. Especially if there can be closures or such, it can probably become tricky to deal with.

But maybe the complexity it involves is too high compared to the benefit it gives, and it's better to have special handling for multi-thread sharing, considering it as an escape. Or making it obvious when references can be shared between different threads' scopes.

I guess with the simplest form of flat record like that, a 'pop' operation removing the item from the array could do it, but for a more complex hierarchy, maybe not that much. The inconvenience of pop is also that if elements are only processed when they fit a certain condition, they need to be pushed back afterward.

Maybe a solution could be to have some sort of functional-style algorithm able to specify which objects in the collection are going to be used, or to specify more complex patterns of use to the GC. If the elements are accessed in a loop, it could be based on the loop state or such. Or maybe patterns of use could be established, like FIFO or ring buffer, or others, to help the GC manage the life of objects based on program progression. Not sure it's really feasible though.

Original Author: @NodixBlockchain
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353489902
Original Date: Dec 21, 2017, 6:13 PM CST


Deep down i think the whole issue of automatic memory management is an issue of coupling between code as access pattern and memory management as knowing when memory can be reclaimed.

What make manual memory management ( when done properly) efficient is that it keep the memory use highly coupled with access pattern.

All common form of automatic GC tend to decouple entierely memory management from code and access pattern definition.

The rational behind object nursery remind me of using some monte carlo to compute radiosity based on a sample of light path to determine statistically the amount of light that will get in certain area without having to compute all possible light path. In the sense its based on empirical study of statstics on objects life time in most common code pattern, but still keep low coupling with code logic and access pattern.

Scope analysis keep the coupling with code higher, as its logic is coupled with code progression, but it doesnt solve everything. Especially if memory can be reclaimed inside a scope, like list processing, unless having code structured in recursion or something to have high coupling between scope and operation that can lead to memory reclaiming. Or it would need to organize code is such maner that scope can be used to capture only remaining live elements of a list or graph. Like some code that would look like recursion except it would scratch the stack frame at each iteration, and make the the next call with only the next elements left in the scope.

But still to me the best solution come with higher coupling between code as access pattern and memory management, to help the GC identify live objects not only as reachable, but also as possibly used or not, which in case of deterministic access should not be extremely difficult.

Even for graphs, I'm sure access patterns could be made obvious to the GC. One system that comes to mind when thinking about declaring graph access patterns is something like XSLT, even if it's probably not usable for that purpose as such.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-353704019
Original Date: Dec 22, 2017, 9:20 PM CST


This seems wrong: if the rate of object allocation is the same as the rate of object deallocation, there will always be plenty of objects in the nursery.

Math: if there are N allocations/deallocations, each of size m, within a collection period, then the collection cost is m instead of N * m. That's what I mean by amortized. The (N - 1) * m of dead objects never has to be defragmented or touched individually.
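
A minimal TypeScript sketch of that intuition (the `Nursery` class and its API are invented for illustration; real bump-pointer allocation happens below the language level): allocation is just a bump, and the reclamation cost scales with what survives, not with everything that was allocated.

```typescript
// Hypothetical simulation of a bump-pointer nursery. In a real collector the arena is
// raw memory and "alloc" just advances an offset; here an array stands in for the region.
class Nursery<T extends object> {
  private arena: T[] = [];

  // O(1) allocation: appending is the stand-in for bumping the pointer.
  alloc(obj: T): T {
    this.arena.push(obj);
    return obj;
  }

  // A minor collection keeps only the survivors and resets the region wholesale.
  // A real copying collector finds survivors by tracing from roots, so it never
  // visits the dead objects at all; the filter over the arena here is only a
  // convenient approximation for the sketch.
  collect(isLive: (o: T) => boolean): T[] {
    const survivors = this.arena.filter(isLive);
    this.arena = []; // the dead (N - 1) * m is reclaimed by a single reset
    return survivors;
  }
}
```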

Granted, it requires more memory than a tightly/meticulously coded stack allocation paradigm. So Rust, or something like it, will still be useful for mission-critical embedded applications and such. But IMO meticulously coded (inflexible, significant tsuris) stack allocation is not an optimal tradeoff for a general purpose programming language. And Go is lacking in other ways (some issues linked to upthread). I'm contemplating that I can improve significantly on both of them (and Java) with Lucid. Hope you get on board the train.

I am not sure I want to have syntax for indicating long-lived objects; I think implementing an efficient, state-of-the-art generational GC could be good enough.

Then you'll have the reduced performance of the write-barrier (thus invalidating my math argument above) and the worse asymptotic complexity of mark-sweep.

Seems even after all this time, you're refusing to wrap your mind around my proposal. It's quite perplexing to me why you couldn't conceptualize what I wrote about it from the start upthread. I mentioned the write-barrier issue.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354127388
Original Date: Dec 27, 2017, 9:12 AM CST


@keean wrote:

A couple of key quotes:

As time goes by, I consider the GarbageCollector more and more as something that turns freed-pointer crashes into memory leaks. And yes, I'm hunting Java memory leaks more frequently than C/C++ memory leaks, but I'm hunting C/C++ crashes due to freed references much more than NullPointerExceptions in Java.

But the problem you describe certainly has the same effect, so I guess I'll have to widen my definition slightly to "failure to free memory that will no longer be referenced by the running program". It's certainly true that, in the presence of globals or long-lived locals, the explicit act of freeing memory has to be replaced by the explicit act of nulling a reference.

So is GC actually a solution? With manual memory management we have to free references to heap storage (but not all references, because of RAII). With GC we have to null references that are no longer part of the running set of the program (but not all references, because of simple local cases). Both of these lead to bugs; it's just that with GC you might get away with it because the program is too short-lived to run out of memory, but the delay makes it much harder to track down and debug.

Most references don't need to be nulled because they (local stack variables) go out of scope or they get replaced by the assignment of a reference to another object (the so called update operation in the GC book I cited upthread). Thus automated GC automates the memory management in those prevalent cases. That's the advantage of automated GC and the case for which it is a "solution" (with performance and memory utilization tradeoffs). My proposal eliminates most of the said tradeoffs, giving what I postulate is the optimum balance.

In the cases (which are less prevalent) that do need to be explicitly assigned null, these would need to be explicitly freed in any other memory management scheme, so we can say that automated GC is not worse in these less prevalent cases, has the advantage of not crashing for "use after free", and has significant automation advantage in the more prevalent cases. The "going out-of-scope" and update cases are what Rust would achieve with the tsuris+inflexibility (documented upthread) of annotating and tracking lifetimes (and exclusive mutable borrows) and thus does not encompass these cases that require explicitly freed memory.

Additionally it seems you're not accounting for the fact that nulling a reference is not equivalent to freeing an object. The automation of the runtime GC frees the programmer from needing to know at compile-time when the object is freed, regardless of whether references need to be nulled in some semantic cases.

JavaScript's closures can facilitate obfuscated/convoluted memory leaks. @keean had also noted this in these issues threads. This issue with closures can possibly be improved in another programming language design.

An example may make this clearer. Imagine I have a ring-buffer, which contains references to other data. I can 'clear' the ring-buffer efficiently by setting "start = end". With manual memory management, I leak all the data referred to in the buffer because I did not free the memory pointed to. With GC (RC or MS) I leak the memory because the references still exist in the ring buffer memory which is still in scope, even though I have cleared it by setting (start = end). The reference count in the references is not decremented by moving the ring-buffer start or end pointer, and the Sweeper can still find the references in the cleared ring buffer to follow them and mark the memory as still alive.

It doesn't matter whether there is GC or not, we have to know that when implementing the 'clear' method of the ring-buffer we must deal with making safe every reference in the buffer when moving the start or end pointer. So in both cases we need to understand the underlying memory and write code to safely handle references or we will leak memory.

Afaics, this is an irrelevant example for the standard definition of a circular buffer. A circular buffer has an empty operation which means there's nothing in the fixed-size, ever-present buffer (which is supposed to store all instances that occupy the buffer), but that buffer is not supposed to be freed until the last reference to the buffer instance goes out of scope (or is nulled), which happens automatically, and no explicit clear operation is required with automated GC.

However, what you're referring to is a fixed-size buffer of references, where the objects are not stored in place in the buffer: you want the buffer of references to remain allocated, but the referenced object(s) to be freed. Thus the references have to be nulled when they're no longer between the start and end indices.
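
Here is a hedged TypeScript sketch of that failure mode (the `RingBuffer` class and method names are illustrative, not taken from anyone's actual code): the fast clear leaves references sitting in the backing array where any tracing or counting collector can still reach them, so the objects leak until the slots are overwritten; nulling the slots is what actually releases them.

```typescript
// Illustrative fixed-capacity ring buffer of references (not of in-place objects).
// Overflow handling is omitted for brevity.
class RingBuffer<T extends object> {
  private slots: (T | null)[];
  private start = 0; // read index
  private end = 0;   // write index

  constructor(capacity: number) {
    this.slots = new Array<T | null>(capacity).fill(null);
  }

  push(item: T): void {
    this.slots[this.end] = item;
    this.end = (this.end + 1) % this.slots.length;
  }

  // "Fast" clear: leaks, because the slots still hold references that a tracing
  // collector can reach and that a counting collector has never decremented.
  fastClear(): void {
    this.start = this.end;
  }

  // Correct clear: null every slot so the referenced objects become collectable.
  clear(): void {
    this.slots.fill(null);
    this.start = this.end = 0;
  }
}
```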

I try and implement a fast-clear by setting the read-index equal to the write-index, and again I have just leaked all the objects that are referenced by the circular-buffer.

Yet this is a semantic error in the design of your circular buffer API (given that your buffer is a buffer of references, not a buffer of objects in place) and thus afaics has nothing to do with the salient issues of choosing between automated GC and compile-time stack allocation.

you can still leak memory by forgetting to delete objects from the container

This has nothing to do with the salient issues of choosing automated GC or stack allocation. Yes, there are semantic memory leaks. But they're irrelevant to the issues we were discussing.

it is more the APIs provided by JS do not allow memory re-use, which leads to fragmentation and worse cache performance.

Indeed. These are higher-level semantic design issues which are (mostly if not entirely) orthogonal to the salient issues of choosing automated GC or stack allocation.

I think the problem is when you have any collection of objects you need to manage that collection yourself. For example you have an array of employee records, you have to delete the records when you have finished with them. Because the employee records are in an array, the garbage collector does not help you manage them. The GC only knows if the whole array is referenced or not. Real applications rarely have simple flat data-structures, consider a CRM database.

So the problem comes when I have long-lived data structures, where the contents of the structure(s) have a shorter lifetime than the structure itself.

Well yes, you have to delete the employee records from the array, because your array is still referencing them. You're discussing semantic memory leaks, which is an unrelated issue. Every memory allocation strategy permits semantic memory leaks. JavaScript's WeakMap can automatically free objects which are only referenced by the WeakMap, because the keys are not enumerable, so the only way to access an entry is to hold a reference to its key object. There's no plausible analog for arrays, because array elements are accessed by indices (not referenced keys).
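
A small TypeScript illustration of the WeakMap point (the `Employee`/`PayrollRecord` names are made up for this example): an entry lives only as long as something else still references its key object, so dropping the key lets the entry be collected without any explicit delete, whereas records parked in a plain array stay reachable until the program removes them.

```typescript
// Values in a WeakMap live only as long as their key object is reachable elsewhere.
interface Employee { name: string }
interface PayrollRecord { salary: number }

const payroll = new WeakMap<Employee, PayrollRecord>();

let alice: Employee | null = { name: "Alice" };
payroll.set(alice, { salary: 100_000 });

// While `alice` is referenced, the record is retrievable.
console.log(payroll.get(alice)?.salary); // 100000

// Drop the last reference to the key: the entry (key and value) becomes
// collectable automatically -- there is no way left to look it up, because
// WeakMap keys are not enumerable.
alice = null;

// By contrast, records parked in a plain array stay reachable until the
// program explicitly removes them from the array.
const records: PayrollRecord[] = [{ salary: 90_000 }];
records.length = 0; // the explicit "delete" the GC cannot do for you
```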


Reference counting with immutable objects is an interesting point in the design space, as you cannot create circular references with immutable objects, so reference counting becomes a complete solution.

RC has the same reference problem as MS, but it also has the cycle-memory leak problem.

Not entirely correct...

The BG-RC automated GC design employed newer methods for probing and eliminating circular referencing for the reference-counted objects, so as not to leak them. You can re-read the research paper I cited upthread if you've forgotten. Essentially they only probe when reference counts are decremented. It's apparently not necessary to force the inflexibility of a certain programming paradigm (e.g. immutability everywhere) on the programmer. I think the probing algorithm is a source of some less-than-ideal asymptotic complexity, which the research paper noted could be future work for improvement, but nevertheless even mark-sweep has poor asymptotic complexity (so comparing apples-to-apples of automated GC, RC with probing is apparently no worse and has some better tradeoffs in other facets). There might be some additional innovations that can be made, such as marking which objects are not candidates for circular references and thus don't need to be probed. Essentially my proposal requires the programmer to apply a minimal amount of manual insight into the memory allocation, instead of the less performant fully automated GC-MS (generational + mark-sweep), yet avoids the extreme tsuris/inflexibility of Rust's compile-time stack allocation.
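
JavaScript/TypeScript exposes no reference counts, so the following is only a hedged simulation with explicit `retain`/`release` calls (the `Counted` class is invented for the sketch). It shows why naive RC leaks a cycle, and the comment marks the spot where a trial-deletion-style cycle collector, probing candidates when a decrement does not reach zero (as described for the paper above), would hook in.

```typescript
// Simulated naive reference counting; a runtime would normally do this implicitly.
class Counted {
  refCount = 0;
  other: Counted | null = null; // a field that can form a cycle
  constructor(private label: string) {}

  retain(): this { this.refCount++; return this; }

  release(): void {
    this.refCount--;
    if (this.refCount === 0) {
      // Truly dead: drop what we reference, then "free".
      this.other?.release();
      console.log(`${this.label} freed`);
    }
    // A cycle-probing collector would treat a decrement that does NOT reach
    // zero as a hint that `this` may belong to a garbage cycle and probe it later.
  }
}

const a = new Counted("a").retain(); // external reference to a
const b = new Counted("b").retain(); // external reference to b
a.other = b.retain();                // a -> b
b.other = a.retain();                // b -> a (cycle)

a.release();                         // counts drop to 1 ...
b.release();                         // ... but never to 0: naive RC leaks the cycle
```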

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354279398
Original Date: Dec 28, 2017, 6:14 AM CST


Additionally it seems you're not accounting for the fact that nulling a reference is not equivalent to freeing an object.

It's exactly the same: if you don't free an object you leak the memory, and if you do not null a reference (that is, any root reference, or any reference reachable from a root) you leak the memory. It's exactly the same; you leak the memory if you fail to do it.

Yet this is a semantic error in the design of your circular buffer API (given that your buffer is a buffer of references, not a buffer of objects in place) and thus afaics has nothing to with the salient issues of choosing between automated GC or compile-time stack allocation.

This has everything to do with it. The whole point of GC is to prevent memory leaking due to semantic errors. If the programmer never makes semantic errors, then they would always program manual allocation correctly. In effect what this does is undermine the ease of programming/semantic argument for GC. GC in fact makes several classes of bug harder to find and more difficult to debug. As debugging is harder than writing code we want to optimise for ease of debugging, not ease of writing. This is strike one for GC.

Then we have the memory usage issue. If we are not going to require a new operating system then we need to abide by the requirements of current OSs:

However, use of the virtual address space is strongly discouraged in the Linux kernel. This is particularly true on 32-bit architectures where the virtual address space is limited to 100M by default. Using the virtual address space on 64-bit Linux kernels is also discouraged but the address space is so much larger than physical memory it is less of an issue.

So the OS wants us to minimise virtual address usage, even on 64bit systems. With GC we cannot have both high performance and minimal memory usage. We need about 50% additional memory to get performance, so this is strike two for GC.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354279685
Original Date: Dec 28, 2017, 6:16 AM CST


The BG-RC automated GC design employed newer methods for probing and eliminating circular referencing for the reference-counted objects

Well of course you can detect the cycles, but this costs performance. My point was that immutable objects (value semantics) do not produce cycles, so you can use simple RC without worrying about it.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354390714
Original Date: Dec 28, 2017, 8:54 PM CST


Additionally it seems you're not accounting for the fact that nulling a reference is not equivalent to freeing an object.

It's exactly the same: if you don't free an object you leak the memory, and if you do not null a reference (that is, any root reference, or any reference reachable from a root) you leak the memory. It's exactly the same; you leak the memory if you fail to do it.

Incorrect from a programmer effort cost perspective. You're ignoring the other part of what I wrote quoted as follows:

@shelby3 wrote:

The automation of the runtime GC frees the programmer from needing to know at compile-time when the object is freed, regardless of whether references need to be nulled in some semantic cases.


@keean wrote:

The whole point of GC is to prevent memory leaking due to semantic errors.

Incorrect/disagree. The point of automated GC is to eliminate meticulous compile-time tracking of lifetimes. And my proposal attempts to lift the performance up to nearly par by eliminating the write-barrier.

If the programmer never makes semantic errors, then they would always program manual allocation correctly.

Incorrect conceptualization. Semantic errors will be semantic errors in any memory allocation management scheme. I'm repeating myself.

So the OS wants us to minimise virtual address usage, even on 64bit systems. With GC we cannot have both high performance and minimal memory usage. We need about 50% additional memory to get performance, so this is strike two for GC.

I posit that my proposal will be better in these facets than BG-MS and BG-RC. The fragmentation will be less because all the non-RC-referenced objects (which should be most objects, because of the strong law that most objects die young) will die in the nursery and never create fragmentation that has to be offset by the large virtual memory space. Also, bump-pointer allocation can be used within the heap allocator to mitigate fragmentation significantly for the RC-referenced objects (this was mentioned/cited in more detail upthread). The performance of my proposal should be closer to Rust's compile-time allocation lifetimes (and thus also with more algorithmic flexibility), because of the elimination of the write-barrier on update of references. The use of RC for long-lived objects will have better asymptotic complexity w.r.t. the percentage of the physical heap allocated and the total heap size allocated as compared to MS (mark-sweep). The remaining asymptotic weakness is the probing for circular reference leaks, but I also mentioned a possible algorithmic improvement (and it's no worse asymptotically, and better in some facets, than mark-sweep as is, although it might possibly leak some circular references in corner cases; I'm not sure).

But you'll have fragmentation issues with Rust also. It's disingenuous to imply (by your allegation against alternatives) that programming in Rust is some panacea without fragmentation and other issues.

Indeed, you're correct that compile-time (especially all-on-the-stack) allocation can be more efficient (though I posit this difference will be significantly less with my proposal), but this is simply not a good tradeoff in terms of programmer productivity for a general purpose mainstream programming language. The algorithmic inflexibility can be quite onerous also. For most use cases, it is over-engineering, too slow to code, and too complex to maintain. Rust will probably still have a market for those use cases where it matters, such as embedded and mission-critical applications (for example the infrastructure for other apps, such as OSes and web browsers). Profiling can be used to determine which small percentage of the code needs to be highly hand-tuned and meticulously optimized. I had already addressed this, quoted as follows:

@shelby3 wrote:

Granted, it requires more memory than a tightly/meticulously coded stack allocation paradigm. So Rust, or something like it, will still be useful for mission-critical embedded applications and such. But IMO meticulously coded (inflexible, significant tsuris) stack allocation is not an optimal tradeoff for a general purpose programming language.

Linux on 64-bit will accommodate virtual address space usage. If the mainstream programming language demands it, then Linux will be further improved along those lines. OSes are not static things which do not adjust to the overriding trends. And I've also pointed out that fragmentation will be less with my proposal. There's no panacea. We aim for the sweet spot for a general purpose mainstream programming language.

@keean wrote:

The BG-RC automated GC design employed newer methods for probing and eliminating circular referencing for the reference-counted objects

Well of course you can detect the cycles, but this costs performance. My point was that immutable objects (value semantics) do not produce cycles, so you can use simple RC without worrying about it.

You're repeating what I already admitted/wrote. But I also wrote quoted as follows:

@shelby3 wrote:

It's apparently not necessary to force the inflexibility of a certain programming paradigm (e.g. immutability everywhere) on the programmer. I think the probing algorithm is a source of some less-than-ideal asymptotic complexity, which the research paper noted could be future work for improvement, but nevertheless even mark-sweep has poor asymptotic complexity (so comparing apples-to-apples of automated GC, RC with probing is apparently no worse and has some better tradeoffs in other facets). There might be some additional innovations that can be made, such as marking which objects are not candidates for circular references and thus don't need to be probed.

The point is that forcing an immutability-everywhere programming paradigm is very inflexible. Programmers want the freedom to use different programming paradigms. Also, immutable data structures have a log n performance cost. They're not a panacea.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354398794
Original Date: Dec 28, 2017, 11:09 PM CST


The automation of the runtime GC frees the programmer from needing to know at compile-time when the object is freed, regardless of whether references need to be nulled in some semantic cases.

This is wrong. If you don't null it, you will leak the memory at runtime.

You seem to think the consequences of not freeing an object are somehow worse? They are not. You don't have to free any object; when a program terminates all its memory is returned to the system, whether the object was freed or not. In 'C' you do not really need to free anything; the only cost is that the program will use more memory, just like if you forget to null a reference with GC. If you have enough RAM you don't have to free any memory at all.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354400076
Original Date: Dec 28, 2017, 11:30 PM CST


Incorrect from a programmer effort cost perspective. You're ignoring the other part of what I wrote quoted as follows:

@shelby3 wrote:

The automation of the runtime GC frees the programmer from needing to know at compile-time when the object is freed, regardless of whether references need to be nulled in some semantic cases.

This is wrong. If you don't null it, you will leak the memory at runtime.

No you're wrong and you still do not grasp the point I'm making.

With automated GC, nulling a reference does not necessarily free the object, because there might still be other references to the same object. With automated GC, the programmer doesn't have to know whether those other references exist or not. Whereas, when you compare this to freeing the object by calling free() in C or the equivalent in other languages, I'm pointing out that to free the object, the programmer must know whether any other such references exist.

The point was that both manual (i.e. compile-time) and automatic (i.e. runtime) memory allocation management require the programmer to take action for semantic cases where references need to be dropped (e.g. the circular buffer of references example you provided). But automated GC has the advantage that the programmer doesn't have to know the lifetimes of all the references to the same object at compile-time. Thus, per some examples I showed upthread for Rust, the algorithmic flexibility can be greater and less meticulous with automated GC as compared to Rust's compile-time lifetimes.
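
A tiny TypeScript sketch of that distinction (the variable names are illustrative): with a tracing or counting collector, nulling one alias is always safe because the runtime knows whether other aliases remain; with a manual free, the programmer has to know, at the call site, that no other alias exists.

```typescript
interface Session { user: string }

let a: Session | null = { user: "carol" };
let b: Session | null = a; // second reference (alias) to the same object

// Under GC: safe regardless of aliases. The object is reclaimed only once
// *both* a and b are nulled (or go out of scope); the programmer never has
// to know how many aliases exist.
a = null; // object still alive via b
b = null; // now unreachable -> collectable

// Under manual management, the equivalent of `free(a)` at the first step
// would be a use-after-free the moment b is used, so the programmer must
// prove (at compile time or in their head) that no other reference remains.
```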

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354403743
Original Date: Dec 29, 2017, 12:19 AM CST


I'm pointing out that to free the object, the programmer must know if there are any other such existing references.

And I am pointing out you do not care about other references if you never call free. Just leak the memory. If you don't null the reference you leak the memory, so it is no better than never calling free, so just don't call it.

In C++ you just use a "shared_pointer" that implements RC for cases where the semantics of freeing are complex.

Most variables in a program have simple local scope. Very few variables actually have complex multiple references with different lifetimes. In a way GC is optimising for the uncommon case.

Whatever the solution it has to make debugging memory leaks as easy as possible, as well as dealing with the common simple local variable efficiently.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354521258
Original Date: Dec 29, 2017, 7:55 PM CST


@keean wrote:

And I am pointing out you do not care about other references if you never call free. Just leak the memory.

Who wants to design for the cases of promoting memory leaks (which are not semantic in nature or let's say can be dealt with automatically by RC or MS)?

If you don't null the reference you leak the memory, so it is no better than never calling free, so just don't call it.

No better in the case that the programmer desires memory leaks. That makes absolutely no sense.

I explained that in the case of not desiring memory leaks or "use after free", automated GC automates the tracking of an object's lifetime without complex and inflexible compile-time tsuris.

In C++ you just use a "shared_pointer" that implements RC for cases where the semantics of freeing are complex.

Yup. And I'm proposing the same thing, but merging it with a nursery that thus doesn't need a write-barrier, and which I posit is therefore nearly as performant as Rust's lifetimes with exclusive mutable borrows (with all their tsuris and algorithmic inflexibility), but with much greater programmer productivity. I can't believe I am having to write this for the 10th time and you still haven't recognized the potential advantage. It's absolutely exasperating.

Most variables in a program have simple local scope. Very few variables actually have complex multiple references with different lifetimes. In a way GC is optimising for the uncommon case.

Whether that is true or not (and you don't know what the percentage is), the fact is that the other paradigms are algorithmically inflexible. I cited some examples upthread for Rust where the lifetime checker can't recognize code that is in fact safe. And the annotations and hassles of proving to the compiler which references are simple RAII and which are more complex but still within the lifetime paradigm are a significant drain on programmer productivity, maintenance costs (code readability/simplicity), algorithmic flexibility, and the complexity of the type system and compiler.

Whatever the solution it has to make debugging memory leaks as easy as possible,

What golden rule is that? And where is the proof? All programming language design decisions are thrown in a mix of tradeoffs.

as well as dealing with the common simple local variable efficiently.

Which my proposal apparently will do, as well as being more efficient in many other ways, such as programmer productivity.

You're a person who wants to overengineer things. You'll never create a friendly, light, fun, mainstream language adopted by millions of programmers. I'll put that out there as a challenge to both of us, and let's see which of us succeeds in doing so.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354532259
Original Date: Dec 30, 2017, 1:30 AM CST


Who wants to design for the cases of promoting memory leaks (which are not semantic in nature or let's say can be dealt with automatically by RC or MS)?

My feeling is this is what GC encourages, especially when combined with closures. Most programs written in scripting languages leak memory, but it's okay because the programs are not generally long lived. If you want stable memory use and GC you might need to get rid of closures.

No better in the case that the programmer desires memory leaks. That makes absolutely no sense.

No better in the worst case makes a lot of sense from a design perspective. Many algorithms are chosen based on the limit of their worst case performance.

"use after free"

Not freeing memory entirely eliminates use-after-free errors, and is faster than freeing. For short-lived programs it is an optimal strategy. This is the simplest case of a region allocator (a single region). Just let the program use the memory it needs and let the OS clean up at the end. Many mathematical problems fit into this class.

Yup. And I'm proposing the same thing, but merging it with a nursery that thus doesn't need a write-barrier

If you remove longer-lived objects from the nursery it is no longer so efficient, as you now have to explicitly defragment it, rather than compacting when copying objects into the next generation. I think you are missing some details by oversimplifying your mental model of what is going on. Actually write the code for this, and you will find it's more complex than you think.

What golden rule is that? And where is the proof? All programming language design decisions are thrown in a mix of tradeoffs.

Debugging is clearly harder than writing a program in the first place. I can write an incorrect program in no time at all. Writing a correct program is hard. The proof is to start measuring the time you spend writing vs. debugging and maintaining code. It's something I think a team leader, project manager, or anyone who has had to estimate the development time for writing code should realise.

I cited some examples upthread for Rust where the lifetime checker can't detect what is safe code.

Proving a variable is local is very different from knowing it is local. The majority of variables can be local even if you can't prove they are. This one should be simple: if we do not allow references, then we have to use global variables to maintain state between functions. Simply find a language that does not provide references and count the number of local vs. global variables across several different programs.

Actually this is a useful thought - maybe a simple memory management solution that removes the need for GC is simply to not allow references. Force the direct use of variables in an outer scope. This is basically the solution all databases use (since in a database a 'reference' is actually an index).
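
A hedged TypeScript sketch of what "don't allow references, force indices" might look like (the `Store` API is invented for illustration): state lives in an outer-scope table, code passes around integer handles, and freeing is an explicit delete from the table, exactly as with a database row and its primary key.

```typescript
// Illustrative handle-based store: code holds plain numbers, not references.
class Store<T> {
  private rows = new Map<number, T>();
  private nextId = 0;

  insert(value: T): number {       // returns a handle (like a DB primary key)
    const id = this.nextId++;
    this.rows.set(id, value);
    return id;
  }

  get(id: number): T | undefined { // a stale handle yields undefined,
    return this.rows.get(id);      // not a dangling pointer
  }

  remove(id: number): void {       // explicit, database-style deletion
    this.rows.delete(id);
  }
}

// Usage: functions exchange ids; the only "reference" to an employee record
// is its handle, so its lifetime is managed in exactly one place (the store).
const employees = new Store<{ name: string }>();
const id = employees.insert({ name: "Dana" });
employees.remove(id);
console.log(employees.get(id)); // undefined -- the handle is stale, not dangling
```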

I can't believe I am having to write this for the 10th time and you still haven't recognized the potential advantage. It's absolutely exasperating.

It clearly has some advantages, but it's more of an incremental improvement than a groundbreaking paradigm shift.

programmer productivity.

In my experience, programmer productivity is dominated by debugging time, except in rare instances where complex algorithmic research is required. Most programmers do not spend any time doing this, and it is largely restricted to university research papers.

You're a person who wants to overengineer things.

Maybe, but I also am looking for something that is different enough to drive adoption. Being a better JavaScript is unlikely to get massive usage (look at all the languages out there with low adoption rates).

For amateur programmers you really want something like a spreadsheet. Something like Apple's "HyperCard", and for that there is no question that GC is the correct approach.

I am a professional programmer, so naturally I am interested in a language for professionals, because I want to solve real problems that I have with existing languages.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354559246
Original Date: Dec 30, 2017, 11:52 AM CST


@miguel raised an issue with me in private discussions about heterogeneous collections w.r.t. my proposal. He was thinking that the reference-counting type was on the class (whereas it's on the reference). But there's a related issue, which is that the programmer may wish to add to a collection objects which are a mix of reference counted and not, because that may be more efficient than promoting all non-RC (i.e. nursery) objects out of the nursery to RC just to be able to put them in some collection mixed with RC objects. Otherwise it's an issue analogous to "what color is your function". To accommodate that case, we could have a third type of reference which doesn't know at compile-time whether it points to an RC or non-RC object. Thus, it reinstates the write-barrier. Sometimes the write-barrier would be more efficient than promoting all nursery objects to RC for a case where the two types have to be intermixed.

Note that in my proposal, RC references can never be assigned to non-RC (i.e. nursery) reference types. GC with write-barriers doesn't have this limitation, because the references all have the same type at compile-time; but that performance cost is paid everywhere. I need to ponder whether this "what color is your function" issue, applied to references, is a significant source of inflexibility.
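
To make the "third type of reference" concrete, here is a hedged TypeScript simulation (the `RcHeader`, `MixedSlot`, and `assign` names are hypothetical, not part of the proposal's syntax): only stores through the mixed reference pay the runtime check that stands in for the write-barrier; statically known RC and nursery references would skip it, and the rule that an RC reference can never be assigned into a nursery-typed slot is expressed as an ordinary type constraint.

```typescript
// Simulated object headers: an RC-referenced object carries a count, a nursery object does not.
interface RcHeader { kind: "rc"; refCount: number }
interface NurseryHeader { kind: "nursery" }
interface Obj { header: RcHeader | NurseryHeader; payload: unknown }

// A "mixed" reference slot: at compile time we don't know which kind of object it
// holds, so every store through it pays a runtime check -- this is the write-barrier
// that statically typed RC and nursery references avoid.
class MixedSlot {
  private target: Obj | null = null;

  assign(next: Obj | null): void {
    const prev = this.target?.header;
    if (prev && prev.kind === "rc") prev.refCount--;             // barrier: release old target
    const incoming = next?.header;
    if (incoming && incoming.kind === "rc") incoming.refCount++; // barrier: retain new target
    this.target = next;
  }
}

// The compile-time rule from the proposal, expressed as a type: a nursery-typed slot
// can only hold nursery objects, so no barrier (and no count) is needed there, and
// assigning an RC-headered object into it is a type error.
interface NurseryObj { header: NurseryHeader; payload: unknown }
const fast: NurseryObj = { header: { kind: "nursery" }, payload: 42 };
```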

Tangentially, there may be scenarios where the RC cost can be skipped down the call stack when the call stack lineage holds a reference to the object. My idea for not having out-of-order callbacks in single-threaded code makes it easier to reason about which of the said RC increments/decrements can be avoided. The point being that we can perhaps avoid most of the cost of RC in some cases.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354561173
Original Date: Dec 30, 2017, 12:29 PM CST


@keean wrote:

My feeling is this is what GC encourages, especially when combined with closures.

I already mentioned the closures issue upthread and that it would need to be investigated thoroughly to see what could be done.

I don't think GC encourages the type of semantic errors such as your circular buffer example. Thus, I think you're exaggerating. I'm open-minded, if you have hard statistical data indicating otherwise.

No better in the worst case makes a lot of sense from a design perspective. Many algorithms are chosen based on the limit of their worst case performance.

Your application of that truth to this case doesn't make any sense. In the worst case, the two memory allocation strategies can both leak, i.e. they're equivalent in the worst case. In the best case, automated GC automatically tracks which objects can be freed and the programmer need only null a reference to indicate it is no longer in use. Whereas with a manual memory allocation scheme, the programmer has to prove that the object's lifetime has ended so it can be released. This is the Nth time I have stated this.

I have already stated numerous times the positive and negatives (the tradeoffs) for automated GC and how my proposal posits to improve on them.

Not freeing memory entirely eliminates use-after-free errors, and is faster than freeing. For short-lived programs it is an optimal strategy. This is the simplest case of a region allocator (a single region). Just let the program use the memory it needs and let the OS clean up at the end. Many mathematical problems fit into this class.

You're still trying to justify your logic above. It remains the case that in the worst case they're equivalent. And thus your entire logic from the start on this point fails as explained above.

If you remove longer lived objects from the nursery it is no longer so efficient, as you now have to explicitly defragment it, rather than compacting when copying objects into the next generation.

Absolutely incorrect. I'm not going to re-explain it for the Nth time.

I think you are missing some details by over simplifying your mental model of what is going on. Actually write the code for this, and you will find its more complex than you think.

I read the (first several chapters of the) book on GC recently. You're the one who is apparently lacking understanding presumably because you haven't read it. It's disingenuous to accuse someone of missing details, when you're not stating what those details are specifically.

Debugging is clearly harder than writing a program in the first place.

Your implied claim is that Rust's compile-time provably correct lifetimes and exclusive mutability will never need to be debugged. But this is an extremely myopic point to make, because Rust has to punt to RC and unsafe code in certain cases, and even in provably safe code, the semantic memory leak errors can still occur.

You have failed to prove that my proposal is harder to debug or even that the amount of difficulty isn't offset by the increased productivity, higher readability of the code, lower maintenance, etc.. You're just throwing out subjective opinions without any statistical studies or substantiation.

I'm one of the most prolific debuggers on every major project I've worked on. Please don't try to insinuate I know nothing about debugging. Also, I would not agree that debugging is always more difficult than writing the program. A well-written program involves a lot of forethought. Carefully written, well-organized code, with forethought as to sources of potential bugs, can alleviate many types of "action at a distance" bugs that (appear to be random and not reproducible and thus) are extremely difficult to track down.

You're relating your experience as a code reviewer for a large team and trying to keep a large team all on the same page (which is a well-known Mythical Man Month dilemma). So you would like to force them all into a compile-time enforced structure which ensures more correctness.

We already had the abstract (relativistic physics) debate in the past, wherein I explained/argued to you that it's impossible to statically prove that all sources of bugs don't exist. The more static checking that is piled on, the less flexible the programming language becomes. Your infatuation with gobs upon gobs of complex type system cruft is not going to end up being the panacea you dream of. The problem, as Rust's lifetime model exemplifies (see the examples upthread of safe code which Rust can't prove is safe), is that it's not possible to prove every possible kind of program is safe. The variants of typing required are an exponential explosion.

Actually this is a useful thought - maybe a simple memory management solution that removes the need for GC is simply don't allow references.

When you have a real proposal, I will pay attention. Otherwise I'll just ignore this, as I did the suggestion that immutability everywhere is a performance solution for the write-barrier when it's known to introduce a log n algorithmic performance degradation.

It clearly has some advantages, it's more of an incremental improvement than a ground breaking paradigm shift though.

I don't yet know if it is a good idea. There are some disadvantages, per the prior post. But I don't acquiesce to your egotistical slant. The design potentially interacts with my other ideas about a single-threaded model for concurrency and avoiding Rust's need to prove exclusive mutability. Taken holistically, it might be a significant, groundbreaking language shift. We'll see...

Note the comment of mine to which you were replying wasn't asking for you to agree the proposal is great, but was an exasperation that you hadn't yet grasped the posited advantages and tradeoffs completely, as evidenced by some of your comments. I wasn't looking for a slap on the back (although I apparently received a slap-down instead), rather just pushing for mutual coherence in our discussions. The level of noise is discouraging.

In my experience, programmer productivity is dominated by debugging time, except in rare instances where complex algorithmic research is required.

Thus your programmers never read and maintain any other programmer's code.

Also a facet of debugging is being able to understand the code you're debugging.

Meticulous, crufty static type system complexity (as well as overly complex type inference) can be argued to be difficult to conceptualize and read.

Being a better JavaScript is unlikely to get massive usage (look at all the languages out there with low adoption rates).

TypeScript has had the fastest growing adoption of a new language in recent memory.

Python is very popular because it's very easy to express algorithms with. Although I don't think anyone should use Python for a major project with a million lines of code and a large team. Seems to me your argument for greater compiler checking is because you're prioritizing large projects with large teams?

But I'll state with high confidence that most programs will end up being smaller, and that the programming world is moving away from large Mythical Man Month morass clusterfucks (even those supported by socialism, for example, and allowing much slower rates of progress than the free market) and towards small teams, apps, and interoperability standards.

You're insinuating that PureScript, CoffeeScript, etc are better than JavaScript. I disagree. They all suck. Why the heck would I want to use any of those?? I don't want Haskell's abstruseness. I don't want some half-baked syntax sugaring.

For amateur programmers you really want something like a spreadsheet. Something like Apple's "HyperCard", and for that there is no question that GC is the correct approach.

Who is targeting idiots? Not me.

Vitalik Buterin and Eric S. Raymond both prefer/rave about Python. Both of them are probably 150+ IQ.

I am a professional programmer, so naturally I am interested in a language for professionals, because I want to solve real problems that I have with existing languages.

Which real problems? Self-documenting code via complete type annotations? Seems to me you want a type system to prevent team members from violating any and all invariants? But I already explained that's an impossible pipe dream. So which real problems?

The entire point of my proposal is trying to holistically figure out how to get low-level coding and high-level coding into the same language and to deal with the concurrency issues without needing to prove and restrict to exclusive mutable borrows everywhere. I'm basically looking at the weaknesses of JavaScript (and TypeScript) where it doesn't (they don't) meet my needs. And I've looked at C++, Rust, Java, and Scala, and they also have other issues.

Note I do want to spend some time asap thinking about which language I am going to use in the interim, until Lucid might become a reality.

I know typeclasses have been a top priority of yours. I've never seen your holistic rationale on which memory allocation strategy you want. It seems you were checking out Rust and Ada, and contemplating other designs such as regions, and I hadn't yet seen your conclusive remarks.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354568064
Original Date: Dec 30, 2017, 2:59 PM CST


You're still trying to justify your logic above. It remains the case that in the worst case they're equivalent. And thus your entire logic from the start on this point fails as explained above.

No in the worst case GC fails because it makes memory leaks much harder to debug. It is very difficult to track down at runtime exactly where the leaks are coming from. Whereas tools like "valgrind" make it relatively straightforward for manual/deterministic memory management.

Absolutely incorrect. I'm not going to re-explain it for the Nth time.

You have not explained this once. Point me to the algorithm or to a paper which explains this if you cannot explain it well enough yourself.

Your implied claim is that Rust's compile-time provably correct lifetimes and exclusive mutability will never need to be debugged.

You misunderstand me, I am not saying anything about Rust. All I am saying is that it is easier to debug deterministic and predictable systems, where the programmer has visibility and control of what is going on.

TypeScript has had the fastest growing adoption of a new language in recent memory.

I am using TypeScript.

Python is very popular because it's very easy to express algorithms with.

I am using Python too. You can get pretty good performance from Python using NumPy to avoid explicit iteration, operating on vectors and matrices of numbers. Still, for the heavy lifting you have to call out to 'C' using Python modules.

I like Python, I don't need another Python.

So which real problems?

Writing a compiler that is modular and understandable. Writing clean parsers. Monte-Carlo simulations, and AI.

I know typeclasses have been a top priority of yours. I've never seen your holistic rationale on which memory allocation strategy you want. It seems you were checking out Rust and Ada, and contemplating other designs such as regions, and I hadn't yet seen your conclusive remarks.

I don't really have any conclusive remarks. What I can say is that GC is a good baseline, but it's clear that for real performance manual management like 'C' is needed. This would be the Python approach: high level in Python, low level in C.

As for typeclasses, I could go for some kind of modules instead. The important thing is that it should be based on Category Theory, and avoid lots of boilerplate.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354588337
Original Date: Dec 31, 2017, 12:41 AM CST


@keean wrote:

You have not explained this once.

It's all upthread, numerous times. It does require, though, assimilating the cited references and my comments. I'm not going to repeat myself for the Nth time.

No in the worst case GC fails because it makes memory leaks much harder to debug.

Now you're moving the goal posts. You had just argued that worst case comparison was about allowing them both to leak because it doesn't matter for short-lived programs. Your logic is discombobulated. Now you're using the phrase "in the worst case" in a point about avoiding worst case by debugging memory leaks. Thus the sentence is illogical.

What you're really doing is trying to ignore that mistake you made in logic and change the discussion to the relative debugging costs of different memory allocation strategies. This new direction is an interesting area to discuss, but it doesn't absolve you from your earlier mistake, which you apparently refuse to acknowledge. Anyway, I'm not trying to rub your face in it, but please also stop implying I was wrong, when in fact your mistake remains. Clueless readers can easily be misled by your attempts at obfuscating the discussion.

It is very difficult to track down at runtime exactly where the leaks are coming from. Whereas tools like "valgrind" make it relatively straightforward for manual/deterministic memory management.

Afaics, the memory leak detection in valgrind reports which allocations were never freed when the program shut down. A similar tool can be made for a GC. Thus I think your assertion is bogus.

All I am saying is that it is easier to debug deterministic and predictable systems, where the programmer has visibility and control of what is going on.

As you see above, the devil is in the details of such claims. I challenge you to prove that C++ or Rust is easier to debug than Java. A Java expert will likely be able to point to tools they use to aid debugging.

I will repeat again: the determinism in Rust's compile-time lifetimes and exclusive mutable borrows has nothing to do with semantic memory errors (or at least not all types of them). I fail to see any determinism or predictability in the occurrence of such errors in any of those systems. Rust models a class of errors at compile-time, but it doesn't model every form of error (nor even every memory allocation error), e.g. where Rust must punt to RC references. And the cost of that limited amount of compile-time error checking is a reduction in algorithmic flexibility, an increase in verbosity, complication/cruft of the type system, etc. Probably there are use cases in which it is a desired tradeoff, for a mission-critical use case, to restrict to only safe Rust code (i.e. absolutely no instances of unsafe, and no RC references), so that the memory allocation is provably leak-free at compile-time (which also requires an onerous exclusive mutable borrowing paradigm, i.e. the lifetimes are inseparable from the mutability restrictions in Rust). But I don't think those use cases define the next mainstream programming language.

Writing a compiler that is modular and understandable. Writing clean parsers. Monte-Carlo simulations, and AI.

I fail to see how that has anything to do with breaking new ground for a mainstream programming language, other than as a research tool. I'm addressing real issues I have where I can't write all the code (client, server, high and low-level) in one programming language. And where I can avoid the tsuris of Rust or C++. And where I can avoid Java's verbosity, GC pauses, and lack of for example a native unsigned type. And where I don't have to write noisy code with braces and semicolons. And where I can later perhaps also get typeclasses. And where I can get some of TypeScript's typing features such as union types and flow-based typing (e.g. TypeScript's typeof and instanceof guards), which leads to more elegant code.

Writing a clean compiler is an admirable goal, but it is not germane to the issue of which real mainstream programming problems we're addressing. And no, I am not trying to write a compiler which programmers can program. In past comments, it seems you're focused on trying to build a stack of compiler layers starting from Prolog unification on up. That is a massive goal which will require years and involve many false starts and restarts.

Your apparent aspiration towards a compiler framework built up from correct first principles is an interesting research goal to explore. Perhaps something for me to dabble in when I am retired. In the meantime, I need to focus on near-term deadlines and needs. I'm also remembering @skaller, who is experimenting with his "kitchen sink" exploration of the programming language design space with his Felix programming language research. He's apparently retired, after having worked on the C++ committee in the past.

I'm trying to solve some (few or several) major design goals in the most efficient way possible. Not to fool myself into believing I can solve the "kitchen sink" of the vast programming language design space in any reasonable time frame.

but it's clear that for real performance manual management like 'C' is needed. This would be the Python approach: high level in Python, low level in C.

Precisely the segregation and tradeoff my proposal wants to eliminate. Yet you claim it's not potentially groundbreaking. For example, eliminating the marshalling and algorithmic-inflexibility costs of the FFI.

As for typeclasses, I could go for some kind of modules instead.

I remember that discussion we had in these issues threads. And I'm also not sure which I want. I decided to focus first on my more immediate needs and then figure that out later when I have more recent coding to analyse and guide my perspective. Most of my coding was done more than a decade ago, and I for example didn't know about the concept of typeclasses then, so I need to have these concepts in mind as I code forward in order to formulate a reasoned perspective.

We had some very interesting discussions in the past.

It's possible that my proposal is lacking in some major way. Yet I'm not going to acquiesce to what I perceive to be incorrect statements. The main problem I see thus far with my proposal is that RC and non-RC references can't interoperate freely, i.e. there's a reduction in degrees of freedom in exchange for the benefits the proposal brings.

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354592080
Original Date: Dec 31, 2017, 2:36 AM CST


I am not sure what mistake you think I have made?

For server stuff, Go where you need performance; otherwise Python is good enough. At the moment I am using TypeScript client-side and Python on the server, and it is fine. I don't really need a replacement for those; I don't need manual memory management because I am not dealing with performance issues. Where I come across problems is in C/C++/Ada/Rust, when I am concerned about performance and I find I cannot use abstraction and have a good architecture without losing performance.

The only time I have seriously hit the limits of JavaScript/TypeScript abstraction is when trying to write a Prolog interpreter or any language compiler in them.

From a support point of view, Java has been the worst, as the code quality was poor and its use of memory (per user) was terrible. The service would chew through memory, crashing every couple of days, whereas in comparison our Python services run for months without intervention. The conclusion is that Java threads use a large amount of per-thread memory to support N concurrent users, and the memory leaks over time.

The two solutions that seem to work well are Python (which reference counts, and uses MS only for freeing cycles) and JavaScript/Go-like concurrency. The former works, even though it has threads, because it is largely deterministic about memory due to RC; the latter works because it uses asynchronous concurrency with MS GC.

So this little analysis suggests that, for this kind of web-server application, you do not want a threaded model and mark-sweep (generational) GC together.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354604837
Original Date: Dec 31, 2017, 7:49 AM CST


There have been at least two major themes I'm trying to improve upon for programming language design, other than the issues around genericity/reuse/typeclasses/modules/HKT/HRT (i.e. higher-order type systems on the Lambda cube):

  • concurrency/parallelism
  • avoidance of an FFI for integrating low-level coding with the higher-level conveniences.

I had explained (in another thread, but I think it's linked from the OP) that asynchronous callbacks are essentially the same as multithreaded re-entrancy in terms of exposing the possibility of race conditions on mutable data (although callbacks at least delineate the race conditions). Rust prevents such race conditions by requiring (at compile-time) exclusive borrowing of mutable references everywhere, which is quite onerous. Go (and JavaScript without callbacks) solves this problem by not allowing shared memory between channels: the Go CSP channels (or separate JavaScript threads) can (or should) only share data by passing messages. This prevents optimal multi-threaded access to a large shared immutable data structure, such as the UTXO of a blockchain (which can be GBs). We had also discussed that lockless design is far superior, because bugs with synchronization are usually unbounded; thus we're frowning on idiomatic Java as a concurrency solution.

My proposal of explicitly declaring which references are RC, coupled with the proposed design decision to give each thread its own non-shared nursery and to allow RC references to be shared between threads (and coupled with my proposal to not allow out-of-order asynchronous callbacks1), clearly demarcates the RC references that can lead to mutable-data race conditions. Thus the programmer only needs to ensure that the RC references are to immutable data structures where necessary to prevent data races, and/or to queue modifications (including the increments and decrements to the reference counts) that are order-independent so they can be batch-processed by a single thread, avoiding the data races involved with multi-threaded update of data.
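
One way to read the "queue order-independent modifications" part is as buffered reference counting: worker contexts only append count deltas to their own log, and a single owning thread applies them in batches, so no increment or decrement ever races. A hedged TypeScript sketch of that idea (the `DeltaLog`/`RcOwner` names and the flush-driven design are my assumptions, not the proposal's actual mechanism):

```typescript
// Buffered RC: workers record +1/-1 deltas; one owner applies them in batches.
type ObjectId = number;

class DeltaLog {
  readonly deltas: Array<[ObjectId, 1 | -1]> = [];
  retain(id: ObjectId): void { this.deltas.push([id, 1]); }
  release(id: ObjectId): void { this.deltas.push([id, -1]); }
}

class RcOwner {
  private counts = new Map<ObjectId, number>();

  register(id: ObjectId, initial = 1): void { this.counts.set(id, initial); }

  // Run by a single thread (in JS terms, a single event-loop turn), so the updates
  // below never race. Deltas are summed per object first, so the outcome does not
  // depend on their order within the batch and a transient zero never frees early.
  flush(log: DeltaLog): void {
    const sums = new Map<ObjectId, number>();
    for (const [id, d] of log.deltas) sums.set(id, (sums.get(id) ?? 0) + d);
    for (const [id, d] of sums) {
      const n = (this.counts.get(id) ?? 0) + d;
      if (n <= 0) this.counts.delete(id); // the shared object is now free
      else this.counts.set(id, n);
    }
    log.deltas.length = 0;
  }
}
```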

Giving each nursery its own non-shared thread is also beneficial in that low-level code (each instance running in one of said single threads) can do low-level pointer manipulation without references being moved, because the nursery copy compactor won't run until that thread is stalled (and presumably won't stall a function which is marked as low-level). So this eliminates a marshalling and FFI issue that typical GC languages face when interacting with low-level C-like coding. Additionally, the RC references never move (in virtual address space).

I hope with this explanation, readers see there's some holistic design considerations motivating my proposal.

Obviously the weakness in my proposal is that mixing nursery-referenced and RC-referenced objects in a collection could be troublesome. And we now note the added complication that we would be combining references that have single-threaded access with those which have multi-threaded access.

@shelby3 wrote:

Tangentially, there may be scenarios where the RC cost can be skipped down the call stack when the call stack lineage holds a reference to the object. My idea for not having out-of-order callbacks in single-threaded code makes it easier to reason about which of the said RC increments/decrements can be avoided. The point being that we can perhaps avoid most of the cost of RC in some cases.

1 This requires further analysis and discussion, especially regarding how we will handle UI code. Functional reactive programming? I'm referring to not having any shared mutable state modified while waiting for any callback. With that rule, exclusive mutability for the single-threaded access to nursery objects is guaranteed.


I am not sure what mistake you think I have made?

Can we stop beating a dead horse? IMO, it's clearly articulated in the thread for anyone with sufficient reading comprehension skills. I'm more interested in any insights you have regarding memory allocation strategies than rehashing what I think your error in conceptualization was.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354637004
Original Date: Dec 31, 2017, 11:11 PM CST


This discussion is timely because I need to decide pronto which programming language to start coding our blockchain in. The choice of programming language may also impact hiring decisions, which I need to make pronto as well.

@keean wrote:

From a support point of view, Java has been the worst, as the code quality was poor and its use of memory (per user) was terrible. The service would chew through memory, crashing every couple of days, whereas in comparison our Python services run for months without intervention. The conclusion is that Java threads use a large amount of per-thread memory to support N concurrent users, and the memory leaks over time.

The memory leaks were due to errors by your programmers, not due to Java's GC? Or was it that the MS of the GC could not keep up with the rate of allocation of objects which outlived the generational nursery, thus causing huge "stop the world" pauses to recover the memory? I gather that you're blaming Java's verbosity (e.g. no free-standing functions; everything must be a method) and its use of synchronization instead of lockless design (which tends to be buggy and thus probably leaky) for promoting errors in coding?

The two solutions that seem to work well are Python (which reference counts, and uses MS for freeing cycles only) and JavaScript/Go-like concurrency. The former works even though it has threads because it is largely deterministic about memory due to RC; the latter works because it uses asynchronous concurrency with MS GC.

MS has very bad asymptotic complexity: its cost per collection is proportional to the live heap, so its amortized cost blows up as memory utilisation grows. RC has better asymptotic complexity as long as there's some low-complexity mechanism to free cyclical reference leaks (presuming these leaks are rare enough that the MS or probing mechanism can run less frequently than a pure MS without RC would).
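To make that claim concrete, the textbook back-of-the-envelope comparison (standard GC analysis, not anything specific to my proposal) goes roughly as follows, with heap size H, live data L, and a tracing cost of c per live unit:

```latex
\underbrace{\frac{c\,L}{H - L}}_{\text{amortized tracing cost per allocated unit}}
\;\longrightarrow\; \infty \quad \text{as } L/H \to 1,
\qquad \text{whereas naive RC costs } O(1) \text{ per pointer write, independent of } H \text{ and } L.
```

In other words, tracing is cheap only while there is plenty of free headroom relative to live data, which matches the earlier observation that tracing GCs react badly to growing memory utilisation.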

My proposal recognizes that RC has very low cost when employed only for long-lived objects, and a nursery without a write-barrier has nearly zero cost for the short-lived objects. So in theory my proposal can combine the performance of Rust with the convenience of automatic GC, with excellent asymptotic complexity.

I'm presuming you have not attempted to use JavaScript/Go extensively on the server?

For server stuff, Go where you need performance, otherwise Python is good enough.

Python is too slow and lacks native low-level coding capability, so it probably isn't acceptable for systems programming such as a blockchain full node. Also, I've heard that Python has holes in its type system, such as monkey patching, which make it impossible to optimize and leaky in terms of invariants. And Go lacks higher-level abstractions (i.e. no generics), its concurrency safety is suboptimal and enforced only by the convention of frowning on shared memory, and it doesn't have my proposed performance improvements for automated GC. I heard a redesign of Go is being contemplated? I like some of Go's ideas, such as how they do low-overhead green threads at a function boundary. I wonder how Go integrated its low-level pointers with the GC? If the GC moves objects, then how does Go patch up all the pointers, and which threads get stalled and when (i.e. is there only one GC instance for all threads)? Edit: apparently Go doesn't currently move objects, so this isn't a well-thought-out design point. As expected, Go's design is suboptimal in this respect.

So in summary, there doesn't exist a programming language which combines (1) a type system with good abstractions, (2) native low-level plus high-level coding, (3) a lockless model for avoiding data races, and (4) a performant automated GC (noting that RC is a form of automated GC).

- Rust has the first three items, but instead of the last it relies on compile-time lifetimes and exclusive mutability, so complexity is very high.
- Go has basically none of the items, although at least it has automated GC, green threads, and CSP channels as an incomplete paradigm for the data-race issue.
- Python has leaky but flexible abstraction and lacks the last three items.
- Java lacks all four items, although the upcoming Scala 3 (compiled to JVM bytecode) addresses the first reasonably well, and perhaps lockless design could be achieved by convention.
- C++ can do all of the items except lockless design (unless by some convention or library), but some facets only via libraries, so the complexity is very high and verbose, and optimal integration is lacking due to the limitations of doing these features in libraries.
- TypeScript's main flaw derives from JavaScript/ECMAScript: there's no way to share native-code-level access to objects between threads, and the integration of low-level (via ASM.js) and high-level code is inflexible and suboptimal.

Why is it not clear that I'm trying to address a void in the programming language design space?


EDIT: another issue is the lack of anonymous unions (as opposed to nominal/named unions); apparently Go has neither! Go's type system appears to be highly lacking. The only programming languages I know of which have anonymous unions are TypeScript, Scala 3 (aka Dotty/DOT calculus), Ceylon, and PureScript. Also, TypeScript has elegant typeof and instanceof type guards which alleviate the need to explicitly cast the type. Scala provides the elegance and reduced boilerplate of "everything is an expression" that has a value (although with a transpiler we could simulate this in, for example, TypeScript with anonymous functions).
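For readers unfamiliar with the feature, here is a minimal TypeScript sketch (illustrative only; the names are made up) of an anonymous union narrowed by typeof and instanceof type guards, with no explicit casts:

```typescript
class Account {
  constructor(public id: string) {}
}

// Anonymous union: no named wrapper type needs to be declared.
function describe(x: number | string | Account): string {
  if (typeof x === "number") {
    return x.toFixed(2);          // narrowed to number
  } else if (typeof x === "string") {
    return x.toUpperCase();       // narrowed to string
  } else if (x instanceof Account) {
    return x.id;                  // narrowed to Account
  }
  return "unreachable";           // x is narrowed to never here
}

console.log(describe(3.14159), describe("utxo"), describe(new Account("a1")));
```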

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354644388
Original Date: Jan 1, 2018, 3:18 AM CST


If you use numpy then Python can be pretty fast. Fast enough that recoding in 'C' is not worth the effort for many tasks. The more data you have, the bigger the benefit of working with numpy.

We do use JavaScript on the server side, but it has no equivalent to numpy, which means it cannot match Python for numeric computation, even though it is a faster language in general. JavaScript is most useful for managing high-latency operations for many users due to its asynchronous nature.

I have not really used Go yet for any major projects, but it's something I want to try.

Why is it not clear that I'm trying to address a void in the programming language design space?

Everything is possible using existing languages. So the question is, what are you trying to improve? My guess would be programmer productivity for a class of problems. These are not problems where performance is critical, because you are not trying to compete with C/C++/Rust for pure computational speed. It appears your target is more web-services, where currently TypeScript and Go probably offer the best scalability.

Erlang probably deserves a mention in the productivity/scalability category as well. The 'actor'-like model enabled Ericsson to write a new operating system for their routers that is very robust, when a project in a more traditional language ('C' I think) was stuck in debugging. The story I heard was that they were able to write a solution in Erlang that was correct in less time than it took to debug the already-written solution in 'C'. So much so that they abandoned 'C' and moved exclusively to Erlang. Core routers require performance and parallelism to get high packet throughput, and need to be robust and run for long periods of time without operator intervention.

Original Author: @shelby3
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354647910
Original Date: Jan 1, 2018, 5:07 AM CST


@keean wrote:

If you use numpy then Python can be pretty fast. Fast enough that recoding in 'C' is not worth the effort for many tasks. The more data you have, the bigger the benefit of working with numpy.

A numeric library is not very relevant to this discussion. Not only is it of interest only to the limited set of applications that need such a library, but the existence of a particular library also doesn't speak to the features and design of the programming language. JavaScript may also have an efficient numeric library available, given that the C FFI for JavaScript is ASM.js (meaning any C/C++ library can be compiled for JavaScript).

The story I heard was that they were able to write a solution in Erlang that was correct in less time than it took to debug the already-written solution in 'C'.

That's a useless data point, because comparing to C means just about any other language would give better results in terms of paradigms and compiler assistance. The Actor model and Go's CSP are not panaceas. The devil is in the details and also the other features+paradigms these languages provide.

Everything is possible using existing languages.

Everything is possible in assembly language too. That doesn't mean anyone would want to code an entire program in assembly language. Thus your statement is meaningless and sidesteps the points I'm making about programming language features and paradigms.

Seems you totally didn't address the points I made in the prior post!

So the question is, what are you trying to improve? My guess would be programmer productivity for a class of problems. These are not problems where performance is critical, because you are not trying to compete with C/C++/Rust for pure computational speed. It appears your target is more web-services, where currently TypeScript and Go probably offer the best scalability.

Have you not read where I mentioned (several times in fact) that for a blockchain full node I need maximum performance? I have mentioned the UTXO numerous times. Have you even googled the term "UTXO"? Do you not know that a blockchain node has to access GBs of unspent transaction outputs, parse cryptographic signatures, and perform other performance-critical tasks such as processing Merkle trees, etc.?

Did you already ignore the multiple times I have mentioned that TypeScript (JavaScript) can't access a shared UTXO data structure from multiple threads?

@shelby wrote:

This prevents the optimal multi-threaded access of a large shared immutable data structure such as the UTXO of a blockchain (which can be GBs).

How can you recommend TypeScript to me after I have told you this numerous times, including mentioning it to you in a private message recently? The node would have to queue up the multithreaded workload and funnel it through single-threaded UTXO access. There are ways to put shared data behind an FFI API as Memcached does, but then it's not native code and requires a verbose API. (I'm contemplating that a transpiler could hide all this verbosity and make everything appear to be native, so transpilation to TypeScript might still remain my best choice, but I was hoping to be able to choose an existing programming language I could start with immediately without needing to develop a transpiler first.)
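To illustrate the funneling constraint, here is a minimal sketch assuming Node.js worker_threads (the UTXO map contents and message shapes are hypothetical): because workers cannot share a native JS Map, every lookup must be serialized through the thread that owns it.

```typescript
// Sketch: the main thread owns the UTXO map; worker threads must request
// lookups by message passing instead of reading shared native objects.
import { Worker, isMainThread, parentPort } from "worker_threads";

if (isMainThread) {
  const utxo = new Map<string, number>([["txid:0", 5000]]); // owned by this thread only
  const worker = new Worker(__filename);
  worker.on("message", (msg: { id: number; key: string }) => {
    // Every worker lookup funnels through this single-threaded handler.
    worker.postMessage({ id: msg.id, value: utxo.get(msg.key) ?? null });
  });
} else {
  parentPort!.on("message", (msg: { id: number; value: number | null }) => {
    console.log(`lookup ${msg.id} ->`, msg.value);
  });
  parentPort!.postMessage({ id: 1, key: "txid:0" }); // ask the owner; can't read it directly
}
```

SharedArrayBuffer lets threads share raw bytes, but not native objects like Map, which is exactly the limitation being complained about here.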

How can you recommend Go, given it has no generics and not even an enum? Also, Go does not provide an absolute guarantee that it won't stop the world for more than milliseconds, which is not ideal for a full node which must respond to transaction requests within milliseconds. Also, I had linked for you upthread to the blog essay written by Mike Hearn (a former key Bitcoin developer) pointing out that Go had not introduced any new research for its GC, and thus the only way it achieves millisecond pause times is by sacrificing overall performance and increasing memory consumption (and the millisecond pause goal is not a hard guarantee).


So far my analysis is that TypeScript offers the best of the type-system features I want (such as anonymous unions and type guards), assuming I don't need HKT or (more than simulated, first-order) typeclasses, and most low-level features could be simulated (at a loss in performance) on TypeScript, so that might be the most seamless way to proceed if my ultimate goal is to create Lucid. (Note that the lack of operator overloading in TypeScript is an annoying handicap.)

Otherwise, Rust offers excellent low-level control, typeclasses (but no anonymous unions), and perhaps also HKT by now or eventually, but it can't simulate the nursery concept I'm proposing, so I'd have to use its lifetimes + exclusive mutability, or else make everything RC and declare unsafe everywhere (and I presume there's currently no solution for eliminating circular-reference leaks?). For the client side, I don't know if Rust runs on all smartphones, nor how many different distributables and how much complex platform-detection logic would be needed, as opposed to JavaScript/TypeScript which runs on any computer with a browser installed.

Scala 3 offers anonymous unions, an implicit form of typeclasses, "everything is an expression", multithreaded access to shared data structures, and some native low-level data types lacking in TypeScript/JavaScript (though not as complete as Rust's), and the JVM GC means I could simulate the nursery (but RC isn't possible, so those references would just be handled by the GC). So coding in (or transpiling to) Scala 3 has some advantages over TypeScript, but for the client side the JVM doesn't run everywhere (and is even actively disabled on some platforms). Also Scala 3 is only in alpha or beta and probably not suitable for real-world use yet. Another disadvantage for Scala is that many libraries are in Java and, unlike Emscripten -> ASM.js, there's no way to compile C/C++ libraries for Java. Also the Scala compiler has had a reputation for being extremely slow and riddled with bugs (although the new DOT-based compiler is apparently a more well-thought-out design).

P.S. Have you seen that in version 2.6 TypeScript added support for a strict contravariance mode, to avoid the unsoundness of its "bivariance" heuristics?
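For concreteness, here is a minimal sketch of what that mode (the --strictFunctionTypes flag introduced in TypeScript 2.6) rejects; the handler names are illustrative:

```typescript
// Under --strictFunctionTypes, function-typed parameters are checked
// contravariantly, so this unsound assignment is rejected; under the older
// bivariant heuristic it was silently accepted.
type Handler = (x: string | number) => void;

const stringOnly = (x: string): void => {
  console.log(x.toUpperCase());
};

const h: Handler = stringOnly; // error with --strictFunctionTypes
h(42);                         // would call toUpperCase on a number at runtime
```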

Original Author: @keean
Original URL: https://github.com/keean/zenscript/issues/35#issuecomment-354648164
Original Date: Jan 1, 2018, 5:15 AM CST