BartoszMilewski/Okasaki

Garbage collection

Opened this issue · 3 comments

I thought I'd leave this comment here:
I've been playing around with implementing an opt-in garbage collector as a library for C++.

I think this would be a pretty interesting tool for implementing more efficient functional data structures. Basically, there would be a opaque pointer type that would be produced by an allocation routine. Then root objects would be registered with the garbage collector. Each garbage-collectable type would have a reflection method that allowed the garbage collector to walk the object tree.

This way, things like bump-pointer allocation (allocations are only one addition operation!) and GC compaction could be implemented.

There's really no way to have garbage collection for all of C++. And we wouldn't want that. But, sometimes there's a much higher performance cost to certain abstractions in C++ than there would be with a garbage collector. I think this would quite a few useful use cases in a distributed task parallelism library I've been working on based around futures, IVars and LVars. I think there a lot of other embedded DSL use cases. While this isn't traditionally the domain of C++, I don't see why it shouldn't be...

Any thoughts?

Cheers,
Ben

Have you looked at Hans Boehm's work?

Yeah. It's an awesome idea, but it loses a lot of the potential of garbage collection to improve performance because it uses raw pointers. That means that first, it can't move allocated objects to compact the heap. Compaction reduces fragmentation and increases memory-locality (for example, a compacting GC could end up putting a linked-list in contiguous memory). Second, it needs to assume any chunk of data could be a pointer. Meaning, that the cost of GC run scales with the amount of allocated data, not with the amount of living data.

I've played around with the Boehm GC, but I generally don't get much better performance than a raw malloc or a shared_ptr, while with a simple GC implementation that allows data movement and doesn't assume arbitrary objects are pointers, I can get huge performance benefits. On the other hand, the Boehm GC is general purpose whereas what I'm proposing is a tool for very specific situations.