GC related performance issue

Question

GC related performance issue

Opened this issue 10 years ago · 3 comments

In the following code unsafePerformIO stands for a long computation.

long k = do
    i <- newIORef 0
    smp <- start $ do
        generator $ pure $ do
            s1 <- generator $ pure $ return $! unsafePerformIO $ readIORef i >>= writeIORef i . (+1)
            snapshot s1
    replicateM_ k smp
    readIORef i >>= print

With GHC 7.8.4, compiler options -O2, I get the following numers:

long 0    ->    0
long 1    ->    1
long 2    ->    3
long 3    ->    6
long 4    ->    10
...
long 10000    ->    238726

For small inputs it is quadratic, for large ones less than quadratic (maybe GC related).

Answer 1 · 2015-02-02T22:48:01.000Z

Yes, this is a known issue. The GC will clean out the unreferenced signals from time to time, but they will naturally stay around in the meantime. The overhead is of course extreme in a microbenchmark like this one, but it becomes less of an issue when there’s more real logic causing more allocations. You can actually force a GC after every sampling, but it will be a lot slower.

I wonder if Elerea was a full-fledged language with its own runtime, would there be a GC scheme that would fit the kind of access patterns it requires much better? For instance, Lucid Synchrone constrains signal ownerships never to change, i.e. every signal dies when its parent (creator) dies. This makes for a much better memory behaviour, but of course it limits the structure of the programs, since you cannot detach a signal from the context it was created in.

Answer 2 · 2015-02-02T23:43:29.000Z

I am interested rather in design issues than in performance issues.
A very small fraction of the unnecessary computations of the present performance issue can be caused by Elerea's evaluation strategy.
Unfortunately I cannot show an expoit yet, partially because the GC disturbs my measurements.

This is what I am looking for:

As I see, in Elerea's current evaluation strategy, sampling computes future values (for example, what will be the next value of a delayed signal) using present values.
Instead of this, sampling could compute present values using past values.

I've already modelled this strategy. It seems to work, but it is sensitive to signals created by 'external'. This means that external signals should be wrapped in managed signals, which cannot change between two samplings. A minor consequence of this is that external should have this type signature:

external :: a -> (SignalGen (Signal a), a -> IO ())

This compute-present-using-past-values strategy is a bit more strict than the current strategy but it has less issues. One such issue in the current strategy is that sometimes it computes signal values prematurely. I was looking for an exploit for this when I've found the present performance issue.

Answer 3 · 2016-06-19T15:41:33.000Z

Heads up: the new 2.9.0 version implements the above change to external.