typelead/eta

Document laziness & runtime system thoroughly

rahulmutt opened this issue · 3 comments

Laziness is a particularly tricky topic, but once you get a handle on it, it's just as easy to fix laziness bugs as it is to fix memory leak issues that you may stumble across in GC'd languages. We should add a dedicated section to the user guide on laziness as well as the runtime system, how to analyze STG code for performance, and how to deal with SOEs.

@rahulmutt, you might start with a simple example of a program that causes problems due to laziness, explain why it occurs, and then explain alternate ways to fix it. You are free to use the following in your documentation if you find it useful:

A simple example that will cause Stack Overflow Error's (SOE's) due to laziness is as follows:

{-# LANGUAGE FlexibleContexts #-}
-- {-# LANGUAGE Strict #-}
-- {-# LANGUAGE BangPatterns #-}

module Main where

import Data.Bits
import Control.Monad.ST
import Data.Array.Base (newArray, STUArray(..), unsafeRead)
-- import Data.Array.ST (runSTUArray)

test :: () -> Int
test() = runST $ doit where
  doit :: ST s Int
  doit = do
    arr <- newArray (0,4093) 0 :: ST s (STUArray s Int Int)
    let
      loop x co =
        if x <= 0 then return co else
        let
          cntem i c =
            if i >= 131072 then loop (x - 1) c else do
              v <- unsafeRead arr (i `shiftR` 5)
              if v .&. (1 `shiftL` (i .&. 31)) /= 0
                then cntem (i + 1) c
              else cntem (i + 1) (c + 1)
--              else case c + 1 of nc -> nc `seq` cntem (i + 1) nc
        in cntem 0 co
    loop 10 0

main :: IO ()
main = do
  putStrLn $ show $ test()
  putStrLn "Hello, Eta!"

The above code will crash if the stack is too small (if it doesn't crash, just increase the value of the loop from 10 until it does) with output like the following:

Exception in thread "main" java.lang.StackOverflowError
        at eta.runtime.thunk.UpdateInfoStack.adjustAfterPop(UpdateInfoStack.java:64)
        at eta.runtime.thunk.UpdateInfoStack.pop(UpdateInfoStack.java:58)
        at eta.runtime.stg.StgContext.popUpdate(StgContext.java:195)
        at eta.runtime.thunk.UpdatableThunk.evaluate(UpdatableThunk.java:23)
        at main.Main$sat.thunkEnter(Main.hs:26)
        at eta.runtime.thunk.UpdatableThunk.evaluate(UpdatableThunk.java:19)
        at main.Main$sat.thunkEnter(Main.hs:26)
        at eta.runtime.thunk.UpdatableThunk.evaluate(UpdatableThunk.java:19)
        at main.Main$sat.thunkEnter(Main.hs:26)
**repeated for all of the stacked thunks**

This basically says that it is stuck evaluating a too high stack of unevaluated "Thunk's" due to the deferred count increasing operation(s) at line 26 that adds one to the count value for a zero bit. Even if the stack were not deep enough to actually crash, the code execution would be very slow since it takes a minimum of many tens of CPU clock cycles to evaluate a "Thunk".

The reason SOE's occur is as follows:

As the mutable array arr is protected from mutual modifications and other kinds of side effects by the ST monad, the compiler Strictness (Laziness) Analyzer cannot determine when the actual count of the number of "zero" bits in the array needs to be valid and thus leaves the counts to be determined lazily with a stack based connected series of unevaluated "thunk's", which are deferred execution functions (functions whose results haven't been evaluated yet). As the above program creates over a million of these, a standard stack is caused to overflow.

If the array were not wrapped in the ST monad, the Strictness Analyzer would not defer the increment of the count value (at least for current versions of Eta) since everything else is evaluated and non lazy so there would be little point

Methods to avoid the SOE's occurring:

We need to force the count bindings to be evaluated with every "loop" invocation of the containing recursive functions. There are convoluted ways to do this and the Eta language has extensions to make it easier, as follows, from easier to more convoluted:

  1. The very easiest way to solve the problem is to just uncomment the second line of the program to enable the automatic Strict extension, which will automatically cause all bindings to be evaluated on assignment. Strict mode is nothing more than putting "bang's" before all the assigned bindings as they are assigned as described in the next method...
  2. The next easiest way to solve the problem is to use the BangPatterns extension by uncommenting the third line instead of the second and then modifying the program very slightly by putting the ! "bang" symbol before the parameters for which we wish to force evaluation as in "loop x !co =" and "cntem i !c"; this will force the evaluation of these two parameters at the beginning of each "loop"/recursive function evaluation so that they can no longer be lazy. Using BangPatterns is nothing more than injecting the seq code as described in the next method...
  3. The standard "Haskell without extensions" way to force evaluation is to use seq instructions to cause the evaluation before the program can proceed; this can be done here by commenting out the looping line that changes the count by adding one and uncommenting the following line of "else case c + 1 of nc -> nc seq cntem (i + 1) nc". What this says is that, having determined that the count needs to be incremented, cause the determination of the new count value which is one more than previously and then "sequence" to cause this value to be actually evaluated (rather than storing an intention to evaluate it later) before looping around to the next possible determination. This method is what is actually being used by the other methods, just that it is wordier as in "boilerplate code".

Considerations in the use and avoidance of laziness and strictness:

Programmers new to the language may think "Why is laziness used at all or why is not strictness (non-laziness) the default?"; The answer is a little obscure and buried in the history of Haskell on which Eta is based but briefly can be explained as follows: Many of Haskell's data structures such as lazy lists are built around the idea of laziness and it was convenient although not necessary to make laziness the default for them; however, many of the higher order concepts of Haskell/Eta such as "natural" Y-combinators can be directly expressed only with laziness (in other words - deferred execution), it was considered to be a great asset to be able to express these directly by the academic types of people who designed the language. Also, since laziness became the default, one must consider that some of these extensions to express strictness more easily and concisely are relatively new and only available in "extended" Haskell so it is not so easy to force strictness in "standard" Haskell. Therefore, there are some historical reasons we don't always want Strict code and why it is not the default to the point that is has become part of the "culture" of Haskell, but also one of the greatest obstacles to learning Haskell/Eta.

Having enabled the Strict extension, we can force laziness in an analogous way to forcing laziness using BangPatterns by just prepending the ~ "unbang" character before any bindings that we want to be lazy; in this way we can force laziness in a otherwise Strict module. As explained above, this can be a desired thing to do for certain types of application. So having made strictness the default for a module, we can still get laziness when it is beneficial.

Further, although the above example forced evaluation to the maximum in one step as the subject was of a primitive integer type, for higher order types (not a primitive integer as in this case) it only evaluates to the outermost level, the Weak Head Normal Form (WHNF); thus, for an outer array binding this would force evaluation only sufficient to determine the array object allocation but not necessarily its contents and for lazy lists it only evaluates the container for the first element of the list and not necessarily either its head value nor especially its tail list - further applications of forcing strictness would be necessary to go deeper. For more complex examples, it is thus more complex to overcome the problems of laziness when they occur.

Note also that many of the standard libraries were written assuming laziness and compile but don't run correctly when forced to be Strict unless changes were made to the code, and that the Strict extension applies only to the module in which it is used and not to the modules that get imported. Thus, unless the libraries were all modified to be compiled and run as Strict with laziness as a selected option where beneficial, there will always have to be a combination of the use of strictness and laziness considered when calling these other modules.

Nice explanation. We'll add it at some point - or you're free to create a new section yourself and start it off and submit a PR.

See the format of the other sections here:
https://github.com/typelead/eta/tree/master/docs/0-user-guides/0-eta-user-guide

@rahulmutt, I'll change the format to your documentation standard template and submit it as a PR "after a while" when I finish my other PR(s).