/recursion-schemes

Generalized bananas, lenses and barbed wire

Primary LanguageHaskellOtherNOASSERTION

recursion-schemes

Hackage Build Status

This package represents common recursion patterns as higher-order functions.

A familiar example

Here are two recursive functions.

sum :: [Int] -> Int
sum [] = 0
sum (x:xs) = x + sum xs

product :: [Int] -> Int
product [] = 1
product (x:xs) = x * product xs

These functions are very similar. In both cases, the empty list is the base case. In the cons case, each makes a recursive call on the tail of the list. Then, the head of the list is combined with the result using a binary function.

We can abstract over those similarities using a higher-order function, foldr:

sum     = foldr (+) 0
product = foldr (*) 1

Other recursive types

foldr works great for lists. The higher-order functions provided by this library help with other recursive datatypes. Here are two recursive functions on Trees:

depth :: Tree a -> Int
depth (Node _ subTrees) = 1 + maximum subTrees

size :: Tree a -> Int
size (Node _ subTrees) = 1 + sum subTrees

It is not possible to use foldr to simplify depth. Conceptually, foldr is flattening all the elements of the tree into a list before combining them with the binary function. This does not work for depth because it needs to examine the structure of the tree, which foldr flattened away.

We can instead use one of the higher-order functions provided by this library, cata.

import Data.Functor.Base (TreeF(..))
import Data.Functor.Foldable

-- data Tree  a   = Node  a [Tree a]
-- data TreeF a r = NodeF a [r     ]

depth :: Tree a -> Int
depth = cata go
  where
    go :: TreeF a Int -> Int
    go (NodeF _ subDepths) = 1 + maximum subDepths

size :: Tree a -> Int
size = cata go
  where
    go :: TreeF a Int -> Int
    go (NodeF _ subSizes) = 1 + sum subSizes

In this example, the code is a bit longer, but it is correct. Did you spot the mistake in the version which does not use cata? We forgot a call to fmap:

depth :: Tree a -> Int
depth (Node _ subTrees) = 1 + maximum (fmap depth subTrees)

size :: Tree a -> Int
size (Node _ subTrees) = 1 + sum (fmap size subTrees)

cata automatically adds this call to fmap. This is why subDepths contains a list of already-computed depths, not a list of sub-trees. In general, each recursive position is replaced by the result of a recursive call. These results have type Int, not type Tree, so we need a helper datatype TreeF to collect these results.

When you think about computing the depth, you probably think "it's 1 plus the maximum of the sub-depths". With cata, this is exactly what we write. By contrast, without cata, we need to describe both the "how" and the "what" in our implementation. The "how" is about recurring over the sub-trees (using fmap depth), while the "what" is about adding 1 to the maximum of the sub-trees. cata takes care of the recursion, so you can focus solely on the "what".

A recursion-scheme is a function like cata which implements a common recursion pattern. It is a higher-order recursive function which takes a non-recursive function as an argument. That non-recursive function describes the part which is unique to your calculation: the "what".

Types with many constructors

Let's look at a more complex example. Here is a small lambda-calculus and a function to compute the free variables of an expression:

import Data.Set (Set)
import qualified Data.Set as Set

data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  | Constant Int
  | Add Expr Expr
  | Sub Expr Expr
  | Mul Expr Expr
  | Div Expr Expr
  | ...

freeVars :: Expr -> Set String
freeVars (Var name)      = Set.singleton name
freeVars (Lam name body) = Set.difference (freeVars body) (Set.singleton name)
freeVars (App e1 e2)     = Set.union (freeVars e1) (freeVars e2)
freeVars (Constant _)    = Set.empty
freeVars (Add e1 e2)     = Set.union (freeVars e1) (freeVars e2)
freeVars (Sub e1 e2)     = Set.union (freeVars e1) (freeVars e2)
freeVars (Mul e1 e2)     = Set.union (freeVars e1) (freeVars e2)
freeVars (Div e1 e2)     = Set.union (freeVars e1) (freeVars e2)
freeVars ...

As you can see, we had to repeat the Set.union (freeVars e1) (freeVars e2) line over and over. With recursion-schemes, this code becomes much shorter:

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable, TemplateHaskell, TypeFamilies #-}
import Data.Functor.Foldable.TH (makeBaseFunctor)

makeBaseFunctor ''Expr

freeVars :: Expr -> Set String
freeVars = cata go
  where
    go :: ExprF (Set String) -> Set String
    go (VarF name)           = Set.singleton name
    go (LamF name bodyNames) = Set.difference bodyNames (Set.singleton name)
    go fNames                = foldr Set.union Set.empty fNames

The makeBaseFunctor line uses Template Haskell to generate our ExprF datatype, a single layer of the Expr datatype. makeBaseFunctor also generates instances which are useful when using recursion-schemes. For example, we make use of the Foldable ExprF instance on the last line of go. This Foldable instance exists because ExprF has kind * -> *, while Expr has kind *.

Other recursion-schemes

All of our examples so far have used cata. There are many more recursion-schemes. Here is an example which follows a different recursive structure:

-- |
-- >>> halves 256
-- [256,128,64,32,16,8,4,2,1]
halves :: Int -> [Int]
halves 0 = []
halves n = n : halves (n `div` 2)

That recursive structure is captured by the ana recursion-scheme:

halves :: Int -> [Int]
halves = ana go
  where
    go :: Int -> ListF Int Int
    go 0 = Nil
    go n = Cons n (n `div` 2)

The Data.Functor.Foldable module provides many more.

Flowchart for choosing a recursion-scheme

In addition to the choices described by the flowchart, you can always choose to use a refold.

Contributing

Contributions and bug reports are welcome!