haskell/core-libraries-committee

Add a `foldlM'` combinator to "Data.Foldable" that is strict in the accumulator.

Closed this issue · 46 comments

Presently, "Data.Foldable" provides a foldlM combinator defined as:

foldlM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM f z0 xs = foldr c return xs z0
  -- See Note [List fusion and continuations in 'c']
  where c x k z = f z x >>= k
        {-# INLINE c #-}

Note that this is not strict in the accumulator, and in at least some cases leaks space when compiled with -O, rather than -O2, even when f` is strict. I am proposing the addition of a:

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = foldr c pure xs z0
  -- See Note [List fusion and continuations in 'c']
  where c x k !z = f z x >>= k
        {-# INLINE c #-}

which is strict in the accumulator. This was prompted by an episode of the Haskell Unfolder.

There are perhaps some nits to decide, e.g. whether list-fusion is compatible with eta-reduction by writing instead:

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f = flip (foldr c pure)
  -- See Note [List fusion and continuations in 'c']
  where c x k !z = f z x >>= k
        {-# INLINE c #-}

or instead:

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = \ z0 xs -> foldr c pure xs z0
  -- See Note [List fusion and continuations in 'c']
  where c x k !z = f z x >>= k
        {-# INLINE c #-}

For the record, the "Note" in question reads:

Suppose we define 
    mapM_ f = foldr ((>>) . f) (return ())
(this is the way it used to be).
Now suppose we want to optimise the call
    mapM_ <big> (build g)
      where 
        g c n = ...(c x1 y1)...(c x2 y2)....n...
GHC used to proceed like this:
    mapM_ <big> (build g)
              
    = { Definition of mapM_ }
      foldr ((>>) . <big>) (return ()) (build g)

    = { foldr/build rule }
      g ((>>) . <big>) (return ())

    = { Inline g }
      let c = (>>) . <big>
          n = return ()
      in ...(c x1 y1)...(c x2 y2)....n...
The trouble is that `c`, being big, will not be inlined.  And that can
be absolutely terrible for performance, as we saw in #8763.
It's much better to define
    mapM_ f = foldr c (return ())
      where
        c x k = f x >> k
        {-# INLINE c #-}
Now we get
    mapM_ <big> (build g)

    = { inline mapM_ }
      foldr c (return ()) (build g)
        where c x k = f x >> k
              {-# INLINE c #-}
              f = <big>
Notice that `f` does not inline into the RHS of `c`,
because the INLINE pragma stops it; see
Note [Simplifying inside stable unfoldings] in GHC.Core.Opt.Simplify.Utils.
Continuing:
    = { foldr/build rule }
      g c (return ())
        where ...
           c x k = f x >> k
           {-# INLINE c #-}
              f = <big>

    = { inline g }
      ...(c x1 y1)...(c x2 y2)....n...
        where c x k = f x >> k
              {-# INLINE c #-}
              f = <big>
              n = return ()

        Now, crucially, `c` does inline

    = { inline c }
      ...(f x1 >> y1)...(f x2 >> y2)....n...
        where f = <big>
              n = return ()
And all is well!  The key thing is that the fragment
`(f x1 >> y1)` is inlined into the body of the builder
`g`.
foldlM' f z0 xs = foldr c pure xs z0
  where c x k !z = f z x >>= k

Could you explain why the definition in terms of foldr and not a simpler definition in terms of foldlM, say one of the below?

foldlM' f = foldlM (\(!b) -> f b)

foldlM' f = foldlM (\b a -> f b a >>= (pure $!))

Could you explain why the definition in terms of foldr and not a simpler definition in terms of foldlM, say one of the below?

It seems you're right, the below seems to work equally well:

foldlM' f = foldlM (\ !acc -> f acc)

as does ensuring that the acc argument to f is manifestly strict.

Strictness annotations are great for enforcing the evaluation order even in an Applicative context.

foldMapA :: (Foldable t, Applicative f, Monoid y) => (x -> f y) -> t x -> f y
foldMapA f = foldl' (\ !fy x -> liftA2 (<>) fy (f x)) (pure mempty)

So, modulo the choice of definition, is there support for adding a foldlM'?

A draft merge request indicates a serious commitment to seeing the changes through. A branch with these changes would easily pass most of the GHC CI (in time). Benchmarks would also help your case.

As demonstrated above, foldlM' can be obtained from foldlM by a correct choice of f. That means that, unlike in the case of foldl, we don't need the strict version. It might be helpful to have it anyway, but it doesn't seem urgent.

FWIW, foldlM is equivalent to foldlM' for any return-strict monad, and I'm inclined to suggest that users should be using return-strict monads by default anyway (even though there aren't many of them in the ecosystem yet).

As demonstrated above, foldlM' can be obtained from foldlM by a correct choice of f.

This is not true. Or rather, it is true only for some monads.
Take lazy State or Writer, for instance. The proposed alternatives will show a space leak.

Benchmark
{-# LANGUAGE BangPatterns #-}
import Control.Monad.Trans.State.Lazy
import Data.Foldable (foldlM)
import Test.Tasty.Bench

main :: IO ()
main = defaultMain
  [ env (pure xs_) $ \xs ->
    bgroup "bench"
    [ bench "foldlM" $ whnf (benchFoldlM xs) foldlM
    , bench "foldlM_1" $ whnf (benchFoldlM xs) foldlM_1
    , bench "foldlM_2" $ whnf (benchFoldlM xs) foldlM_2
    , bench "foldlM'" $ whnf (benchFoldlM xs) foldlM'
    ]
  ]
  where
    xs_ :: [Int]
    xs_ = [1..100000]

benchFoldlM
  :: [Int]
  -> ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int)
  -> Int
benchFoldlM xs foldlM = flip evalState () $ foldlM (\a b -> pure (a+b)) 0 xs

foldlM_1 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_1 f = foldlM (\ !x -> f x)

foldlM_2 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_2 f = foldlM (\b a -> f b a >>= (pure $!))

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = foldr c pure xs z0
  where c x k !z = f z x >>= k
        {-# INLINE c #-}
All
  bench
    foldlM:   OK
      30.5 ms ± 2.2 ms,  29 MB allocated,  36 MB copied,  52 MB peak memory
    foldlM_1: OK
      35.8 ms ± 2.9 ms,  33 MB allocated,  38 MB copied,  52 MB peak memory
    foldlM_2: OK
      30.4 ms ± 2.9 ms,  29 MB allocated,  37 MB copied,  52 MB peak memory
    foldlM':  OK
      4.90 ms ± 456 μs,  25 MB allocated, 1.5 KB copied,  52 MB peak memory

@tomjaguarpaw What is a "return-strict monad"? Making pure strict in (almost any) Monad violates the monad identity laws, badly breaking reasoning.

@meooow25 This is very interesting, thank you! For the purpose of avoiding space leaks, the proposed foldM is as good as

foldMS _ z [] = pure z
foldMS f z (x:xs) = do
  !z' <- f z x
  foldMS f z' xs

My mistake was assuming that this was equivalent to

foldMS2 _ z [] = pure z
foldMS2 f z (x:xs) = do
  z' <- f z x
  pure $! z'
  foldMS2 f z' xs

In a lazy monad the equivalence doesn't hold, because undefined >>= pure x == pure x. I find lazy monads extremely hard to understand and I'm doubtful whether we should be encouraging their use, so I don't see this proposal as particular well-motivated from the point of view of avoiding space leaks. In terms of raw performance, my experiments with your benchmark code suggest that foldlM_2 is as good as foldlM' for strict State and IO (foldlM_1 is worse, but I don't understand why).

@treeowl See the Discourse post I linked for an explanation. I would appreciate it if you can give some demonstrations of how violation of the pure law badly breaks reasoning in practice, because no one was able to come up with one in that discussion.

gbaz commented

I think the monadFix example in that discussion was very clear, and also definitive.

The fact that you don not use or like MonadFix hardly matters -- its a very real use case, and the reasoning to arrive at a given MonadFix instance is exactly the sort of reasoning that breaks.

@meooow25, here's my entry for "can we do it with just foldlM?". It doesn't leak, but it's not as fast as the custom strict fold, and I'm not optimistic it can be made to perform as well in general.

-- | Turn a "lazy monad" like lazy State into a strict one.
newtype Strictly m a = Strictly
  { unStrictly :: m (Solo a) }

instance MonadTrans Strictly where
  lift = Strictly . fmap MkSolo

-- Run a `Strictly` computation. Forcing the result(s) forces all intermediate computations.
runStrictly :: Monad m => Strictly m a -> m a
runStrictly (Strictly m) =
  m >>= \(MkSolo a) -> pure a

instance Monad m => Functor (Strictly m) where
  fmap = liftM
  x <$ m = m >> pure x

instance Monad m => Applicative (Strictly m) where
  pure = Strictly . pure . MkSolo
  liftA2 = liftM2
  m *> n = m >>= \_ -> n

instance Monad m => Monad (Strictly m) where
  Strictly m >>= f = Strictly $
    m >>= \(MkSolo a) -> unStrictly (f a)

foldlM_3 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_3 f b = runStrictly . foldlM (\ !x -> lift . f x) b

Here are the benchmarks:

All
  bench
    foldlM:   OK
      51.7 ms ± 1.3 ms,  29 MB allocated,  36 MB copied,  52 MB peak memory
    foldlM_1: OK
      63.4 ms ± 2.3 ms,  34 MB allocated,  39 MB copied,  52 MB peak memory
    foldlM_2: OK
      52.0 ms ± 4.2 ms,  28 MB allocated,  35 MB copied,  52 MB peak memory
    foldlM_3: OK
      12.2 ms ± 824 μs,  34 MB allocated, 2.6 KB copied,  52 MB peak memory
    foldlM':  OK
      9.66 ms ± 693 μs,  25 MB allocated, 1.5 KB copied,  52 MB peak memory

@tomjaguarpaw , I really don't like having to think about what functor/applicative/monad I'm working in when using the laws to transform code. It's a lot of extra mental overhead. That applies equally to your "return-strict monads" and to such monstrosities as lazy Writer and lazy State.

BTW, Codensity from the kan-extensions package seems to work as well as my Strictly monad transformer in the context of these benchmarks, but that seems a bit over the top.

here's my entry for "can we do it with just foldlM?".

Fantastic idea!

EDIT: My version with IdentityT was benchmarking the wrong thing.

In fact, if I'm benchmarking it right, the same approach with IdentityT recovers the performance of the custom strict fold.

    foldlM_3I: OK
      31.8 ms ± 1.7 ms, 175 MB allocated, 7.4 KB copied,  84 MB peak memory
    foldlM':   OK
      33.6 ms ± 1.6 ms, 175 MB allocated, 6.9 KB copied, 163 MB peak memory
    foldlM_3:  OK
      45.8 ms ± 1.8 ms, 266 MB allocated,  12 KB copied, 245 MB peak memory
foldlM_3I :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_3I f b = runIdentityT . foldlM (\ !x -> lift . f x) b

I'd be grateful if someone can attempt to replicate this to make sure I haven't made a silly mistake.

Yeah, it was a silly mistake :(


Off-topic (please make replies on the return strict monad Discourse thread).

I really don't like having to think about what functor/applicative/monad I'm working in when using the laws to transform code

Absolutely. Additionally, I don't like having to think about space leaks introduced by failing to make invalid laziness unrepresentable. There's a trade off here and I think it's interesting to consider whether it's possible weaken one of the monad laws to f >=> pure == (f $!) and receive outsized benefits in return. I haven't yet seen anything that convinces me that approach is futile. If you are convinced then I would appreciate it if you can share the evidence that convinces you.

I think the monadFix example in that discussion was very clear, and also definitive.

Subsequent discussion on the thread demonstrated that MonadFix is still available in return strict monads.

I would welcome more input on the topic of "return strict" monads, particularly dissenting opinions with concrete examples of things that go wrong when using a "return strict" monad. Please keep the discussion on the return strict Discourse thread though.

I haven't tried with IdentityT, but as far as I can tell, the only way it could possibly do something here is by subtly tweaking the optimizer (most likely through arity changes). The dot in lift . f x might be doing more than its share of work. IdentityT is absolutely the blandest monad transformer imaginable.

You're right. I was benchmarking the wrong thing!

As a practical matter, why would you be doing a strict monadic fold in a lazy monad? It seems generally better to do that in the corresponding "strict" monad and convert the resulting action to the lazy form. That's why I haven't made a package for something like the above Strictly—it seems like it would need to go in the acme category.

Here's a benchmark I believe to be correct. It shows that a definition of monad Strictly2 is sufficient to get results that are twice as fast as the custom strict fold! I'm actually having trouble understanding how this works, but it does seem to.

Again, I'd appreciate it if someone could double-check my work because it's easy to introduce small errors that have big effects.

    foldlM_Strictly2: OK
      3.50 ms ± 173 μs,  18 MB allocated, 999 B  copied,  14 MB peak memory
    foldlM':          OK
      7.60 ms ± 281 μs,  25 MB allocated, 1.5 KB copied,  22 MB peak memory
    foldlM_Strictly:  OK
      9.02 ms ± 688 μs,  34 MB allocated, 2.4 KB copied,  29 MB peak memory
    foldlM_1S:        OK
      56.7 ms ± 2.9 ms,  34 MB allocated,  42 MB copied,  65 MB peak memory
    foldlM_2S:        OK
      60.7 ms ± 2.9 ms,  33 MB allocated,  46 MB copied,  98 MB peak memory

The operative definition is the strict bind.

instance Monad m => Monad (Strictly2 m) where
  Strictly2 m >>= f = Strictly2 $
    m >>= \ !a -> unStrictly2 (f a)

N.B. Strictly2 violates the monad law pure >=> f == f. Instead pure >=> f == (f $!), so it is a "return strict" monad.

Code

#!/usr/bin/env cabal
{- cabal:
build-depends: base, transformers, tasty-bench, mtl
-}
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O1 #-}
import Control.Monad.Trans.State.Lazy
import Control.Monad.Trans
import Data.Foldable (foldlM, toList)
import Test.Tasty.Bench
import Data.Tuple
import Control.Applicative
import Control.Monad

main :: IO ()
main = defaultMain
  [ env (pure 100000) $ \n ->
    bgroup "bench"
    [ bench "foldlM_Strictly2" $ whnf (benchFoldlM (xs n)) foldlM_Strictly2
    , bench "foldlM'" $ whnf (benchFoldlM (xs n)) foldlM'
    , bench "foldlM_Strictly" $ whnf (benchFoldlM (xs n)) foldlM_Strictly
    , bench "foldlM_1S" $ whnf (benchFoldlM (xs n)) foldlM_1
    , bench "foldlM_2S" $ whnf (benchFoldlM (xs n)) foldlM_2
    ]
  ]
  where
    xs :: Int -> [Int]
    xs n = [1..n]

benchFoldlM
  :: [Int]
  -> ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int)
  -> Int
benchFoldlM xs foldlM = flip evalState () $ foldlM (\a b -> pure (a+b)) 0 xs

foldlM_1 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_1 f b = foldlM (\ !x -> f x) b . toList

foldlM_2 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_2 f = foldlM (\b a -> f b a >>= (pure $!))

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = foldr c pure xs z0
  where c x k !z = f z x >>= k

-- | Turn a "lazy monad" like lazy State into a strict one.
newtype Strictly m a = Strictly
  { unStrictly :: m (Solo a) }

instance MonadTrans Strictly where
  lift = Strictly . fmap Solo

-- Run a `Strictly` computation. Forcing the result(s) forces all
-- intermediate computations.
runStrictly :: Monad m => Strictly m a -> m a
runStrictly (Strictly m) =
  m >>= \(Solo a) -> pure a

instance Monad m => Functor (Strictly m) where
  fmap = liftM
  x <$ m = m >> pure x

instance Monad m => Applicative (Strictly m) where
  pure = Strictly . pure . Solo
  liftA2 = liftM2
  m *> n = m >>= \_ -> n

instance Monad m => Monad (Strictly m) where
  Strictly m >>= f = Strictly $
    m >>= \(Solo a) -> unStrictly (f a)

foldlM_Strictly :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_Strictly f b = runStrictly . foldlM (\ !x -> lift . f x) b

-- | Turn a "lazy monad" like lazy State into a strict one.
newtype Strictly2 m a = Strictly2
  { unStrictly2 :: m a }

instance MonadTrans Strictly2 where
  lift = Strictly2

-- Run a `Strictly2` computation. Forcing the result(s) forces all
-- intermediate computations.
runStrictly2 :: Monad m => Strictly2 m a -> m a
runStrictly2 (Strictly2 m) = m

instance Monad m => Functor (Strictly2 m) where
  fmap = liftM

instance Monad m => Applicative (Strictly2 m) where
  pure = Strictly2 . pure
  liftA2 = liftM2

instance Monad m => Monad (Strictly2 m) where
  Strictly2 m >>= f = Strictly2 $
    m >>= \ !a -> unStrictly2 (f a)

foldlM_Strictly2 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_Strictly2 f b = runStrictly2 . foldlM (\ x -> lift . f x) b

As for the benchmarks of foldMapA:

All
  bench
    foldMapA: OK
      6.29 ns ± 508 ps
    foldlM:   OK
      19.2 ms ± 1.6 ms
    foldlM_1: OK
      24.0 ms ± 2.3 ms
    foldlM_2: OK
      19.7 ms ± 1.9 ms
    foldlM':  OK
      3.63 ms ± 251 μs
import Control.Monad.State
import Data.Foldable qualified as Fold
import Data.Monoid
import Test.Tasty.Bench

main :: IO ()
main =
  defaultMain
    [ env (pure @IO @Int 100_000) \n ->
        bgroup "bench" $
          bench "foldMapA" (whnf example n)
            : fmap
              (\(name, f) -> bench name $ whnf (benchFoldlM n) f)
              [ ("foldlM", Fold.foldlM)
              , ("foldlM_1", foldlM_1)
              , ("foldlM_2", foldlM_2)
              , ("foldlM'", foldlM')
              ]
    ]

example :: Int -> State () (Sum Int)
example i = foldMapA (pure . Sum) [1 .. i] where
  foldMapA :: (Foldable t, Applicative f, Monoid m) => (x -> f m) -> t x -> f m
  foldMapA f = Fold.foldl' (\ !fm x -> liftA2 (<>) fm (f x)) (pure mempty)

benchFoldlM ::
  Int ->
  ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int) ->
  Int
benchFoldlM i __ =
  flip evalState () $ __ (\a b -> pure (a + b)) 0 [1 .. i]

foldlM_1 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_1 f b = Fold.foldlM (\ !x -> f x) b . Fold.toList

foldlM_2 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_2 f = Fold.foldlM (\b a -> f b a >>= (pure $!))

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = Fold.foldr c pure xs z0
 where
  c x k !z = f z x >>= k

The foldA benchmark isn't actually running the state computation. You need an evalState in example.

@treeowl nice, that was a fun exercise to figure out how Strictly works.

@tomjaguarpaw regarding Strictly2, it indeed seems to perform better. Your code is equivalent to

foldlM_likeStrictly2' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_likeStrictly2' f z0 xs = foldr c pure xs z0
 where
  c x k z = f z x >>= \ !x -> k x -- this is >>= of Strictly2

I see similar results if we change the proposed foldlM' to

foldlM_finalStrict' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_finalStrict' f z0 xs = foldr c (pure $!) xs z0
                                         -- ^ strict here too
  where c x k !z = f z x >>= k

In fact I was going to suggest that we make the proposed function strict in the final value.

Also note that foldlM_likeStrictly2', unlike foldlM_finalStrict', is not strict in the initial value z0 (i.e. it is passed to f without being forced). But this can be changed with a bang on z0, for instance.

foldlM_likeStrictly2InitialStrict' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_likeStrictly2InitialStrict' f !z0 xs = foldr c pure xs z0
 where
  c x k z = f z x >>= (k $!)

@vdukhovni I suggest that the proposed function be updated to be strict in the final value, as in foldlM_finalStrict' (or alternately foldlM_likeStrictly2InitialStrict'). This would make it match the function shown in the Haskell Unfolder episode you mentioned (named repeatedlyM1 there).

Also, forgot to say this before, as a user I am +1 on seeing foldlM' in Data.Foldable.


Aside on benchmarking

The purpose of env foo $ \fooData -> ... is to make sure foo is evaluated to NF before the benchmark is run. Please don't move it inside the benchmark, otherwise you're also measuring the time it takes to generate foo. Depending on how it's used, it might also get shared between runs or benchmarks, making things even more uncertain.

The foldA benchmark isn't actually running the state computation. You need an evalState in example.

You're so right! Let's fix that!

main :: IO ()
main =
  defaultMain
    [ env (pure @IO @Int 100_000) \n ->
        bgroup "bench" do
          fmap
            (\(name, f) -> bench name $ whnf (either ($ n) $ benchFoldlM n) f)
            [ ("foldMapA", Left example)
            , ("foldlM", Right Fold.foldlM)
            , ("foldlM_1", Right foldlM_1)
            , ("foldlM_2", Right foldlM_2)
            , ("foldlM'", Right foldlM')
            ]
    ]

example :: Int -> Int
example i = getSum $ evalState (foldMapA (pure . Sum) [1 .. i]) ()
 where
  foldMapA f = Fold.foldl' (\ !fm x -> liftA2 (<>) fm (f x)) (pure mempty)
  bench
    foldMapA: OK
      856  μs ±  53 μs
    foldlM:   OK
      19.1 ms ± 1.8 ms
    foldlM_1: OK
      24.0 ms ± 2.2 ms
    foldlM_2: OK
      19.6 ms ± 1.9 ms
    foldlM':  OK
      3.63 ms ± 248 μs

Speaking of accidentally benchmarking the wrong thing, I was wrong about Codensity; it actually does better than Strictly, and very nearly as well as the custom strict fold!

@mixphix Your example isn't comparable. You're folding over a list that will end up fusing away, and not being realized in memory. That said, you're also theoretically building a huge State computation, but ... I'm betting GHC is being really clever about your left fold in a way that makes it basically throw away the whole thing. I'll need to look at some Core to understand better. Your example definitely suggests that we should be more careful here, and in particular should use the state sometimes—or at least pretend to.

You're folding over a list

benchFoldlM
  :: [Int]
  -> ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int)
  -> Int
benchFoldlM xs foldlM = flip evalState () $ foldlM (\a b -> pure (a+b)) 0 xs

You're folding over a list

benchFoldlM
  :: [Int]
  -> ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int)
  -> Int
benchFoldlM xs foldlM = flip evalState () $ foldlM (\a b -> pure (a+b)) 0 xs

I'm sorry, but I have no idea what any of this comment means. Could you please expand on that?

Thanks to @meooow25 and @treeowl I understand better what's going on here, and subsequently I have more confidence that my original alternative is the correct implementation.

As pointed out by @treeowl, Strictly turns a lazy monad into a strict one. Strictly2 goes a step further and turns

foldMS _ z [] = pure z
foldMS f z (x:xs) = do
  z' <- f z x
  foldMS f z' xs

into

foldMS _ z [] = pure z
foldMS f z (x:xs) = do
  !z' <- f z x
  foldMS f z' xs

But is this really want you want in a lazy monad? @meooow25 pointed out, and I was surprised, that my original alternative still leaks space for lazy monads. But now I realise that's the point! The point of a lazy monad is to "trade space for time" in a certain sense. In the lazy state monad, if you have a very expensive integer state computation, followed by put 0, then the expensive computation will not be performed (in exchange for this benefit you pay by keeping the whole integer state computation around until evaluation time).

The foldlM' of this proposal is incorrect from that point of view. If the user is using a lazy monad then the whole point is to make this tradeoff!1 If we're going to add a foldlM' it should be one that doesn't subvert the user's choice in this regard.

Footnotes

  1. I don't actually recommend ever using lazy monads for practical code.

@vdukhovni do you wish to proceed with the CLC proposal?

@vdukhovni how would you like to proceed with the proposal? If there are no updates in a month, I'll close as abandoned.

Having read all the comments, I'm not sure what to make of them.

On the one hand, from #255 (comment) it seems that the proposed foldlM'is advantageous in some situations (outperforms alternatives).

On the other hand, there are perhaps some objections I have failed to appreciate. Are the objections about how to define foldlM' or whether to do it at all???

My instinct is to try to help a typical proficient, but non-expert user, without breaking anything, and or leading to surprising behaviour. So I still think a foldlM' could be useful, but I can't tell whether there's either a consensus way forward, or a consensus objection in the above comments.

I can't tell whether there's either a consensus way forward, or a consensus objection in the above comments.

There is no need to build a consensus. A simple majority is enough.

That's fine, but I'm not even sure what the majority view is from the above. Is there some support for the originally proposed definition? Or did the above discussion identify a better definition or a reason to not proceed?

As the proposal author it's up to you how to proceed, which is why we continue to ask for your direction on this. If you decide to abandon the proposal, that will be that; if you decide to trigger a vote, please do so with a merge request with your favourite flavour of foldlM'.

I think the proposal to add foldM' is sensible. However, I am opposed to the originally-proposed versions. As I elaborated, those versions turn lazy monads strict, which is presumably not what one intended from one's choice of a lazy monad1. I support the alternative in #255 (comment), that is

foldlM' f = foldlM (\ !acc -> f acc)

Footnotes

  1. Although, I don't really understand lazy monads at all, nor why one would want to use them.

I'm generally in favor of adding foldlM'. We use to say "never foldl, always foldl'" and it's helpful (= less error-prone when you write code without thinking) to be able to extend this mantra to foldlM. I did not have time to absorb the intricacies of potential implementations, but I'm ready to approve what @tomjaguarpaw suggests (because of his track record wrt laziness in Haskell).

@parsonsmatt @angerman @velveteer @hasufell what's your take on this thread?

My take is that the object about turning lazy monads strict makes no sense. If the user asks for a strict fold, they get what they asked for. If that's not desired, don't use a strict fold.

They even copy the proposed code and use it directly, nothing stops them. Since benchmark numbers show better performance in some tests for the proposed implementation, I'm inclined to suggest that's the right now, and more closely analogous with the definition of foldl'.

If the user asks for a strict fold, they get what they asked for

Well, there are different things that "strict" could apply to. The "left monadic fold strict in the accumulator" would be foldM (\ !acc -> f acc). The "left monadic fold strict in the bind" would be one of the original versions. I think the name foldlM' would typically be understood as the former.

Since benchmark numbers show better performance in some tests for the proposed implementation

Have we seen better performance under any non-lazy monad? Users who choose a to use a lazy monad do so for some reasons, which the proposed implementation subverts, so I don't think we can justify the proposed implementation by noting its performance benefits on lazy monads only.

more closely analogous with the definition of foldl'

I believe foldl' is understood as making the accumulator strict and so would correspond to my former definition.

I find it interesting that there is disagreement on what the semantics of foldlM' ought to be. Is that not in itself an argument against adding it to base? What intuition would most users have about this?

I tend to agree with @tomjaguarpaw that I would probably only expect the accumulator to be strict. But I'm not an avid user of lazy monads anyway.

In case the alternatives aren't clear to a reader, here is an attempt to explain, using lists specifically.

We have foldlM as

foldlM f acc0 xs0 = go acc0 xs0
  where
    go acc xs = case xs of
      [] -> pure acc
      x:xs' -> do
        acc' <- f acc x
        go acc' xs'

Now, what should foldlM' be? I agree with @vdukhovni that it should be

foldlM'A f acc0 xs0 = go acc0 xs0
  where
    go !acc xs = case xs of
    -- ^ bang, always strict in the accumulator
      [] -> pure acc
      x:xs' -> do
        acc' <- f acc x
        go acc' xs'

@tomjaguarpaw says instead

foldlM'B f acc0 xs0 = go acc0 xs0
  where
    go acc xs = case xs of
      [] -> pure acc
      x:xs' -> do
        acc' <- let !acc1 = acc in f acc1 x
                 -- ^ bang
        go acc' xs'

It makes a difference because the monad's bind (>>=) may be lazy in the first argument. In this case we care about the argument let !acc1 = acc in f acc1 x.
If the bind is lazy, foldlM'B would introduce a chain of thunks but foldlM'A would not.

Thanks very much @meooow25, that makes the difference nice and clear. Here is a very obscure behavioural difference between the two versions:

ghci> case flip runState () (foldlM'A (\() -> id) () [undefined, put ()]) of (_, _) -> ()
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  undefined, called at <interactive>:100:49 in interactive:Ghci21
ghci> case flip runState () (foldlM'B (\() -> id) () [undefined, put ()]) of (_, _) -> ()
()

I really don't know if this warrants any concern. My inclination now is not. Perhaps @treeowl's assessment is the right way to think about it:

As a practical matter, why would you be doing a strict monadic fold in a lazy monad?

Perhaps we should simply not care about how strict monadic fold interacts with lazy monads. It's very, very difficult to do something sensibly lazy with a strict monadic fold, so perhaps my earlier concern that we would be subverting the intention of the choice of lazy monad is unfounded.

@meooow25 @tomjaguarpaw @vdukhovni what's the conclusion here? Shall we go for foldlM'A?

My preference is one of these, for simplicity: #255 (comment). If we did accept that I would be open to tweaking it for efficiency in the future. On the other hand, I'm not opposed to foldM'A (or indeed any other implementation) now.

My preference is still for (the Foldable version of) foldlM'A. I expect vdukhovni makes the final call as the proposer.

@vdukhovni it's ultimately your call to choose.

@vdukhovni could we possibly make some progress here? Otherwise I'll close as abandoned at the end of July.

Closing as abandoned, feel free to reopen when there are resources to make more progress.

FWIW, you can close issues as wontfix (instead of completed):

image