
Infinite lists aka streams

Primary language: Haskell. License: BSD-3-Clause.

infinite-list

Modern lightweight library for infinite lists with fusion:

  • API similar to Data.List.
  • No non-boot dependencies.
  • Top performance, driven by fusion.
  • Avoid dangerous instances like Foldable.
  • Use NonEmpty where applicable.
  • Use Word for indices.
  • Be lazy, but not too lazy.
{-# LANGUAGE PostfixOperators #-}
import Data.List.Infinite (Infinite(..), (...), (....))
import qualified Data.List.Infinite as Inf

Prior art and inspiration

  • Data.Stream.Infinite from streams package:

    • Large dependency footprint, e.g., adjunctions.
    • Provides dangerous instances such as Foldable.
    • No fusion framework.
  • Data.Stream from Stream package:

    • No fusion framework.
    • No repository or issue tracker.
  • GHC.Data.List.Infinite in GHC source tree:

    • Limited API, only to cater for GHC internals.
    • Not available as a separate package outside of GHC.

Why no Foldable or Traversable?

The breakdown of members of Foldable is as follows:

  • foldr, foldr1, foldMap, fold, toList and null can be productive on infinite lists;
  • foldr', foldMap' cannot, because forcing an accumulator even to WHNF makes the fold non-terminating;
  • foldl, foldl', foldl1 cannot, because no left fold can;
  • length always diverges;
  • elem either returns True, or does not terminate, but never returns False;
  • maximum, minimum, sum and product are unlikely to be productive, unless an underlying instance Ord or instance Num is extremely lazy.

Altogether this means that code polymorphic over Foldable cannot confidently work with infinite lists. Even a trivial refactoring can get you into deep trouble. It's better to save users from this pitfall and not provide instance Foldable at all. We do provide a right fold, however.
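The breakdown above can be observed with ordinary lazy lists from base, which already have a Foldable instance. The following is a minimal sketch and does not use this library: naturals, anyOver and doubled are names made up for the illustration.

```haskell
-- An ordinary lazy list stands in for an infinite container here.
naturals :: [Integer]
naturals = [0 ..]

-- A right fold can stop early: (||) never inspects `rest` once it sees True.
anyOver :: (a -> Bool) -> [a] -> Bool
anyOver p = foldr (\x rest -> p x || rest) False

-- A right fold can also build its result lazily, one cell at a time.
doubled :: [Integer]
doubled = foldr (\x rest -> 2 * x : rest) [] naturals

main :: IO ()
main = do
  print (anyOver (> 10) naturals) -- True
  print (take 5 doubled)          -- [0,2,4,6,8]
  print (null naturals)           -- False
  print (elem 3 naturals)         -- True
  -- The calls below would never return, so they stay commented out:
  -- print (length naturals)
  -- print (elem (-1) naturals)
  -- print (foldl (+) 0 naturals)
```

Swapping foldr for foldl, or elem 3 for elem (-1), is exactly the kind of small refactoring that turns a productive program into a non-terminating one.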

Since there is no Foldable, there can be no Traversable. Even if it were not ruled out by the missing superclass, only a few monads are lazy enough to be productive for infinite traversals. If you are looking for a traverse with a lazy state, use mapAccumL. We also provide traverse_ and for_, but with slightly different types.
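To illustrate what a traversal with lazy state looks like, here is a minimal sketch that uses mapAccumL from base's Data.List on an ordinary lazy list, as a stand-in for the library's own function, which is not shown here. The names runningSums and step are made up for the example.

```haskell
import Data.List (mapAccumL)

-- Running totals of 1, 2, 3, ... The accumulator is threaded lazily,
-- so the output list can be consumed before the input has ended.
runningSums :: [Integer]
runningSums = snd (mapAccumL step 0 [1 ..])
  where
    step acc x = let acc' = acc + x in (acc', acc')

main :: IO ()
main = do
  print (take 5 runningSums) -- [1,3,6,10,15]
  -- By contrast, a traversal in a strict-enough monad never produces
  -- anything on infinite input, so this stays commented out:
  -- print (fmap (take 5) (traverse Just [1 :: Integer ..]))
```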

Laziness

Operations returning a data type with a single constructor can be implemented in an extremely lazy fashion: always return the constructor before inspecting any of the arguments. For instance, note the irrefutable pattern matching in Data.List.NonEmpty:

map :: (a -> b) -> NonEmpty a -> NonEmpty b
map f ~(a :| as) = f a :| fmap f as

which is equivalent to

map :: (a -> b) -> NonEmpty a -> NonEmpty b
map f x = (let a :| _ = x in f a) :| (let _ :| as = x in fmap f as)

Because of this, forcing the result to WHNF does not force any of the arguments, e.g., Data.List.NonEmpty.map undefined undefined `seq` 1 returns 1. This is not the case for normal lists: since there are two constructors, map has to inspect the argument before returning anything, and Data.List.map undefined undefined `seq` 1 throws an error.
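The two behaviours can be checked with base alone. This is a small sketch: survives is a helper invented for the example, which reports whether evaluating a value to WHNF completes without an exception.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.List as L
import qualified Data.List.NonEmpty as NE

-- The irrefutable pattern lets the (:|) constructor appear without
-- touching either argument.
nonEmptyCase :: Int
nonEmptyCase = NE.map undefined undefined `seq` 1

-- Lists have two constructors, so map must inspect its argument first.
listCase :: Int
listCase = L.map undefined undefined `seq` 1

-- Hypothetical helper: True if forcing the value to WHNF does not throw.
survives :: a -> IO Bool
survives x = either onError (const True) <$> try (evaluate x)
  where
    onError :: SomeException -> Bool
    onError _ = False

main :: IO ()
main = do
  survives nonEmptyCase >>= print -- True
  survives listCase >>= print     -- False
```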

While the Infinite type in Data.List.Infinite has a single constructor, we believe that following the example of Data.List.NonEmpty is harmful for the majority of applications. Instead, the laziness of the API is modeled on the laziness of the respective operations in Data.List: a function Data.List.Infinite.foo operating over Infinite a is expected to have the same strictness properties as Data.List.foo operating over [a]. For instance, Data.List.Infinite.map undefined undefined `seq` 1 diverges.
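The difference between the two designs comes down to one tilde. The sketch below uses a local stand-in type, not the library's source: the Infinite declaration, lazyMap, strictMap and survives are all defined here purely for illustration.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Local stand-in for a single-constructor infinite list.
data Infinite a = a :< Infinite a
infixr 5 :<

-- NonEmpty-style: the irrefutable pattern defers inspecting the argument.
lazyMap :: (a -> b) -> Infinite a -> Infinite b
lazyMap f ~(x :< xs) = f x :< lazyMap f xs

-- Data.List-style: the argument is forced before the constructor is returned.
strictMap :: (a -> b) -> Infinite a -> Infinite b
strictMap f (x :< xs) = f x :< strictMap f xs

-- Hypothetical helper: True if forcing the value to WHNF does not throw.
survives :: a -> IO Bool
survives x = either onError (const True) <$> try (evaluate x)
  where
    onError :: SomeException -> Bool
    onError _ = False

main :: IO ()
main = do
  survives (lazyMap undefined undefined) >>= print   -- True
  survives (strictMap undefined undefined) >>= print -- False
```

Under the stated design goal, the library's map is expected to behave like strictMap here, matching Data.List.map.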

Indexing

Most historical APIs (such as Data.List) use Int to index elements of containers. This library makes another choice: indices are represented by an unsigned type, Word. This way the notorious partial function (!!) :: [a] -> Int -> a becomes a total (!!) :: Infinite a -> Word -> a.
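Why the Word-indexed version is total can be seen in a few lines. This is a sketch over a local stand-in type rather than the library's implementation: Infinite, iterateInf, indexInf and powersOfTwo are names introduced for the example.

```haskell
-- Local stand-in for a single-constructor infinite list.
data Infinite a = a :< Infinite a
infixr 5 :<

-- Hypothetical helper: x, f x, f (f x), ...
iterateInf :: (a -> a) -> a -> Infinite a
iterateInf f x = x :< iterateInf f (f x)

-- No "index too large" case (the list never ends) and no negative
-- index case (Word has none), so both clauses together are exhaustive.
indexInf :: Infinite a -> Word -> a
indexInf (x :< _)  0 = x
indexInf (_ :< xs) n = indexInf xs (n - 1)

powersOfTwo :: Infinite Integer
powersOfTwo = iterateInf (* 2) 1

main :: IO ()
main = do
  print (indexInf powersOfTwo 0)  -- 1
  print (indexInf powersOfTwo 10) -- 1024
```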

An argument can be made for using the arbitrary-precision type Natural instead of the finite Word. Unfortunately, this causes performance penalties, since Natural is represented by a heap object and cannot be easily unboxed. On any GHC-supported architecture the addressable memory is less than maxBound :: Word bytes, and thus it's impossible to materialize a container with more than maxBound :: Word elements.
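The bound in question can be inspected with base alone. The sketch below only shows the range of Word next to Natural; it does not measure the performance difference. The names wordBits and wordLimit are made up for the example.

```haskell
import Data.Bits (finiteBitSize)
import Numeric.Natural (Natural)

-- Width of Word on the current platform.
wordBits :: Int
wordBits = finiteBitSize (0 :: Word)

-- The largest Word, viewed as an arbitrary-precision Natural.
wordLimit :: Natural
wordLimit = fromIntegral (maxBound :: Word)

main :: IO ()
main = do
  print wordBits
  print (wordLimit == 2 ^ wordBits - 1) -- True
  print (wordLimit + 1 > wordLimit)     -- True: Natural keeps growing
  print ((maxBound :: Word) + 1)        -- 0: Word arithmetic wraps around
```

The last line is a reminder that arithmetic on Word indices is modular, so an expression like i - 1 at i = 0 wraps to maxBound rather than failing.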