/itertools

fast itertool package for golang

Primary LanguageGo

itertools

Build Status

A fast iterator package for Go.

Inspired by the python iterator functions this provides analogous abstraction and (kind'like) semantics.

For the sake of speed, the yield-like behaviour is implemented without channels and methods. Instead functional scope (context) provides state for the itertools iterators.

The package includes dedicated iterators for "int", "int64", "int32", "int16", "int8", "float32", "float64" (beeing the template for the other distict type iterators using go:generate), "string", "byte" and an implementation using the empty interface{} (type iterableIf) for slices of any other type and for map operations on iterators of mixed type.

Whilst the first two are straightforward and stable to be used with functions to manipulate the emitted iterator elements, caution is needed to meet Go's type checking requirements with type iterableIf.

NOTE: Any the abstraction costs speed in comparison to a generic slice operations. Unless you need a yield-like behaviour for your programming logic applying a map function to slice elements in a for loop is usually (about 10times) faster than using the iterable MapNext function(ality). Anyway have fun!


Installation

go get -u github.com/AndreasBriese/itertools

Import to your code as usual or with a shorter alias.

import iter "github.com/AndreasBriese/itertools"

Functionality

itertools implements iterators for slices of the following types <t>:

[]float64        IterableFloat64        (Template for go:generate in makeMoreItertools.go)

[]int            IterableInt
[]int64          IterableInt64
[]int32          IterableInt32
[]int16          IterableInt16
[]int8           IterableInt8
[]float32        IterableFloat32
[]string         IterableString
[]byte           IterableByte

[]interface{}    IterableIf

typed Iterable<T> (with front capital letter in <T> like IterableInt64) which are constructed by ToIter<T> (with front capital letter i.e. ToIterInt64([]int{}).

Zipping two slices []<t> AAAA and BBBB to a new iterable of type Iterable<T> ABABABAB can be done with ZipToIter<T>(A, B []<t>).

ChainToIterIf(...[]<t>) takes at least 2 slices and returns an newly created iterable over their concat.

MMapToIter<T> maps multiple slices []<t> with a mapping function to an Iterable<T> over a new slice with the results (Note! This does not work on IterableIf). MMapIter<T> maps multiple slices []<t> with a mapping function and yields the result stepwise.

Iterable<T> Functions

Functions that return stepwise elements without changing the underlying original slice (don't need additional memory):

Next, This, Back, First, Last func() <T>
Cycle                         func() func() <T>
MapNext                       func(func(<T>) <T>) func() (<T>, bool)
FilterNext                    func(func(<T>) bool) func() (<T>, bool)
DoubleOpNext                  func(func(<T>, <T>) <T>) func() (<T>, bool)
DoubleCompNext                func(func(<T>, <T>) bool) func() (<T>, bool)
PairOpNext                    func(func(<T>, <T>) <T>, ...int) func() (<T>, bool)

Functions that return a new iterable without changing the underlying original slice - these need additional memory for the underlying slices:

PairOp           func(func(<T>, <T>) <T>, ...int) *Iterable<T>
DoubleOp         func(func(<T>, <T>) <T>) *Iterable<T>
DoubleComp       func(func(<T>, <T>) bool) *Iterable<T>
Filter           func(func(<T>) bool) *Iterable<T>
Map              func(func(<T>) <T>) *Iterable<T>

Functions that return the very same iterable with changes to the underlying original slice - no additional memory needed:

MapInto          func(func(<T>) <T>) *Iterable<T>

Function that returns iterables over the underlying origanal slice - ~~ little additional memory needed:

Tee              func(int) []*Iterable<T>

Infos & setters about state, that do not change the underlying original slice:

Len           int
Index         func() int
SetIndex      func(int) (int, bool)
Reset, ToEnd  func()

converters & destroyer of the iterator:

List   func() []<T> // returns the underlying slice but leaves the iterator intact
ToList   func() []<T> // returns a copy of underlying slice but leaves the iterator intact, uses additional memory
Reduce   func(func(<T>, <T>) <T>) <T> // leaves the iterator intact
Destroy  func() // replaces the underlying slice by an empty slice

Godoc provides basic documentation and you might lookup the samples for usage and comparison.

Some short notes about "specialties"

1.Why having three variants of Map Iterable<T>.Map() | .MapInto() | .MapNext() ?

Both, Iterable<T>.Map(fn) and .MapInto(fn), calculate the fn() over the iterable at once and return an Iterable<T> with the resulting elements.

Usage will depend on the use case:

If the original data in the underlying slice can be disposed, then use the faster .MapInto(). The bigger the data the larger the speed difference will be. Iterable<T>.Map() needs additional memory but does not change the underlying slice.

l := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
seq := iter.ToIterInt(l)
fmt.Printf("Result: %v slice l: %v\n", seq.Map(func(x int) int { return 2 * x }).ToList(), l)
fmt.Printf("Result: %v slice l: %v\n", seq.MapInto(func(x int) int { return 2 * x }).ToList(), l)

// prints
// Result: [0 2 4 6 8 10 12 14 16 18] slice l: [0 1 2 3 4 5 6 7 8 9]
// Result: [0 2 4 6 8 10 12 14 16 18] slice l: [0 2 4 6 8 10 12 14 16 18]

Iterable<T>.MapNext() is slower than the others because the results can not produced by a loop over the iterable's slice s. Anyway this can be handy if you actually want to change the elements of s dynamically in between the calculation steps or like apply fn repeatedly conditionally by storing its results within s and using .Back(). s is defined for the functions scope with ToIter<T>([]<t>) and might also be changed in the callers context. Anyway: s's data can but it's length and capacity can not be changed (or more acurately: these can be changed i the callers context but the changes do not apply to the Iterables scope).

2.Iterable<T>.PairOp(fn, step) & iter.PairOpNext(fn, step)

Applies the function fn in the call to a pair of sequential elements (previous, actual) starting with elem[1] and jumps by step to the next pair. Length of the returned iterable depends on step. That means:

  1. iter.PairOp(fn, step=1) runs over the iterable with fn(elem[0], elem[1]), fn(elem[1], elem[2]) ... len(result) = len(iterable)-1

  2. iter.PairOp(fn, step=2) runs over the iterable with fn(elem[0], elem[1]), fn(elem[2], elem[3]) ... len(result) = len(iterable)/2

With step = 2 it can be used together with the ZipToIter<T>() function. With large data (and multiprocessor use), operating over the zip iterable has the benefit of looping over one slice instead of calling elements from two different slices at different memory locations. There is an example of PairOp in the sample directory.

3.Iterable<T>.DoubleComp(fn) & .DoubleCompNext(fn)

condFn := func(prev, actual interface{}) bool {
        return (prev.(string) != actual.(string))
}
s1 := []string{"A", "B", "T", "T", "A", "Su", "Su", "E", "Roy", "A", "A"}
fmt.Println("no doubles:", iter.ToIterIf(s1).DoubleComp(condFn).ToList())
// prints 'no doubles: [B T A Su E Roy A]'

s1 = []string{"A", "A", "B", "T", "T", "A", "Su", "Su", "E", "Roy", "A"}
fmt.Println("no doubles:", iter.ToIterIf(s1).DoubleComp(condFn).ToList())
// prints 'no doubles: [B T A Su E Roy A]'

Against intuition but accurately this prints [B T A Su E Roy A] because the DoubeComp* functions compares the actual and previous element and returns in case the actual element. Consequently the first element in the iterable can not be compared to previous nor returned. If you need it, call Iterable<T>.First().

Benchmarks (go test -bench=. cpu=1)

Some Benchmarks IterableInt []int (slice length 1.000.000)

Benchmark_IterInt_ToIterInt                      1000000          1956 ns/op
Benchmark_IterInt_ToList                             100       5871707 ns/op
Benchmark_IterInt_List                         500000000             2.07 ns/op
Benchmark_IterInt_MapInto                            500       2816308 ns/op
Benchmark_IterInt_Map                                200       6779983 ns/op
Benchmark_IterInt_MapNext                            200      10156138 ns/op
Benchmark_IterInt_Filter                             100      14037458 ns/op
Benchmark_IterInt_FilterNext                         100      15868025 ns/op
Benchmark_IterInt_Reduce                             300       4018775 ns/op
Benchmark_IterInt_Map_Filter_Reduce                  100      19697933 ns/op
Benchmark_IterInt_MapInto_Filter_Reduce              100      16282103 ns/op
Benchmark_IterInt_PairOp                             200       8981810 ns/op
Benchmark_IterInt_DoubleOp                           100      11818775 ns/op
Benchmark_IterInt_DoubleComp                         100      19677501 ns/op
Benchmark_IterInt_Tee5                              2000        741668 ns/op

Benchmark_IterIf_ToIterIf                             50      33606582 ns/op

Run go test -bench=. cpu=1 to get further benchmarks. Other types will behave more-or-less similar, but IterableIf is slower to build the Iterable with its underlying []interface{} (i.e. ToIterIf([]uint8)) and in (other) cases where typecasting is involved (i.e. in mapping and filter functions).

Comparison slice []int vs. diverse iterables (slice length 1.000.000)

LoopOverIntSlice is gold standard (using native go slice for ops)

Benchmark_Int_Sum_LoopOverIntSlice                  2000        702820 ns/op
Benchmark_IterInt_Sum_NextOverIterable               200       8659626 ns/op
Benchmark_Yanatan_Sum_NextOverIterable                 5     219389515 ns/op

mapFunction f(x) = 2*x
Benchmark_Int_Map_LoopOverIntSlice                  2000        662458 ns/op
Benchmark_IterInt_Map_OverIterable                   500       3615911 ns/op
Benchmark_IterInt_Map_IntoOverIterable               500       3073908 ns/op
Benchmark_IterInt_Map_NextOverIterable               100      11960226 ns/op
Benchmark_Yanatan_Map_NextOverIterable                 3     441119057 ns/op

(Yanatan refers to the iterables using channels: github.com/yanatan16/itertools) See https://github.com/ewencp/golang-iterators-benchmark for more comparisons of different implementations.

Benchmarks on Apple MBP 2013 with i7 and 8GB Ram

License

(C)opyright 2018 Andreas Briese, eduToolbox@Bri-C GmbH, Sarstedt, with MIT license - see the headers