pytoolz/toolz

iget for fetching indexes from a non-sequence iterable

groutr opened this issue · 4 comments

It would be useful, I think, to have a version of itertoolz.get that works on iterables that don't support indexing (e.g. sets, iterators, etc.):

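from toolz import drop

def iget(ind, seq):
    seq = iter(seq)
    j = 0  # position of the next item seq will produce
    for i in sorted(ind):
        if j < i:
            # skip ahead to index i
            seq = drop(i - j, seq)
            j = i
        if j == i:  # false for a repeated index, which is skipped
            yield next(seq)
            j += 1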

With dicts now ordered, it makes sense to pull out the nth item in order of insertion, or the nth line of a file:

with open('file1.txt') as fin:
    lines = tuple(iget([2, 5, 8, 9], fin))
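For anyone who wants to try the idea without toolz installed, here is a minimal stdlib-only sketch. It assumes `itertools.islice` as a stand-in for `drop`, and the name `iget_sketch` is hypothetical. Like the proposal, it returns items in sorted index order and yields a repeated index only once; unlike the proposal, an out-of-range index just ends the output quietly.

```python
from io import StringIO
from itertools import islice


def iget_sketch(ind, seq):
    """Yield the items of `seq` at the positions in `ind`, in ascending order."""
    it = iter(seq)
    pos = 0  # position of the next item `it` will produce
    for i in sorted(set(ind)):
        # islice(it, k, k + 1) skips k items and yields at most one,
        # so an index past the end produces nothing instead of raising.
        for item in islice(it, i - pos, i - pos + 1):
            yield item
        pos = i + 1


# StringIO stands in for an open file here.
fin = StringIO("".join(f"line {k}\n" for k in range(10)))
lines = tuple(iget_sketch([2, 5, 8, 9], fin))
```

Because the input is only ever advanced, this works the same on a file handle, a generator, or any other one-shot iterator.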

Thanks @groutr. This has come up before: #97

I'm up for adding this functionality in some way.

After giving this some more thought, I find that extending nth is more appealing to me. It does require a mental shift from providing absolute indices (counted from the start of the iterator) to providing relative indices (counted from the current position of the iterator).

My take on extending nth

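from collections.abc import Sequence
from toolz import drop

def nth(n, seq):
    """ The nth element in a sequence

    >>> list(nth(1, 'ABC'))
    ['B']
    """
    seq = iter(seq)
    if not isinstance(n, Sequence):
        n = (n,)
    for i in n:
        # i is relative: the number of items to skip before the next read
        seq = drop(i, seq)
        yield next(seq)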

If I want indices 1 and 2 from 'ABC', that would be "return the 1st element, then return the 0th element of what remains".

>>> tuple(nth([1, 0], 'ABC'))
('B', 'C')

Wouldn't it be nicer to first calculate the differences between consecutive indices and use those to determine how many items to drop?

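from collections.abc import Sequence
from operator import sub
# the curried namespace is needed for map(f) and sliding_window(2) below
from toolz.curried import compose, concat, curry, drop, first, map, sliding_window

def lazy(f):
    yield f

@curry
def unpack_args(f, args):
    return f(*args)


def nth(n, seq):
    """ The nth element in a sequence

    >>> list(nth(1, 'ABC'))
    ['B']
    """
    seq = iter(seq)
    if not isinstance(n, Sequence):
        n = (n,)
    else:
        sub1 = lambda x: x-1
        # for each consecutive pair (a, b): b - a - 1 items to drop
        skip_n = compose(map(compose(sub1, unpack_args(sub), reversed)), sliding_window(2))
        n = concat((lazy(first(n)), skip_n(n)))
    for i in n:
        seq = drop(i, seq)
        yield next(seq)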

This doesn't require the mental workout of getting the differences right (especially the zero skip when you want consecutive items). Also, changing an index is a lot less error-prone this way.

>>> list(nth([1, 2, 5, 8], range(100)))
[1, 2, 5, 8]
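Spelled out without the functional helpers, the difference step might look like the sketch below. The name `relative_skips` is hypothetical, and it assumes the indices are already in strictly increasing order.

```python
def relative_skips(indices):
    """Turn ascending absolute indices into 'items to drop before each read'."""
    indices = list(indices)
    if not indices:
        return []
    # After reading index a, the iterator sits at a + 1, so reaching b
    # means dropping b - a - 1 items first.
    return [indices[0]] + [b - a - 1 for a, b in zip(indices, indices[1:])]


skips = relative_skips([1, 2, 5, 8])  # [1, 0, 2, 2]
```

Consecutive indices such as 1 and 2 give a skip of 0, which is exactly the case that is easy to get wrong by hand.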

A major restriction of both methods is that they can only take indices in strictly increasing order:

>>> list(nth([1, 2, 1], "abcdefghijklmnopqrstuvwxyz"))
[...]
ValueError: Indices for islice() must be None or an integer: 0 <= x <= sys.maxsize.
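To see where the error comes from, here is a small sketch of the arithmetic for these indices. It assumes, as the error message suggests, that `drop` hands its count to `itertools.islice`, so the example calls `islice` directly.

```python
from itertools import islice

indices = [1, 2, 1]
# Same rule as nth uses: the first index, then b - a - 1 for each pair.
skips = [indices[0]] + [b - a - 1 for a, b in zip(indices, indices[1:])]
# skips is [1, 0, -2]; the -2 would mean moving the iterator backwards.

try:
    list(islice(iter("abc"), skips[-1], None))
    failed = False
except ValueError:
    # islice rejects a negative start, which is the error shown in the thread.
    failed = True
```

A repeated index hits the same problem, since b - a - 1 is -1 when a equals b.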

Perhaps iterating over seq in sorted index order and then re-emitting the items in the requested order would be more robust.

A more robust approach would look like this, although it is a lot uglier and iterates over n multiple times:

# relies on lazy, unpack_args and the imports from the previous snippet
def nth(n, seq):
    """ The nth element in a sequence

    >>> list(nth(1, 'ABC'))
    ['B']
    """
    seq = iter(seq)
    if not isinstance(n, Sequence):
        orig_order, n = (0,), (n,)
    else:
        sub1 = lambda x: x-1
        skip_n = compose(map(compose(sub1, unpack_args(sub), reversed)), sliding_window(2))
        # sort the indices, remembering where each one sat in the request
        orig_order, n = zip(*sorted(enumerate(n), key=lambda x:x[1]))
        n = concat((lazy(first(n)), skip_n(n)))
    output = []
    for o, i in zip(orig_order, n):
        if i == -1:
            # repeated index: reuse the value that was just read
            output.append((o, value))
            continue
        seq = drop(i, seq)
        value = next(seq)
        output.append((o, value))
    # restore the requested order
    for _, value in sorted(output, key=lambda x: x[0]):
        yield value

Duplicated and non-monotonic indices now work as expected:

>>> list(nth([1, 2, 1], "abcdefghijklmnopqrstuvwxyz"))
['b', 'c', 'b']
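For comparison, the same read-in-order, emit-in-requested-order idea can be sketched with a plain dict lookup instead of offset arithmetic. This is an alternative technique rather than the code above, the name `nth_any_order` is hypothetical, and it only uses the stdlib.

```python
def nth_any_order(indices, seq):
    """Yield items at `indices` in the requested order; duplicates allowed."""
    indices = list(indices)
    wanted = set(indices)
    if not wanted:
        return
    last = max(wanted)
    found = {}
    # One forward pass: remember only the items that were asked for.
    for pos, item in enumerate(seq):
        if pos in wanted:
            found[pos] = item
        if pos == last:
            break  # nothing further is needed, leave the rest unread
    # Replay in the order the caller asked for.
    for i in indices:
        yield found[i]  # raises KeyError if i was past the end of seq


result = list(nth_any_order([1, 2, 1], "abcdefghijklmnopqrstuvwxyz"))
```

Like the version above, it has to read up to the largest index before it can yield anything, so it trades laziness for the freedom to request indices in any order.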