JuliaLang/julia

NegatedIndex type

StefanKarpinski opened this issue · 31 comments

Idea spawned by this discussion. The idea is that just as writing v[b], v[!b] where b is a logical indexing vector partitions v into two disjoint sets of elements, writing v[x], x[!x] should also partition v into two disjoint sets of elements when x is a single index or a range of indices. This requires some new NegatedIndex type, together with ! methods for integers and ranges and the appropriate method for indexing arrays with NegatedIndex objects.

What does it do on integer vectors?

I'm basically thinking that the structure looks like this:

type NegatedIndex{T}
    idx::T
end

So you just keep the original object until its time to do the actual indexing, at which point you know what you're "subtracting" the index object from, which lets you compute the actual index set and then do regular indexing with that. So ![2,5] would produce NegatedIndex([2,4]). In v[![2,5]], the indexing operations would compute [1:length(v)]-[2,5] as [1,3,5,...,length(v)] (supposing length(v) ≥ 5) and return [v[1],v[3],v[5],...,v[end]]. Make sense?

We do not define ! on non-booleans. That makes it possible for us to give it a different meaning, at least for Index types. But, ~ in matlab does have a defined behaviour, and that could lead to confusion.

That said, I do like the idea, and here are some thoughts.

ComplementIndex is an alternative name. We could alternatively allow \ to be used as a unary operator, to mirror the set complement operator.

A complement operator may even be useful outside of indexing, like in loop iterations and such.

Looking at the names in http://en.wikipedia.org/wiki/Complement_(set_theory), we could even just have a functional interface, using except, which returns the new Index type.

I really think that the ! operator makes sense in this case, although I would be ok with writing except(k) as well. Actually, calling the type Except would make a fair amount of sense.

ExceptIndex would be clearer for the type name.

-viral

On 12-Jul-2012, at 11:40 PM, Stefan Karpinski wrote:

I really think that the ! operator makes sense in this case, although I would be ok with writing except(k) as well. Actually, calling the type Except would make a fair amount of sense.


Reply to this email directly or view it on GitHub:
#1032 (comment)

I think it seems a bit fishy to define v[!integer_array]:
would v[![1,2]] == v[![2,1]] == v[![1,2,1]] == v[3] == 3 for v=[1,2,3]?
The trouble seems to be to negate a sequence (array) rather than a set.
Perhaps better to allow v[int_set] and v[!int_set] for subsetting, (and keep v[!int_index])?

It is a little weird, but I was thinking that you resolve, e.g. NegatedIndex([2,1]) when indexed into an indexing slot with range 1:n by iterating 1:n and then "emitting" the index value if and only if !contains([2,1],i). Does that make sense?

I'm a bit worried about surprises that might come out of this. E g you might expect a[!inds] to always give you a subsequence, but what if someone passed an inds::NegatedIndex argument into your function?

The way interpret it, !index is a set complement operation, with sets represented as strictly increasing sequences.
Perhaps one could refuse to create/use a NegatedIndex from a sequence that is not strictly increasing?
Then common cases like a[!2:5] would work, ! would be its own inverse, and

x, y = a[inds], a[!inds]

would always partition the elements of a.

This seems to have fizzled out, but I still really, really want this functionality to be added. It's one of the most convenient tricks in R for describing data resampling. See the jackknife for the simplest example of a statistical method in which this notation helps.

I just discovered that one of the other DataFrame developers hacked together an implementation of this style of indexing for the columns DataFrames simply because it was so unnatural to articulate one algorithm without it.

If I'm understanding this correctly, the negated index [!x] would strip the member whose position has the same value as the index [x], creating a vector slice with the member removed.

Can someone please explain the following:

"For example, the following creates a vector slice with the third member removed.![2,5] would produce NegatedIndex([2,4]) ...
and return [v[1],v[3],v[5],...,v[end]]."

Why wouldn't ![2,5] produce NegatedIndex([2,5]), and v[5] be removed instead of v[4]?? Is there something I'm misunderstanding?

Yes, that's absolutely right. I just made a bunch of mistakes jumbled together in that post :-)

I'm looking into this.
Quick question: how do i define the ! operator to reference either the original array or it's size to figure out which indices are not idx in a[!idx]?

You might not have to; define ! for the type of indexes you want to support (e.g., integers, ranges, and Vector{Int}) to produce a new type like NegatedIndex (or ComplementIndex). Then define the function getindex(a, idx::NegatedIndex).

Of course, you'll also need to look at setindex! and consider how to handle multidimensional arrays. You may find that you need to add more than a couple of function definitions :-).

We might have to change the indexing code to explicitly combine dimension sizes with indexes to allow implementing things like this (and Colon). For example instead of

for i in idx
  ...

we'd write

for i in fullindex(idx, size(A,n))
  ...

Jeff, can you elaborate on that? I'm failing to see the need for it – not that there isn't one, I'm just not seeing why.

Because there is no way to implement iteration of something like NegatedIndex([2]) without knowing the size of the indexed dimension.

Yes, obviously you need a context for that. The part I needed clarification of was "change the indexing code" – what indexing code are you referring to?

After looking at some code with @punkrockpolly, I'm wondering if the best approach here isn't to have a Complement{T} type that primarily provides an in method:

immutable Complement{T}
  collection::T
end

in(x,c::Complement) = !in(x,c.collection)

Then you can also do things like this:

getindex(v::AbstractVector, c::Complement) = v[filter!(i->in(i,c), 1:end)]

Of course, that doesn't cover everything, but I think the limited scope of the Complement abstraction clarifies things a bit.

I am closing this, since this is something that can easily live in a package. Please reopen if you think otherwise.

Well, if it was to be implemented, it would make much more sense in Base than in a package, since it would affect all indexing.

Would the recent changes in array indexing make this easier to implement efficiently for any number of dimensions? @timholy @mbauman?

I'd suggest putting this into SubArray as a valid index type---eventually A[negated(1:15,3), :] would return a SubArray anyway.

Well, #10525 would allow us to define this once and transform the complement to direct indices (or potentially even doing so without any intermediates).

One thing I realized the other day is that we almost have a negated index type already in base. It's the complement IntSet. If IntSets were not in the range 0:(typemax(Int)-1), but instead 1:typemax(Int), it'd actually simplify quite a bit of their implementation. And it'd make IntSet much more useful for indexing… especially if we start using BitArrays as their bit-storage.

Cool. But AFAIK the order of elements is not defined for IntSet, while for indexing it really matters.

It's documented as being sorted, which seems sensible.

Ah, and yes, my changes in #10331 for indexing with Colon fix Jeff's concerns above.

tpapp commented

Would #13157 allow "negated" indexes for array-like constructs with named indexes? (which map keys to indexes, like NamedArray, which already has its Not construct)

Not directly, no. Both SubArray and fallback non-scalar indexing have converged to simply re-index into the stored or passed indices. The difficulty with using IntSets and complement types as indices is that they do not directly support fast indexed access. I don't want more special cases, so that means doing what we do with logical vectors: transform them to normal index vectors after checking their bounds.

Perhaps the only action item for Base is to change the Base.to_index API to accept the array and index dimension so there's enough context to transform a negated index. That would allow this to live in packages and work uniformly. It still requires some funny business, though, since a NegatedIndex must be an AbstractArray to dispatch properly, but it'd really only implement checkbounds and to_index.

See https://github.com/mbauman/InvertedIndices.jl for an implementation of this idea as a package.