eclipse-archived/ceylon

Polymorphic operator support for creating ranges

Closed this issue · 48 comments

[@luolong] Currently as far as I understand, Range operators .. and : are implemented by invoking Range constructor which basically takes two Ordinal arguments and creates an appropriate Range object.

While working on date/time library for Ceylon I've come up with use cases where it would make sense to create ranges of the circularily referenced discreet set of ordinal values that wrap around either end of the discreet set.

For example given weekdays fith following successor relationships: sunday -> monday -> tuesday -> ... -> saturday -> sunday

It would make huge sense to be able to declare ranges like monday .. sunday or sunday .. saturday.

The trouble is that the way Range is currently implemented, I can not define such ranges whose values are not strictly speaking comparable (monotonously growing).
Without that implementing operations like in would be hugely inefficient.

On the other hand when implementing types like DayOfWeek, I can create a less generic implementation that can cope with such fragmented ranges and provide more optimized version of Range, while making creation of such ranges syntactically much more natural.

So, what I propose is to redefine range creation operators so that they would call a method on the implementation of Ordinal

For example:

  • monday .. sunday would be redefined as monday.rangeTo(sunday)
  • monday:4 would be redefined as monday.rangeOf(4)

In addition to making my use case a bit nicer to use, it would also make range creation operators more regular in respect to other polymorphically defined operators like +, -, *, etc.

[Migrated from ceylon/ceylon-spec#451]
[Closed at 2014-06-21 21:37:58]

[@FroMage] It would also make monday.rangeOf(14) work, to get two weeks.

[@luolong] Yes, that too, although I must admit, I didn't think of it at first...

[@gavinking] I think this change makes sense. We should add spanTo() and segment() to Ordinal. They would return [Other+] and [Other*]. I will work on this next.

[@gavinking] Damn, ran into a little problem with this one: Ordinal is supposed to be covariant, which means the signature of spanTo() is wrong. Of course I can use the trick I use on Set. Or I could add a whole new (invariant) type. Don't love either option...

[@gavinking] Slipping from M6.

[@gavinking] @luolong + @DiegoCoronel Is this something we have to do for 1.0, or can we come back to it after that? I'm trying to clear the deck of issues that are blocking the release of 1.0 final.

[@DiegoCoronel] Well, I see it as a plus in the language and that we could have benefits in ceylon.time api and currently i see only this specific behavior below, at least of course if @luolong has some idea of use that we can improve in our current implementation.

For example this code:

for( week in saturday..sunday ) {
    print( week );
}

results in:

saturday
friday
thursday
wednesday
tuesday
monday
sunday

this is the current behavior of the language but i think maybe it does not make sense for our case because in my opinion i would not expect that weeks come in the inverse order, so if we launch ceylon final version with this behavior we shouldnt change it anymore. I know you have lot of work to do but i would like to see it for 1.0

[@DiegoCoronel] It means im thinking that with both ideasrangeTo and rangeOf we are going to be able to provide the iterator, is it?

[@gavinking] Sorry?

Sent from my iPhone

On 28 Sep 2013, at 4:19 pm, Diego Coronel notifications@github.com wrote:

It means im thinking that with both ideasrangeTo and rangeOf we are going to be able to provide the iterator, is it?


Reply to this email directly or view it on GitHub.

[@DiegoCoronel] maybe i misunderstand, but at my reading i was thinking that those ideas were ways to we refine the behavior of monday..sunday and monday:4...

[@gavinking] All it really means is that in 1.0 you would write:

saturday to sunday

Using the infix method invocation syntax.

And in a future release of the language we can come back and make DayOfWeek implement Rangeable or whatever. I know it would be nice to write .. instead of to, but really I think we can maybe live without it for a couple of months...

[@gavinking] I'm going to reschedule this for 1.2, unless I hear some screaming from you guys that this is something super-important.

[@luolong] Yes, given the now formally introduced infix operator syntax support, the absence of this feature is now much less noticeable.

[@tombentley] @gavinking are you working on this? If not I can look at it.

[@gavinking] Well I was planning to but not sure if I will get much time in the next week. So let me give you an idea of what I've been thinking when I get back home.

Sent from my iPhone

On 21 May 2014, at 2:12 pm, Tom Bentley notifications@github.com wrote:

@gavinking are you working on this? If not I can look at it.


Reply to this email directly or view it on GitHub.

[@tombentley] I'm working on the idea that we add span() and segment() methods to Ordinal (which the .. and : operators will be syntax sugar for).

Ideally we'd like to drop the given Element satisfies Comparable<Element> constraint from Range, because there are things (such as DayOfWeek) for which the syntax sugar would be natural (mon:7) but which don't have a total order (whether sun<mon depends on whether you think the week starts of a Monday or Sunday). However we can't remove the constraint because unless we know whether start > end we don't know whether the range is generated via successor or predecessor. In other words without the Comparable constraint the contract for span is really weak and impossible to document.

Note that this isn't a problem for segment/: because these are always increasing/generated using successor.

Another thing to note about the DayOfWeek example is that wed:14 generates a perfectly good sequence, though it's not actually a range in the usual sense of the word (it's not even a repeated range). For this reason I think that segment()/: should return a Sequence or an Iterable

Getting back to span(): We could constrain all Ordinals to be Comparable ("if x.successor==y then x<y"). The problem with that is it doesn't deal with circularity (maxInt.successor == minInt would mean that maxInt<minInt!). So we would have to define it the other way around ("if x<y then after some number of iterations (x.successor) == y") which is weaker but gets us out of the circularity hole.

So what I think makes sense and is sufficiently general looks like this:

Firstly we define a supertype of Ordinal, let's call it successor:

"Abstract supertype of types whose values may have a successor.

 It is recommended for implementations that satisfy 
 Comparable that `x<y` implies  `x.successors.contains(y)`
 "
see(`interface Ord`)
interface Successor<Other,Absent=Null>
        satisfies Iterable<Other,Absent>
        given Other satisfies Successor<Other, Absent> 
        given Absent satisfies Null {

    "The successor of this value, or null if there is no successor"
    shared formal Other|Absent successor;

    "This value and all its successors"
    shared default {Other*} successors {
        value start = this;
        object result satisfies Iterable<Other, Null> {
            shared actual Iterator<Other> iterator() {
                object iter satisfies Iterator<Other> {
                    variable Successor<Other,Absent>|Null current = start;
                    shared actual Other|Finished next() {
                        if (is Successor<Other,Absent> f=current) {
                            value result = f;
                            current = result.successor;
                            assert(is Other result); 
                            return result;
                        } else {
                            return finished;
                        }
                    }
                }
                return iter;
            }
        }
        return result;
    }

    "This value and the next [[num]] successors"
    shared default {Other*} segment(Integer num) {
        return successors.take(num);
    }
}

The segment operator (:) would be defined in terms of segment. Note that this definition leaves it up to the implementor whether every value has a successor, or whether some value(s) are terminal (e.g. you might want to model MonthOfYear as non circular, so that december.successor is null, but make DayOfWeek circular; this would make january:13 be of length 12, but mon:14 be of length 14).

Then we can define Ordinal like this:

"Specialization of Successor for types which are also Comparable.
 For two values `x` and `y` if `x<y` then `x.successors.contains(y)`
 Also `x.successor.predecessor == x`
 "
interface Ordinal<Other,Absent=Null> of Other
    satisfies Successor<Other,Absent> & Comparable<Other>
    given Other satisfies Ordinal<Other,Absent>
    given Absent satisfies Null {

    "The predecessor of this value."
    shared formal Other|Absent predecessor;

    shared formal Range<Other,Absent> span(Other to);
}

I guess we'd define the ++, --, += and -=operators onOrdinal(even though the incrementing ones could be defined in terms ofSuccessor`).

Range could remain mostly as it is now, though next() would be default and probably include getting the nth next.

Integer and would satisfy Ordinal<Integer,Nothing> (because we allow maxInt++ to return an Integer even if it has overflowed or lost precision), similarly Character.

[@luolong]

However we can't remove the constraint because unless we know whether start > end we don't know whether the range is generated via successor or predecessor.

To be honest, I've always been a little suspicious about this feature. I can see how it might be considered useful, but I am perfectly okay with (1..10).reverse instead of 10..1 form.

[@luolong] Other than my previous comment, how would your proposal affect performance of a range operator?

Specially, when considering well-known and most likely cases like Integer ranges...

[@gavinking] If there's no ordering, how would you even know there is anything wrong with 10..1?

Sent from my iPhone

On 26 May 2014, at 9:21 am, Roland Tepp notifications@github.com wrote:

However we can't remove the constraint because unless we know whether start > end we don't know whether the range is generated via successor or predecessor.

To be honest, I've always been a little suspicious about this feature. I can see how it might be considered useful, but I am perfectly okay with (1..10).reverse instead of 10..1 form.


Reply to this email directly or view it on GitHub.

[@luolong] yea, you've got a point there...

[@luolong]

If there's no ordering, how would you even know there is anything wrong with 10..1?

On the other hand, does it have to?
What would you consider to be the correct set of values returned by this expression: 10..13 step 4?

My point is - at some point the range will iterate over it's ending point and will have to terminate the iteration. In the case of 10..1, I would expect the range to return one element: i.e. {10}

[@tombentley]

In the case of 10..1, I would expect the range to return one element: i.e. {10}

To me x..y means start at x and keep incrementing or decrementing until you get to y. Similarly x:z means start at x and keep incrementing until you've got z elements.

Without using the ordering (i.e. without supporting decreasing ranges), I agree with Gavin that 10..1 doesn't make sense. I certainly don't think {10} is an intuitive result in that situation.

[@tombentley] What I suggested above mostly works out fine. The only change I've had to make is that Ordinal<Element> satisfies Successor<Element, Nothing>, rather than having an extra Absent type argument to support non-terminating Ordinals. The main reason for that is that Range.shifted() doesn't really work nicely when you have a terminating Ordinal (it would be weird for a shifted Range to be empty!).

I'm wondering what @gavinking makes of it all.

[@gavinking] @tombentley I just don't see how this proposal really even addresses the main problem with Ordinal as it exists today which is that many operations of Range just aren't efficient unless the type is also Enumerable. Sure, sure, you've given Ordinal a formal span() operation, so, indeed, I can go ahead and create my own Range type. But then WTF is the point of all this complexity of Successor, Ordinal, etc if at the end of the day you're not even using them for anything!? Again, they just don't seem like very powerful abstractions at all, especially since they fail to abstract over the main problem we're interested in here!

[@gavinking] What I'm saying is that as presented, I don't see how this is an improvement over what we have today. The proposal doesn't really make Enumerable obsolete. It just introduces additional complexity where we already have too much.

[@tombentley]
@gavinking well, imo the problem with Ordinal isn't just that it's inefficient, it's also that:

  • It assumes Ordinals must be Comparable, and I think people want to use the language features which are defined in terms of Ordinal and/or Range with types which are not Comparable (like DayOfWeek).
  • It assumes that Ordinals are either infinite or circular (i.e. that you can always get a successor and predecessor), and again I think people want to use related language features with types which are finite and non-circular (e.g. a Priority).

So you're right that Successor isn't relevant to solving the efficiency problem of Ordinal -- we could just have kept successor on Ordinal and defined segment() on Ordinal to achieve that, and done without this whole Absent thing. But Successor is trying to deal with those other problems where Ordinal is too specialized to be useful in reasonable seeming circumstances. I'm aware, however, that I've thus far failed to convince you that those are problems with Ordinal.

[@gavinking]
@tombentley

  • Ordinal doesn't inherit Comparable. it's only Range that requires that, and it would still require it, AFAICT.
  • I'm not at all convinced that successor and predecessor should be optional. I can't imagine what you think the semantics and type of ++ and -- should be in that case.

So I think maybe this is addressing the wrong problem.

[@tombentley]

  • Although Ordinal doesn't inherit Comparable it does induce a total order, so saying it doesn't inherit Comparable is splitting hairs isn't it?
  • I said that ++ and -- would still be defined on Ordinal and later said that Ordinal should satisfy Successor<Other, Nothing> so you don't have to do anything about ++, or -- AFAICS because the successor/predecessor which they're defined in terms of are not optional.

[@tombentley] For the record, this is what I have at the moment:

"Abstract supertype of types whose values may have a [[successor]]. 
 This is a generalization of [[Ordinal]].

 The type `Successor<Element,Nothing>` represents a non-terminating 
 successor type: It is always possible to get a next element. Such types 
 include true infinite successor types ([[successors]] contains unique elements)
 as well as circular successor types ([[successors]] ends up repeating the 
 same sequence of elements).

 The type `Successor<Element,Null>` represents terminating successor types 
 which have a _terminal_ element which lacks a successor. 

 A `Successor` instance may be used to generate a [[Sequence]], using the 
 segment (`:`) operators. The [[segment]] operator `:` accepts the 
 first element and maximum length of the subrange.

     0:5     // [0, 1, 2, 3, 4]

 For a terminating successor type the segment will have fewer elements than 
 the requested length if the terminal element was reached before the given 
 number of elements was reached.

 If the length is nonpositive, the subrange is empty.

     0:0     // []
     5:0     // []
     0:-5    // []
 "
see(`interface Ordinal`)
shared interface Successor<Other,Absent=Nothing>
        given Other satisfies Successor<Other, Absent> 
        given Absent satisfies Null {

    "The successor of this value, or null if there is no successor"
    shared formal Other|Absent successor;

    "This value and all its successors"
    shared default {Other+} successors {
        value start = this;
        object iterable satisfies Iterable<Other, Nothing> {
            shared actual Iterator<Other> iterator() {
                object iter satisfies Iterator<Other> {
                    variable Successor<Other,Absent>? current = start;
                    shared actual Other|Finished next() {
                        if (exists result=current) {
                            current = result.successor;
                            assert(is Other result); 
                            return result;
                        } else {
                            return finished;
                        }
                    }
                }
                return iter;
            }
        }
        return iterable;
    }

    "This value and the next [[num]] successors"
    shared default {Other*} segment(Integer num) {
        return successors.take(num);
    }
}
"Specialization of [[Successor]] for types which are also [[Comparable]].
 This includes [[Integer]] and other [[Integral]] numeric types.
 [[Character]] is also considered an ordinal type.

 Given any two values `x` and `y` of `Ordinal` type, where `x<y` then 
 `x.successors.contains(y)` (in other words, starting from a smaller value, 
 we can get to any larger value by iteratively getting successors some 
 number of times).

 Ordinal also comes with an inverse of successor called predecessor. It it 
 expected that given any value `x` of `Ordinal` type, 

     x.successor.predecessor == x

 is always true

 The _increment_ operator `++` and _decrement_ operator `--`
 are defined for all `Ordinal` types.

     function increment() {
         count++;
     }

 An `Ordinal` instance may be used 
 to generate a [[Range]], using the span (`..`) operator. The [[span]] 
 operator `..` accepts the first and last values 
 of the range.

     0..5    // [0, 1, 2, 3, 4, 5]
     0..0    // [0]

 If the last value is smaller than the first value, the
 range is reversed.

     5..0    // [5, 4, 3, 2, 1, 0]
     0..-5   // [0, -1, -2, -3, -4, -5]
 "
see (`interface Successor`,
     `class Character`, 
     `class Integer`, 
     `interface Integral`, 
     `class Range`)
by ("Gavin", "Tom")
shared interface Ordinal<Other> of Other
        satisfies Successor<Other,Nothing> & Comparable<Other>
        given Other satisfies Ordinal<Other> {

    shared formal actual Other successor; 

    "The predecessor of this value."
    shared formal Other predecessor;

    "The instances between this instance and the 
     given [[end]] instance, inclusive.
     If the implementor also satisfies [[Comparable]] 
     the elements of the sequence should be ordered 
     (increasing if `end > this` or decreasing if `end < this`)."
    shared default Range<Other> span(Other end) {
        return Range(this of Other, end);
    }

    //"The [[next|successor] [[length]] instances, 
    // or empty if `length` is non-positive."
    //shared formal Range<Other>|[] segment(Integer length);

}

[@gavinking] It's an order but not a total order, I think.

[@gavinking] So the thing with this issue is that since #5076 is back in play, I need a Range class that is sufficiently general to abstract over all kinds of ranges we might be interested in. So I started asking myself about how to make the most completely abstracted Range I could come up with.

Inspecting the current implementation of Range, it appears to need the following:

  • the first and last items,
  • a compare() function,
  • an offset() function, and
  • a neighbour() function.

UPDATE: in fact the compare() function could in theory be expressed in terms of offset() though that might affect performance.

Now, we could stick those operations on an interface (Enumerable, it's called today) but that has the disadvantage that you can only have one sort of range for each element type. For example, (1..10).by(2) can't be a Range today. So at first it seemed to me better to just change the constructor of Range to class Range<E>(E first, E last, Comparison compare(E x, E y), Integer offset(E x, E y), E neighbour(E e, Integer offset)).

But then I realized that optimizing Range.equals() and Range.includes() would be impossible because functions don't have equality. Those seem like quite useful optimizations to me. So I find myself back at the notion of something conceptually similar to the Enumerable interface (though perhaps we could consider at some point giving Range a step).

So at this point, I'm inclined toward just unlinking Range from Ordinal, and defining:

shared interface Enumerable<out Other> of Other
        satisfies Comparable<Other> 
        given Other satisfies Enumerable<Other> {

    shared formal Integer offset(Other other);
    shared formal Other neighbour(Integer offset);

}

Perhaps you @tombentley could convince me that it should be:

shared interface Enumerable<out Other> of Other
        given Other satisfies Enumerable<Other> {

    shared formal Integer? offset(Other other);
    shared formal Other? neighbour(Integer offset);

}

in order to accommodate discontinuities.

(Now, FTR, if ranges were merely iterables, rather than sequences, you could form a range with only first, last, and a next() function, which would be a lot simpler. But of course plain iterables are much less useful.)

[@luolong] Now, I'm curious -- why does a Range need to be a Sequence? Why isn't just Iterable enough?

[@gavinking] Well that's equivalent to asking "why have sequences at all?". If sequences are useful, then more efficient sequences are useful.

[@luolong] No, not quite the same question.

I was asking essentially, "What use case requires Range to be a sequence"?

But you are right of course -- making Range (and Segment) a sequence will allow it to be used wherever a sequence is expected, enabling a whole range of possibilities (pun fully intended)

[@luolong] A question though.

You have this signature in your Enumerable interface:

    shared formal Other? neighbour(Other other, Integer offset);

What is the Other other doing there?

[@gavinking]

What is the Other other doing there?

Sorry, that's a mistake.

[@tombentley] I'm going to make the same point at I have for a while now, just in terms of your new abstractions, so please forgive me banging on about the same stuff...

shared interface Enumerable<out Other> of Other
        satisfies Comparable<Other> 
        given Other satisfies Enumerable<Other> {
    shared formal Integer offset(Other other);
    shared formal Other neighbour(Integer offset);
}

So as you alluded to in your "UPDATE:" offset() implies a total order between Enumerables of the same type. It follows that we wouldn't be able to have Ranges of DayOfWeek. I'm not sure how well this copes with a type like Priority either, because neighbour() is supposed to return an Other for any arbitrary Integer, so how is an Enumerable type with fewer inhabitants than Integer supposed to work? It seems the only thing is can do is to wrap around, which isn't always a suitable semantic.

So I tend to prefer the one where neighbour() can return null. However, how will we construct a Range when start.offset(end) returns null?

And it seems to me that this cannot handle a Range<Whole>.

[@gavinking] So I have implemented the proposed changes to Enumerable and Range that allow them to abstract over:

  • "recursive" enumerable types like DayOfWeek, and
  • high-precision integral types like Whole.

This solution, in contrast to other options we considered, has the advantage that it accommodates a wide range of enumerable types, without closing off the possibility to make Sequence an enumerated type, as proposed in #5076.

The approach basically worked out, though the implementation of Range is surprisingly hairy.

If everyone agrees that this is the way to go, then the remaining tasks are:

  • #5362
  • tiny update to the spec
  • tests for ranges over recursive enumerated types

[@tombentley] I wonder if we could write the termination condition for the iterator() slightly differently, because while what you currently have certainly works, the different behaviour for recursive and non-recurvsive Ranges makes loops harder to optimize.

It seems to me that we want to continue the iteration until current.offset(last) == 0 and then do one more (because a Range includes its endpoint). We could do that like this:

    object iterator 
            satisfies Iterator<Element> {
        variable value current = first;
        variable value latch = 1;
        shared actual Element|Finished next() {
            if ((latch > 0 && current.offset(last) != 0) || latch-- > 0) {
                value result = current;
                current = outer.next(current);
                return result;
            }
            else {
                return finished;
            }
        }
        string => outer.string + ".iterator()";
    }

This condition would work for both recursive and non-recursive Ranges and would permit an optimized loop something like this:

Integer last = range.getLast();
int latch = 1;
for (Integer curr = range.getFirst();
        (latch > 0 && curr.offset(last) != 0) || latch-- > 0;
        curr = range.getIncreasing() ? curr.getSuccessor() : curr.getPredecessor()) {
    // ...
}

[@gavinking] I guess we could do that. It does open up more possibility of the loop never terminating for a badly-behaving Enumerable, but I guess that's no big deal...

[@tombentley] With the iterator() implemented as above, an unoptimized range loop (i.e. using Range's iterator directly) runs faster (13484.267541ms before cf 11850.233843ms after). The optimized version runs about the same, maybe slightly slower (9696.06184ms before cf 9965.891885ms after), and is still an optimization, though not as noticeable since it's now only ~10% faster.

[@gavinking] OK, well that's certainly something.

[@tombentley] If I write the Range.iterator() slightly differently again it appears I can make it run a little faster still, which means the optimized loop is now only about 5% faster than the normal iterator loop. At that point I start thinking that that particular optimization is not worth the complexity in the compiler. WDYT?

[@gavinking] 5% is not much. But do we know for sure that 5% is an upper bound?

[@tombentley] You mean you want numbers for different types of element?

[@gavinking] It was more of a rhetorical qn.

[@gavinking] Closing, after updating spec and opening #5369.