Review zero and null values for Ranges

Question

Review zero and null values for Ranges

librasteve opened this issue a year ago · 13 comments

Currently, v.6.e.PREVIEW, we have the following:

say so 0..0;   # True
say so 1..0;    # False (not well formed)
say so Nil..Nil;   # prior version was True, now errors...

Range cannot be iterated over because its starting point 'Any' does not
have a '.succ' method
  in block <unit> at <unknown file> line 1
  in any <main> at /Users/stephenroe/.rakubrew/versions/moar-2023.10/bin/../share/perl6/runtime/perl6.moarvm line 1
  in any <entry> at /Users/stephenroe/.rakubrew/versions/moar-2023.10/bin/../share/perl6/runtime/perl6.moarvm line 1

My (naive) assertions are that:

0..0 should be reserved as the (only) 'zero Range'
so that

say so 0..0;                     # False
say so (0..0 ~~ Range); # True

and that (if ever defined) a Range div Range operation could then fail with 0..0 as the denominator

and

Nil..Nil should be reserved as the (only) 'null Range'
so that

say so Nil..Nil;                    # False
say so (Nil..Nil ~~ Range); # True

Since 6.e.PREVIEW is making a breaking change to the behaviour of Nil..Nil, I would suggest we try to squeeze in the latter as a fix prior to closing 6.e production...?

See this SO question for some context:
https://stackoverflow.com/questions/77608044/is-there-an-elegant-way-in-raku-to-see-if-two-ranges-overlap

Answer 1 · 2023-12-11T10:31:02.000Z

Seems that use of Nil as an endpoint for Range is already problematic:

Welcome to Rakudo™ v2023.10.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2023.10.

Nil..0    #Nil..0

but

0..Nil

Use of Nil in numeric context
  in block <unit> at <unknown file> line 1
  in any <main> at /Users/stephenroe/.rakubrew/versions/moar-2023.10/bin/../share/perl6/runtime/perl6.moarvm line 1
  in any <entry> at /Users/stephenroe/.rakubrew/versions/moar-2023.10/bin/../share/perl6/runtime/perl6.moarvm line 1

So somewhere in the code there's an "implicit assumption" that Nil < 0

Answer 2 · 2023-12-11T10:38:02.000Z

So, a cleaner approach would be to define a (new) symbol for the null Range such as:

𝟘

𝟘
MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO
Unicode: U+1D7D8, UTF-8: F0 9D 9F 98

with texas variant like

Range()

Then "all" that needs to be done in core is:

decide a better behaviour for Nil endpoints (such as "Nil is not permitted as a Range endpoint") or maybe a Nil endpoint resets the Range to Range()
adjusts the Range op Scalar and Range op Range arithmetic ops to handle the null value
coerce to a Set can make ∅ Set()
coerce to a List can make Empty
extraction of a value can make Nil (eg iteration attempt)

Answer 3 · 2023-12-11T10:42:46.000Z

https://docs.raku.org/language/operators#Identity would also need to be updated

Answer 4 · 2023-12-11T10:52:18.000Z

I'm not following your reasoning. To me 0..0 is equivalent to a List containing a single 0. As such I expect it to be True as a non-empty List coerces to True.

Answer 5 · 2023-12-11T12:03:06.000Z

We can express an undefined Range already. It is (Range).

I do agree that ranges with undefined values (Nil, Type-Objects) should result in an undefined Range, so that any :D-type smily would triggers as expected. My Sprachgefühl (sorry, no english word available) tells me, Nil..Nil should turn into (Range) because Nil indicates that absense of a value but definite Range-objects are values. Thus, here is something deeper amiss. The construction of such an impossible Range object could be a compile time error, because the value is impossible. For the case that the object has to be contructed a runtime, things get even more problematic. Nil.Int warns and turns the absense of a value into 0. If we want consistency, we should preserve Nil and warn. So, Nil..Nil should iterate to Nil, have .so & .defined ~~ False and .elems == 0.

Ranges with type objects like Int..10, Any..Any should at least return (Range) and thus be undefined and not definite. I'm undecided if they should warn but leaning to, yes, they should.

I can't thinks of a case where the construction of an impossible Range would not be a bug. So fail with a distinct X::Range:Impossible would also be an option. That way we can produce a proper error message that contains both endpoints given to the constructor of Range to help debugging.

The whole thing is tricky to decide because Raku is ment to soldier on, even if odd values pop up in any container that is not guarded by :D.

Answer 6 · 2023-12-11T14:59:56.000Z

What about:

Ranges with Nil as an endpoint become failures. (Rationale: Nil denotes absence of a value or a benign failure. Using one as a range endpoint promotes to a non-benign failure.)
There's an "EmptyRange" symbol that denotes a range with no elements (as against a range with one element, 0).
The syntax I suggest for an "EmptyRange" is EmptyRange. (Rationale: If we're to have a built in Unicode symbol for an empty range then 𝟘 as the positional complement to the associative / set related ∅ seems OK, but is on the other hand disappointing in that it undermines the value to users of using the full range of 𝟘, 𝟙 etc for their own purposes. Imo the alphabetic equivalent would best not be Range() because that most naturally remains what it is today, a coercion type object equivalent to Range(Any). There's currently an error if Range(Empty) is written as a function call, but it's accepted as constraint on a parameter in a signature, and doesn't feel right for other reasons. I don't think it's right to let Range.new be given no arguments; that would mean we miss the opportunity to catch that happening by mistake.)
The semantics I suggest for "EmptyRange" is that: it's a Range instance, so it's :D; it's also .defined; it boolifies as False. Other semantics as @librasteve suggests. (Rationale. None, other than it makes sense to me. :))

Answer 7 · 2023-12-15T03:45:14.000Z

I find that the best way to characterize an interval (a Range) is that it is a set of values such that the set is possibly of infinite cardinality, eg when you have 2 real numbers or 2 character strings as endpoints, in contrast with a typical set defined in terms of an enumeration of individual values and therefore is of finite cardinality. This is particularly true in the case of a set of intervals which can be non-continuous. And so, when considering what is reasonable behavior, we should try and define semantics for a Range which are logically consistent with semantics for a set.

Answer 8 · 2023-12-15T03:53:39.000Z

Continuing on, for the generic case of a Range of Any...

A Range with 2 closed endpoints that are the same value defines a set of exactly 1 value which is that common endpoint.
A Range with 2 endpoints that are the same which is open at either end defines an empty set and/or is invalid.
Nil is a single value in generic context like anything else, so Nil..Nil defines a set of exactly 1 value which is the value Nil, same as if you defined an Array or such with Nil as its sole element.
If Raku defines the generic order-determination operator between different types, for example you can sort a list containing Foo and Bar objects and know deterministically that one will sort first without errors, it is likewise valid to define a Range with a Foo endpoint plus a Bar endpoint, since Range semantics are defined over regular ordering.

Answer 9 · 2023-12-15T03:57:04.000Z

I feel that while it may not make sense conceptually to sort different types, eg it doesn't make sense to sort numbers and strings, if Raku at least provides a deterministic default generic sort that can compare two Any and give a defined result, then a Range can always function with any 2 end points no matter what they are. Same as generic equality should always be defined between any 2 things, returning a defined result, although arguably that is easier to argue for and implement than generic sorting. And if it doesn't make conceptual sense, the solution is to just not combine both types in the same Range, but if someone does it, it should "work".

Answer 10 · 2023-12-15T09:38:05.000Z

Some very good & thoughful comments - let me try and summarize my reaction (corrections to the are welcome if anything is misconstrued):

As a consumer of raku, I think the following makes most sense:

On "zero Range"

(0..0).so #True <=== this

this is what we have today in 6.e.PREVIEW - phew!

rationale

this reflects a Set with one element

(0..0).Set        #Set(0)
(0..0).Set.so   #True

this reflects a non empty List with element 0

(0..0).List        #(0)
(0..0).List.so   #True

On "badly drawn Ranges"

(Range) is good (thanks @gfldex ! - I couldn't see the wood for the trees)

(Nil...Any) or (Any...Nil) or (Nil..Nil) returns Nil
ie. Soldier on by propagation of Nil outwards (compile time or runtime)

(Any:U...Any:D) or (Any:D...Any:U) or (Any:U..Any:U) returns (Range)
ie. Propagate undefinedness outwards

Implementors (see below) can use (Range) or its derivatives as the return value when no defined result exists.

I am not sure about warn.

Specifically on my Math::Interval module, this has the following effect:

> my $i1= (0..0).Interval             #0..0
> my $i2= (1..1).Interval              #1..1
> my $i3 = $i1 ∩ $i2                    #Set()               <== this is now wrong
> my $i3 = $i1 ∩ $i2                    #(Interval)        <== I will change to this
> (Interval) ~~ Range                 #True (since class Interval is Range)

Answer 11 · 2023-12-15T13:23:49.000Z

I feel that while it may not make sense conceptually to sort different types, eg it doesn't make sense to sort numbers and strings

To reassure this: 1..'foo' wouldn't even work currently, as the type of the left side governs the understood type of the Range.

Answer 12 · 2024-04-19T10:56:09.000Z

As I see it, ... triple-dot Sequences are currently broken: #407

Whatever is decided for Ranges, please table it until ... triple-dot Sequences are fixed. Otherwise we won't have a reliable indexing mechanism in Raku.

Thank you.

Answer 13 · 2024-04-19T16:39:03.000Z

@jubilatious1 ... I am relaxed about what the core dev team decides has priority, but I am surprised that comma separated triple-dot Sequences are the go to option for array indices - anyway that issue is the place to discuss more fully, I guess...