Superbird11/ranges

Lengths of open intervals are reported as if closed

Closed this issue · 3 comments

lmmx commented

I'm working with this library so first of all thank you, looking forward to peeking in the code a bit more (spotted a linked list is involved, neat!) but as requested in the README I just wanted to raise a bug, since I've noticed the lengths of things here don't seem right in a few ways.

I think some parts of this library should expose a length dunder attribute so Python's len can report it, but additionally I think what is being reported is wrong for ranges themselves.

r1 = Range("[0,1)")
r2 = Range("[0,1]")
print(r1.length())
print(r2.length())

1
1

I'm guessing you haven't bothered to cover it because maybe it gets unpleasant for continuous ranges, but thought it was worth asking. In the example here, the empty range has length 1 which just seems wrong to me, I don't know what you think.

For what it's worth, my helper functions to give me the behaviour I want are below, feel free to incorporate any/all/none:

def range_termini(r: Range) -> tuple[int, int]:
    """
    Get the inclusive start and end positions `[start,end]` from a `ranges.Range`.
    These are referred to as the 'termini'. Ranges are always ascending.
    """
    if r.isempty():
        raise ValueError("Empty range has no termini")
    # If range is not empty then can compare regardless of if interval is closed/open
    start = r.start if r.include_start else r.start + 1
    end = r.end if r.include_end else r.end - 1
    return start, end

def range_len(rng: Range) -> int:
    rmin, rmax = range_termini(rng)
    return rmax - rmin

def range_min(r: Range) -> int:
    if r.isempty():
        raise ValueError("Empty range has no minimum")
    return range_termini(r)[0]

def range_max(r: Range) -> int:
    if r.isempty():
        raise ValueError("Empty range has no maximum")
    return range_termini(r)[1]

def validate_range(byte_range: Range | tuple[int, int], allow_empty: bool = True) -> Range:
    "Validate byte_range and convert to `[a,b)` Range if given as integer tuple"
    complain_about_types = (
        f"{byte_range=} must be a `Range` from the `python-ranges`"
        " package or an integer 2-tuple"
    )
    if isinstance(byte_range, tuple):
        if not all(map(lambda x: isinstance(x, int), byte_range)):
            raise TypeError(complain_about_types)
        if len(byte_range) != 2:
            raise TypeError(complain_about_types)
        byte_range = Range(*byte_range)
    elif not isinstance(byte_range, Range):
        raise TypeError(complain_about_types)
    elif not all(map(lambda o: isinstance(o, int), [byte_range.start, byte_range.end])):
        raise TypeError("Ranges must be discrete (use integers for start and end)")
    if not allow_empty and byte_range.isempty():
        raise TypeError("Range is empty")
    return byte_range

def range_span(ranges: list[Range]) -> Range:
    "Assumes input list of RangeSets are in ascending order"
    min_start, _ = range_termini(ranges[0])
    _, max_end = range_termini(ranges[-1])
    return Range(min_start, max_end+1)

(P.S. I've not done so before but if you'd like some help type annotating the library I'd be happy to take a look into that too)

The sole reason __len__() isn't defined is that python doesn't allow that method to return non-integer values (e.g. for ranges with non-integer endpoints [1, 2.5] or non-number values like datetimes), which seems like such an overwhelmingly common scenario that it'd be better to not implement it at all than to implement it wrongly (and then to also have type-checking for comparable but noncontinuous types like strings). Which is why .length() is there instead.

.length() always returns max - min if applicable, because I think it's the most mathematically correct. If a range has a closed edge, then obviously its size is max - min. If a range has an open edge, then its size is infinitely close to max - min, which for almost all purposes is indistinguishable. Note that the range [0, 1) is not empty - it contains 0, 0.5, 0.999, etc (if you want strictly integer ranges, I'd think you could just use python's built-in range(), this module isn't meant to cater to that use case).

Since python's range() doesn't support the extra set of operations, though, I could see adding another class (maybe IntegerRange?) with this extra functionality. Feel free to submit a pull request

lmmx commented

Well that sounds like a great idea, leave it with me...

Regarding contributing, would you be open to splitting the single file out into modules instead of one single file, or must this stay fixed? I find it harder to work with one single file but if that's the house rule I'll keep it that way!

Whatever you find easiest, if it ends up causing a problem later I can always modify the pull request as necessary