Superbird11/ranges

Method for evaluating range overlaps + gaps

ab623 opened this issue · 4 comments

ab623 commented

Thanks for the creation and support for the library. I have looked through the code but not found anything that helps in the following scenarios

  1. Given a RangeDict with multiple ranges, how can I identify if the RangeDict has any overlaps in the ranges. Currently the implementation gives priority to the last added Range.

E.g.

r = RangeDict({
    Range(1, 6): "1",
    Range(6, 20): "2",
    Range(10, 30): "3",
})
print(r.hasOverlappingRanges) #  Output: True
  1. Is there a potential to identify where there are gaps in ranges and return a list of Ranges that fill those gaps?
r = RangeDict({
    Range(1, 5): "1",
    Range(10, 20): "2",
})
print(r.getRangeGaps(1, 40))  # Output: [Range(6, 10), Range(21, 40)]

I think both of these might make good additions to the RangeDict functionality.

As for point 1, that's an impossible scenario - like with the vanilla python dict, the same key cannot be present twice. It' not so much that the last added range has priority with lookups - the last-added range overwrites any other ranges that would be overlapping. Otherwise, if you were to request the value at 15 (in your example), you'd get an ambiguous return value. Of course you're welcome to explicitly add ("2", "3") as the returned value for Range(10, 20), but that would be your choice. I should probably also add a .setdefault() method to RangeDict to accommodate similar use cases (e.g. add the given value for only the parts of the key that don't overlap the current dict)

Point 2 is definitely worth doing, but I don't see it as particularly useful for that functionality to be bound to RangeDict. Rather, I'd implement RangeSet.complement() to return a RangeSet of all values not contained in the set that it's called on, from which you could then remove the first/or last entries (both infinite ranges), for the same effect. I'll add that as a feature.

ab623 commented

Thanks for the quick response. Coming back onto your points.

On point 2 first - I agree with you on that the functionality could be better placed on a a RangeSet. (My usage with the library has mainly been using the RangeDict hence my view was slightly narrowed). I do like your approach on the RangeSet.complement() feature which seems quite elegant.

On point 1 - I see what you mean, and again my vision was narrowed concentrating on RangeDict's. I have rethought my implementation and provided some sample code below. Simply I am trying to determine if a list of Range's have any overlap with each other. My rational for this I'm taking user input to construct a RangeDict, and I want to validate to the user if a range start and end they have entered is valid (does not overlap with anything else they have entered, otherwise the results might be confusing to them).

What are you thoughts for adding functionality like the below to the RangeSet/RangeDict objects, as those are simply an iterable of Ranges.

import itertools

ranges_without_overlaps = [
    Range(1, 10),
    Range(10, 20),
    Range(20, 30),
]

ranges_with_overlaps = [
    Range(1, 10),
    Range(8, 20),
    Range(20, 30),
]

def has_overlapping_ranges(range_list):
    """
    Calculate the intersection between each range combination and check if
    any combination results in an intersection.
    
    For large range lists (>10?), this could utilise the itertools generator 
    more effectively rather than converting to a list like in this implementation
    """
    intersections_map = [
        r[0] & r[1] is not None 
        for r in itertools.combinations(range_list, 2)
    ]
    return any(intersections_map)

print(has_overlapping_ranges(ranges_without_overlaps))  # False
print(has_overlapping_ranges(ranges_with_overlaps))     # True

Added Range.complement() and RangeSet.complement().

Did not yet add RangeDict.adddefault() - turns out there are some issues with how contradictory infinite ranges are handled, and trying to bugfix them broke everything. Until I add that method I'm not closing this issue.

That said, has_overlapping_ranges() isn't something that I think makes sense for this library - especially not when, as you demonstrate, it can be simulated by a one liner. I'm reluctant to include it as a standalone helper method, and I can't see any good way to attach it to one of the three data structures Range, RangeSet, and RangeDict that this module already provides.

ab623 commented

I think you make a valid point. Reflecting on the overlapping feature, it only makes sense when there is an iteratable of potentially overlapping ranges, and the only time that would be the case would be a RangeDicts keys I think.

I think its fine if it is left out of the library. The only real place left to place it would be the _helpers.py file. Either way if not added, the code in this issue will remain so that hopefully someone in the same scenario may stumble on it in the future!