DanielStutzbach/blist

Case insensitive sortedset

philfreo opened this issue · 3 comments

Would be great to see a case insensitive version of sortedset

Doing this now for sorting is easy because you have the key parameter (sortedlist(foo, key=lambda x: x.lower()))

However in those cases you also want to de-dupe in a case insensitive way, which it doesn't do.

If there's an easy way to accomplish this now, I'd love to hear it. The best I've come up with so far is:

class isortedset(sortedset):
    def __contains__(self, key):
        return key.lower() in (n.lower() for n in self)

but this breaks the nice time complexity (every membership test becomes an O(n) scan over the whole set)

One solution is to put the strings into a canonical (e.g., lowercase) form before inserting them into the set.

@DanielStutzbach of course, but that's not always what you want. For example file systems often only allow 1 version (in a case insensitive comparison) but keep track of the original version of the case.

In that case, you could use a dict that maps canonical values to original values.
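
A minimal sketch of that dict idea, using only a plain Python dict. The class name `CaseInsensitiveSet` and its methods are made up for illustration and are not part of blist; it assumes the first spelling seen is the one worth keeping.

```python
class CaseInsensitiveSet:
    """Hypothetical helper: de-dupes case-insensitively, keeps the first-seen spelling."""

    def __init__(self, iterable=()):
        # Maps canonical (lowercased) form -> original string.
        self._items = {}
        for value in iterable:
            self.add(value)

    def add(self, value):
        # setdefault leaves an existing entry alone, so the first spelling wins.
        self._items.setdefault(value.lower(), value)

    def __contains__(self, value):
        # Average O(1) lookup on the canonical form.
        return value.lower() in self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Sorts the canonical keys on demand: O(n log n) per iteration.
        return (self._items[k] for k in sorted(self._items))


names = CaseInsensitiveSet(["Readme.TXT", "readme.txt", "b", "A"])
print("README.txt" in names)  # membership ignores case
print(list(names))            # original spellings, ordered case-insensitively
```

The plain dict does not stay sorted, so iteration pays for a sort each time. Swapping it for a mapping that keeps its keys ordered would likely avoid that, at the cost of slower inserts.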

Or you could use sortedlist(foo, key=str.lower), then something like:

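def __contains__(self, key):
    # bisect_left returns where key would sort; this can equal len(self)
    i = self.bisect_left(key)
    # the stored item keeps its original case, so lowercase both sides
    return i < len(self) and self[i].lower() == key.lower()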
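
For anyone who wants to try the bisect approach without blist installed, here is a rough standalone sketch built on the standard-library `bisect` module. The class name `ISortedSet` and the parallel-list layout are assumptions made for illustration, not how blist does it. Inserts into a plain list are O(n), so this only shows the lookup logic, not blist's performance.

```python
import bisect


class ISortedSet:
    """Hypothetical case-insensitive sorted set that remembers original spellings."""

    def __init__(self, iterable=()):
        self._keys = []    # lowercased forms, kept sorted
        self._values = []  # original strings, parallel to self._keys
        for value in iterable:
            self.add(value)

    def _find(self, key):
        # Returns (index, found) for an already-lowercased key.
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def add(self, value):
        key = value.lower()
        i, found = self._find(key)
        if not found:
            # Keep both lists aligned; duplicates (ignoring case) are dropped.
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def __contains__(self, value):
        # O(log n) binary search on the lowercased keys.
        return self._find(value.lower())[1]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


files = ISortedSet(["pear", "Apple", "apple", "PEAR", "banana"])
print(list(files))       # first-seen spellings in case-insensitive order
print("APPLE" in files)  # lookup ignores case
```

The bounds check in `_find` matters: when the key sorts after every stored item, `bisect_left` returns the list length, and indexing at that position would raise `IndexError`.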