marrow/uri

Implement Python __hash__ method to permit use as dictionary keys and set members.

Wouter1 opened this issue · 7 comments

>>> hash(URI('https://localhost'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'URI'

Could this be easily fixed? It would be nice to be able to use URIs as keys in a dict, for instance, or to use them efficiently in e.g. sets.

I agree, this would be beneficial. Can't just have them hash the same as their string representation, though, so some consideration will have to be made. 🤔 (Can't just "pass the buck" down to hash(str(self)).)

After investigation, this is not possible with the URI class, which is mutable. Explanation here. Instead, if you must use these as keys or in sets, cast to str or otherwise explicitly serialize for such use, as confusion may rapidly arise between URIs that are equivalent but hash differently. You must agree on a serialization and normalization, including the relevance of such factors as query string parameters (fragment reference?) in that equivalence.
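To make the "agree on a serialization and normalization" point concrete, here is a minimal sketch using only the standard library's urllib.parse, not marrow/uri itself. The helper name canonical_key and its keep_fragment parameter are hypothetical, and the rules it applies (lowercase host, sorted query parameters, fragment dropped by default) are just one possible agreement, not a recommendation from this project.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_key(uri, keep_fragment=False):
    """Hypothetical helper: reduce a URI to an agreed-upon string key."""
    parts = urlsplit(str(uri))
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()  # simplification: also lowercases any userinfo
    # Treat an empty path on a host-based URI as "/".
    path = parts.path or ("/" if netloc else "")
    # Sort query parameters so that ordering does not affect equivalence.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    fragment = parts.fragment if keep_fragment else ""
    return urlunsplit((scheme, netloc, path, query, fragment))


seen = {canonical_key("HTTPS://LocalHost?b=2&a=1#top")}
print(canonical_key("https://localhost/?a=1&b=2") in seen)  # True
```

Whether the fragment or the parameter order should matter is exactly the kind of decision each application has to make for itself.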

The URI type as built is intended to represent a normalized whole URI (not one with relative internal markers, and not a fragment), and additionally qualifies under RFC 3986 as a backward-compatible parser, not a strict one, ref. Because it encapsulates a mutable mapping to represent URL-encoded query string parameters, and the incorporated path may be altered in place, URI is not hashable.
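For anyone wondering why mutability rules out hashing, a small sketch of the general Python problem follows. MutableKey is a hypothetical stand-in class, not anything from this library: it hashes by its contents, and is then changed after being used as a dict key.

```python
class MutableKey:
    """Hypothetical stand-in for a mutable object that hashes by its contents."""

    def __init__(self, path):
        self.path = path

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.path == other.path

    def __hash__(self):
        return hash(self.path)


key = MutableKey("/a")
table = {key: "value"}

key.path = "/b"  # in-place change after insertion; the stored hash is now stale

# The old value hashes to the stored slot, but equality against the changed key fails.
found_by_old_value = MutableKey("/a") in table
# The new value compares equal, but its hash does not match the stored one.
found_by_new_value = MutableKey("/b") in table

print(found_by_old_value, found_by_new_value, len(table))  # False False 1
```

The entry is still in the dict, but it can no longer be found by either value, which is why Python leaves classes that define `__eq__` without `__hash__` unhashable.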

Mutable? Oh wow, I did not realize that. That is a deal breaker for the project that uses it, as the URI is part of an immutable data structure.

Thanks for all your help so far!

I'm worried that there are larger issues at play in your usage of URI without seeming regard to normalization and proper equality comparison. 😕 I'll endeavor to update the documentation to make it more explicit and clear that by virtue of the features provided a URI instance is inherently mutable.

Specific methods, such as resolve, __div__ (/) and __floordiv__ (//) return new instances, leaving the original unmodified. Even unusual behaviour such as slicing notation (uri[user:pass]) returns a new instance. The problems are the nested mutable collections, whose mutability the instance inherits, and that any given attribute may be replaced at any point in time, a far more direct problem.

Thanks for the elaboration.

I'm not sure what you are worried about exactly. I quickly scanned the manual and did not spot any issues. I did not expect mutability so I didn't specifically search for it.

I appreciate that things like equality are properly implemented. But that should not result in mutable objects. Because you mention this, I assume you want to cache some intermediate results to support this functionality. But those are caches and should not be considered a mutation of the object; nor should these caches be part of a hash code.

The problems are the nested mutable collections

That is not necessarily a problem, unless you expose these to the outside.
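As a rough illustration of what not exposing the nested collections could look like, here is a sketch built on a frozen dataclass. FrozenURI, query_items and with_query are hypothetical names chosen for this example; this is not the marrow/uri API, and it simplifies by collapsing repeated query parameter names.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class FrozenURI:
    """Hypothetical immutable URI value object (illustration only)."""

    scheme: str
    host: str
    path: str = "/"
    query_items: tuple = ()  # tuple of (name, value) pairs, so it stays hashable

    @property
    def query(self):
        # Read-only view over a fresh dict; callers cannot reach internal state.
        return MappingProxyType(dict(self.query_items))

    def with_query(self, **params):
        # Return a new instance instead of modifying this one.
        merged = dict(self.query_items)
        merged.update(params)
        return FrozenURI(self.scheme, self.host, self.path,
                         tuple(sorted(merged.items())))


a = FrozenURI("https", "localhost").with_query(page="2")
b = FrozenURI("https", "localhost", "/", (("page", "2"),))
print(a == b, hash(a) == hash(b), len({a, b}))  # True True 1
```

Because the dataclass is frozen, attribute assignment raises an error, and the generated `__hash__` is derived from the same fields as `__eq__`.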

[…all of that…]

Noted.

…I assume…

Noted previously.

FWIW, not sure if it is of any use to you, but I slapped together a URI class to fit my needs. I checked several other existing URI packages but none seemed to provide what I want (yours was the only one I really liked).

It's here

https://tracinsy.ewi.tudelft.nl/pubtrac/Utilities/wiki/uri

and the simple class that I made

https://tracinsy.ewi.tudelft.nl/pubtrac/Utilities/browser/uri/uri.py

Basically I cloned an existing URI parser/validator/normalizer tool and added an immutable uri.URI class to it to fit my needs. It does equality testing based on normalized URIs but otherwise it's very limited at this point. At some point I might add some more functionality as needed.