Hashes produced by colorhash occasionally crash
max-kamps opened this issue · 2 comments
max-kamps commented
Some images (I would say about 10% of them) cause colorhash to return invalid values that can't be roundtripped. Feeding their hex representation into hex_to_hash creates an invalid ImageHash object that crashes when stringified again.
Example code that reproduces the issue:
```python
from PIL import Image
from imagehash import colorhash, hex_to_hash

# Example image that crashes: a single 1x1 pixel
img = Image.frombytes('RGB', (1, 1), b'\xff\xb8\xff')

# The first hash works fine
first_hash = colorhash(img)
print(first_hash.hash.dtype)
# >>> bool
print(str(first_hash))
# >>> 07000000000
# These values are expected.

# Now let's roundtrip the hash (hash -> hex -> hash) and see what happens
second_hash = hex_to_hash(str(first_hash))
# imagehash/__init__.py:181: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
#   hash_array = numpy.array([[bool(int(d)) for d in row] for row in bit_rows])
print(second_hash.hash.dtype)
# >>> object
print(str(second_hash))
# File "imagehash/__init__.py", line 100, in __str__
#   return _binary_array_to_hex(self.hash.flatten())
# File "/imagehash/__init__.py", line 87, in _binary_array_to_hex
#   return '{:0>{width}x}'.format(int(bit_string, 2), width=width)
# ValueError: invalid literal for int() with base 2: '[True, True, True, False, False, False][False, False, False, False, False, False][False, False, False, False, False, False][False, False, False, False, False, False][False, False, False, False, False
```
JohannesBuchner commented
See the "Storing hashes" section of the README; hex_to_hash is not the right function for colorhash output.
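
For illustration, here is a minimal pure-Python sketch of why the roundtrip above can fail. It is not the library's code. It assumes that hex_to_hash guesses a square bit grid from the length of the hex string (the `bit_rows` line in the warning suggests row-wise reshaping), and that a default colorhash holds 42 bits (14 bins of 3 bits each). The helper names `square_rows`, `flat_bits` and `bits_to_hex` are hypothetical.

```python
import math


def to_bits(hexstr, width):
    """Expand a hex string into a bit string, left-padded to at least `width` bits."""
    return '{:0>{width}b}'.format(int(hexstr, 16), width=width)


def square_rows(hexstr):
    """ASSUMED square decoding: derive a side length from the hex length, then cut rows."""
    side = int(math.sqrt(len(hexstr) * 4))   # 11 hex digits -> 44 bits -> side 6
    bits = to_bits(hexstr, side * side)      # padding never truncates longer values
    return [bits[i:i + side] for i in range(0, len(bits), side)]


def flat_bits(hexstr, total_bits):
    """Flat decoding: keep exactly `total_bits` bits and guess no 2-D shape."""
    return to_bits(hexstr, total_bits)[-total_bits:]


def bits_to_hex(bits):
    """Encode a bit string as hex, padded to a whole number of hex digits."""
    width = math.ceil(len(bits) / 4)
    return '{:0>{width}x}'.format(int(bits, 2), width=width)


crashing = '07000000000'   # hash string from the report above
harmless = '00038000000'   # made-up value whose top bits are zero

# The reported value needs 39 bits, so a 6x6 grid leaves a short 7th row.
print([len(row) for row in square_rows(crashing)])   # [6, 6, 6, 6, 6, 6, 3]

# A value that fits in 36 bits reshapes cleanly, so no error is raised.
print([len(row) for row in square_rows(harmless)])   # [6, 6, 6, 6, 6, 6]

# Decoding flat with the known bit count roundtrips the original string.
print(bits_to_hex(flat_bits(crashing, 42)) == crashing)   # True
```

Under these assumptions, the square decoding only yields equal-length rows when the leading bins of the hash are zero, which may be why the roundtrip appears to work for most images. In imagehash itself, the README section mentioned above appears to recommend `hex_to_flathash(hexstr, hashsize)` for colorhash output; check the README for the exact arguments.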
max-kamps commented
Thank you!
I was very confused by the fact that it seemed to work 90% of the time.
Oh well. Should have read the docs more closely.