JohannesBuchner/imagehash

Hashes produced by colorhash occasionally crash

max-kamps opened this issue · 2 comments

Some images (roughly 10% of the ones I tried) make colorhash return values that can't be round-tripped.
Feeding their hex representation into hex_to_hash creates an invalid ImageHash object that crashes when it is converted to a string again.
Example code that reproduces the issue:

```python
from PIL import Image
from imagehash import colorhash, hex_to_hash

img = Image.frombytes('RGB', (1, 1), b'\xff\xb8\xff')  # 1x1 example image whose hash fails to round-trip

# The first hash works fine
first_hash = colorhash(img)

print(first_hash.hash.dtype)
# >>> bool

print(str(first_hash))
# >>> 07000000000

# These values are expected.
# Now let's round-trip the hash (hash -> hex -> hash) and see what happens

second_hash = hex_to_hash(str(first_hash))
# imagehash/__init__.py:181: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
#  hash_array = numpy.array([[bool(int(d)) for d in row] for row in bit_rows])

print(second_hash.hash.dtype)
# >>> object

print(str(second_hash))
# File "imagehash/__init__.py", line 100, in __str__
#    return _binary_array_to_hex(self.hash.flatten())
#  File "/imagehash/__init__.py", line 87, in _binary_array_to_hex
#    return '{:0>{width}x}'.format(int(bit_string, 2), width=width)
# ValueError: invalid literal for int() with base 2: '[True, True, True, False, False, False][False, False, False, False, False, False][False, False, False, False, False, False][False, False, False, False, False, False][False, False, False, False, False
```
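The traceback makes more sense once you look at how the string is decoded. Below is a minimal plain-Python sketch, assuming hex_to_hash treats every hash as square: it guesses the side length from the string length, pads the bits to side × side, and cuts them into rows (the `bit_rows` line quoted in the warning points that way, but `split_like_hex_to_hash` is a hypothetical re-creation, not the library's code). The 11-digit colorhash string gives 44 bits and int(sqrt(44)) = 6, so the decoder expects 36 bits; 0x07000000000 needs 39, which leaves a 3-bit last row. NumPy cannot build a rectangular bool array from ragged rows, so it falls back to dtype=object (newer NumPy releases raise an error here instead of warning), and `__str__` later fails on it.

```python
import math


def split_like_hex_to_hash(hexstr):
    """Hypothetical re-creation of a decoder that assumes a square hash.

    Assumption: the side length is guessed as int(sqrt(4 * len(hexstr))),
    the bit string is zero-padded to side * side bits, then cut into rows.
    """
    side = int(math.sqrt(len(hexstr) * 4))
    bits = '{:0>{width}b}'.format(int(hexstr, 16), width=side * side)
    return [bits[i:i + side] for i in range(0, len(bits), side)]


# The colorhash string from the report: 11 hex digits -> side 6 -> 36 bits,
# but the value needs 39 bits, so a 3-bit tail row is left over.
print([len(r) for r in split_like_hex_to_hash('07000000000')])
# [6, 6, 6, 6, 6, 6, 3]

# A value that fits in 36 bits splits evenly, so nothing crashes,
# even though the result is a 6x6 grid rather than 14 rows of 3 bits.
print([len(r) for r in split_like_hex_to_hash('00000000001')])
# [6, 6, 6, 6, 6, 6]
```

Under this assumption, whether decoding crashes depends only on the value's bit length: it succeeds whenever the bits split evenly into rows of 6 (for an 11-digit string, when the value fits in 36 bits or uses exactly 42), which would explain why it seemed to work most of the time. Even then the decoded hash has the wrong shape.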

See the "Storing hashes" section of the README: hex_to_hash is not the right function here.
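For a non-square hash, the decoder has to be told the bit count instead of guessing it from the string length. In the imagehash versions I have seen, the README's "Storing hashes" section decodes colorhash strings with `hex_to_flathash(hash_as_str, hashsize=binbits)`; check the README for your installed version. The sketch below shows the same idea in plain Python without imagehash. `bits_to_hex` and `hex_to_bits` are hypothetical helpers, and the 42-bit size (14 bins × the default binbits of 3) is an assumption consistent with the 11-digit string in the report.

```python
def bits_to_hex(bits):
    """Pack a flat list of booleans (most significant first) into hex,
    zero-padded to ceil(len(bits) / 4) digits."""
    value = 0
    for b in bits:
        value = (value << 1) | int(bool(b))
    width = (len(bits) + 3) // 4
    return '{:0{width}x}'.format(value, width=width)


def hex_to_bits(hexstr, n_bits):
    """Unpack hex back into exactly n_bits booleans.

    n_bits must be known up front (e.g. 14 * binbits for a colorhash);
    it cannot be recovered from the string length alone.
    """
    value = int(hexstr, 16)
    if value >> n_bits:
        raise ValueError('hex value does not fit in %d bits' % n_bits)
    return [bool((value >> (n_bits - 1 - i)) & 1) for i in range(n_bits)]


binbits = 3
n_bits = 14 * binbits  # 42 bits for a default colorhash (assumption)

# A 42-bit pattern that prints as the string from the report.
original = [False] * 3 + [True] * 3 + [False] * (n_bits - 6)
hexstr = bits_to_hex(original)
print(hexstr)  # 07000000000

# Decoding with the known bit count gives back exactly the same bits.
assert hex_to_bits(hexstr, n_bits) == original

# Regroup into rows of binbits if the 2-D layout is needed.
rows = [original[i:i + binbits] for i in range(0, n_bits, binbits)]
```

Because the bit count is passed in rather than inferred, every 42-bit pattern round-trips, including those whose leading hex digits are nonzero.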

Thank you!
I was very confused by the fact that it seemed to work 90% of the time.
Oh well. Should have read the docs more closely.