JohannesBuchner/imagehash

Hash size doesn't match hash_size parameter for Daubechies wavelets hashing

jonemo opened this issue · 4 comments

I am surprised that the size of the hash computed is not equal to the hash_size parameter available for all hashing methods. Specifically, imagehash.whash(img, hash_size=16, mode="db4") yields a hash of size 22 x 22.

While the readme does not make any explicit promises about the hash size, the parameter name makes this outcome quite unexpected. Of course, my being surprised is not an issue in itself, and unless this is a bug, it would be unreasonable to break backward compatibility by changing the API or behavior. However, it may be worth clarifying in the documentation/readme that hash_size does not always match the size of the resulting hash.

The readme currently covers hash_size in this paragraph:

Each algorithm can also have its hash size adjusted (or in the case of colorhash, its binbits). Increasing the hash size allows an algorithm to store more detail in its hash, increasing its sensitivity to changes in detail.

Sample code:

    from PIL import Image
    import imagehash

    # `path` is the path to the example image
    img = Image.open(path)
    hash = imagehash.average_hash(img, hash_size=16)
    print(f"average_hash: {len(hash.hash)} x {len(hash.hash[0])}")
    hash = imagehash.dhash(img, hash_size=16)
    print(f"dhash: {len(hash.hash)} x {len(hash.hash[0])}")
    hash = imagehash.phash(img, hash_size=16)
    print(f"phash: {len(hash.hash)} x {len(hash.hash[0])}")
    hash = imagehash.whash(img, hash_size=16, mode="haar")
    print(f"whash haar: {len(hash.hash)} x {len(hash.hash[0])}")
    hash = imagehash.whash(img, hash_size=16, mode="db4")
    print(f"whash db4: {len(hash.hash)} x {len(hash.hash[0])}")

Output:

average_hash: 16 x 16
dhash: 16 x 16
phash: 16 x 16
whash haar: 16 x 16
whash db4: 22 x 22

Example image: tl-20210924-185242

Huh. Do you know why db4 does that?

Sorry, I am the wrong person to ask this question. I used imagehash precisely because I have no clue about any of these algorithms. (And that was a year ago, now I know even less.)

In any case, given how differently the various methods work, no, hash_size does not necessarily need to have a consistent meaning across all of them.
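
One plausible explanation for the 22 x 22 result, offered as a sketch and not as a confirmed reading of the imagehash source: assume whash rescales the image to a power-of-two square, runs a multilevel 2D wavelet decomposition with log2(image_scale) - log2(hash_size) levels, and builds the hash from the final approximation coefficients. With PyWavelets' default (non-periodization) signal extension, one decomposition level turns a signal of length n into floor((n + filter_len - 1) / 2) coefficients. For haar (filter length 2) that is exactly n / 2 on even lengths, but for db4 (filter length 8) every level adds a few border coefficients. The helper names and the image_scale value of 256 below are hypothetical; the actual size of the example image is not stated in this thread.

```python
import math


def dwt_output_len(n: int, filter_len: int) -> int:
    """Coefficient count after one DWT level (non-periodization modes)."""
    return (n + filter_len - 1) // 2


def approx_side(image_scale: int, hash_size: int, filter_len: int) -> int:
    """Side length of the approximation band after the assumed number of levels.

    Assumption: levels = log2(image_scale) - log2(hash_size), with both
    values being powers of two.
    """
    levels = int(math.log2(image_scale)) - int(math.log2(hash_size))
    side = image_scale
    for _ in range(levels):
        side = dwt_output_len(side, filter_len)
    return side


# Decomposition filter lengths: haar = 2, db4 = 8.
for name, filter_len in (("haar", 2), ("db4", 8)):
    side = approx_side(image_scale=256, hash_size=16, filter_len=filter_len)
    print(f"{name}: {side} x {side}")
```

Under these assumptions the sketch prints 16 x 16 for haar and 22 x 22 for db4, matching the output reported above. If this is indeed the mechanism, the extra rows and columns come from the longer filter's boundary handling, not from the hash_size argument being ignored.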