Bug: average_hash returns inconsistent hash lengths

Question

dwachsmuth opened this issue 3 years ago · 6 comments

I have been attempting to use OpenImageR's hash functions in a package designed to facilitate large-scale image comparisons, and everything has been smooth with one major exception.

When I call average_hash with default settings, 95% of the time I get the expected length-64 hash result (in binary mode), while 5% of the time the hash is length-56 (i.e. one row or column of the underlying 8x8 matrix seems to have been dropped), and very occasionally the hash is length-49 (which I imagine means a row AND a column have been dropped).

By comparing these defective hashes with those of other images that are perceptually identical but bitwise different (and that produce the full 64-bit hash), it is clear that the last 8 elements of the hash are disappearing: the first 56 bits of the two hashes are the same, but one has 8 additional bits and the other simply ends.
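The comparison described above can be sketched in base R. This is only an illustration: `is_truncated_prefix` is a hypothetical helper name, and the two vectors are synthetic stand-ins, not hashes produced by OpenImageR.

```r
# Hypothetical helper: TRUE when `short` is strictly shorter than `long`
# and matches its leading elements exactly.
is_truncated_prefix <- function(short, long) {
  length(short) < length(long) &&
    all(short == long[seq_along(short)])
}

# Synthetic stand-ins for real hashes (illustrative data only)
full_hash <- rep(c(1L, 0L), times = 32)  # 64 bits
cut_hash  <- full_hash[1:56]             # same bits with the last 8 missing

is_truncated_prefix(cut_hash, full_hash)  # TRUE
length(full_hash) - length(cut_hash)      # 8
```

A check like this only confirms that the shorter hash is a truncated copy of the longer one; it says nothing about why the trailing bits are missing.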

This does not happen with phash, which has always returned a length-64 hash in several hundred thousand different tests.

I have no idea what is causing this, but I have a reproducible example involving a few images I have uploaded.

library(OpenImageR)

tmp_64 <- tempfile(fileext = ".jpg")
tmp_49 <- tempfile(fileext = ".jpg")

# These are a pair of files which illustrate the issue
download.file("https://upgo.lab.mcgill.ca/resources/hash_test_64.jpg", destfile = tmp_64)
download.file("https://upgo.lab.mcgill.ca/resources/hash_test_49.jpg", destfile = tmp_49)

img_64 <- OpenImageR::readImage(tmp_64)
img_49 <- OpenImageR::readImage(tmp_49)

grey_64 <- rgb_2gray(img_64)
grey_49 <- rgb_2gray(img_49)

# With default arguments, both of these calls should return a length-64 binary hash
hash_64 <- average_hash(grey_64, MODE = "binary")
hash_49 <- average_hash(grey_49, MODE = "binary")

# But this isn't true; the second image is only length-49
length(hash_64) == length(hash_49)

Answer 1 · 2021-09-07T07:14:30.000Z

hi @dwachsmuth and thanks for reporting this issue.

When you specify a hash_size of 8, the gray image is resized internally to (8 x 8). The problem appears to lie in the 'nearest' resize method (the way it works is described in this SO thread).
The function internally calls 'floor()', so it rounds down by default. This seems to cause trouble when the requested 'width' and 'height' are quite small relative to the input image size and the input has an odd number of rows or columns. I could use 'ceil()' instead, but I'd prefer the code to stay close to how the algorithm works.
Therefore I've modified the function so that the image dimensions match, and I've added a warning for the case where the output image dimensions differ from the 'width' and 'height' parameters specified by the user. I'd be glad if you could test it with all your images and report back any issues before I submit the updated version to CRAN (most likely next week).
As an alternative, you could also use the 'bilinear' method, which in your case returns the correct output dimensions (see the following code snippet). To test the following code, you have to install the latest version from GitHub using

remotes::install_github('mlampros/OpenImageR')

require(OpenImageR)

tmp_64 <- tempfile(fileext = ".jpg")
tmp_49 <- tempfile(fileext = ".jpg")

# These are a pair of files which illustrate the issue
download.file("https://upgo.lab.mcgill.ca/resources/hash_test_64.jpg", destfile = tmp_64)
download.file("https://upgo.lab.mcgill.ca/resources/hash_test_49.jpg", destfile = tmp_49)

img_64 <- OpenImageR::readImage(tmp_64)
img_49 <- OpenImageR::readImage(tmp_49)

grey_64 <- rgb_2gray(img_64)
grey_49 <- rgb_2gray(img_49)

# Every call below should return a binary hash with the same dimensions
hash_64 <- average_hash(grey_64, MODE = "binary", hash_size = 8, resize = "nearest")
dim(hash_64)

hash_64_bil <- average_hash(grey_64, MODE = "binary", hash_size = 8, resize = "bilinear")
dim(hash_64_bil)

hash_49 <- average_hash(grey_49, MODE = "binary", hash_size = 8, resize = "nearest")    # returns the correct output but gives a warning
# hash_49 <- suppressWarnings(average_hash(grey_49, MODE = "binary", hash_size = 8, resize = "nearest"))
dim(hash_49)

hash_49_bil <- average_hash(grey_49, MODE = "binary", hash_size = 8, resize = "bilinear")
dim(hash_49_bil)


lst_dims = list(dim(hash_64), dim(hash_64_bil), dim(hash_49), dim(hash_49_bil))
all(unlist(lapply(lst_dims, function(x) lst_dims[[1]] == x)))
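For readers who want to see why a nearest-neighbour resize can be made to return exactly the requested dimensions even with floor(), here is a minimal base-R sketch. It is not OpenImageR's implementation: `nearest_resize` and `simple_average_hash` are hypothetical helpers, the input is random data, and the assumption is simply that one source index is computed per output pixel.

```r
# Hypothetical helper (not OpenImageR code): nearest-neighbour resize that
# picks exactly one source row/column per output row/column.
nearest_resize <- function(img, out_rows, out_cols) {
  stopifnot(is.matrix(img), out_rows >= 1, out_cols >= 1)
  # floor() maps each 0-based output position to a source position;
  # + 1 converts to R's 1-based indexing
  row_idx <- floor((seq_len(out_rows) - 1) * nrow(img) / out_rows) + 1
  col_idx <- floor((seq_len(out_cols) - 1) * ncol(img) / out_cols) + 1
  img[row_idx, col_idx, drop = FALSE]
}

# Hypothetical helper: average hash as "pixel above the mean of the
# resized image", flattened to a 0/1 vector
simple_average_hash <- function(img, hash_size = 8) {
  small <- nearest_resize(img, hash_size, hash_size)
  as.integer(as.vector(small > mean(small)))
}

set.seed(1)
odd_img <- matrix(runif(101 * 57), nrow = 101, ncol = 57)  # odd dimensions

dim(nearest_resize(odd_img, 8, 8))     # 8 8
length(simple_average_hash(odd_img))   # 64
```

Because the index vectors are built with seq_len(), their length depends only on the requested size and never on the input dimensions, so the hash length stays fixed at hash_size^2.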

Answer 2 · 2021-09-09T02:00:27.000Z

Hi @mlampros, many thanks for the speedy reply! Your explanation makes a lot of sense, but I'm unable to install the dev version of the package because I keep running into a compile error that I haven't been able to circumvent. I'm happy to wait until the CRAN binary is ready; I will then re-run my code on my test image set and report the results to you.

Answer 3 · 2021-09-09T04:26:20.000Z

I'll write a few tests for the modified code and submit the new version to CRAN in the next 2 or 3 days. Once CRAN accepts the new version, I'll notify you.

Answer 4 · 2021-09-17T19:01:18.000Z

@dwachsmuth I submitted the updated version a week ago, and as of today testing appears to have completed on all operating systems, so you can download the new version from CRAN.

Answer 5 · 2021-09-29T19:07:18.000Z

This is Robo-lampros because the Human-lampros is lazy. This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs. Feel free to re-open a closed issue and the Human-lampros will respond.

Answer 6 · 2021-10-06T20:09:20.000Z

This issue was automatically closed because of being stale. Feel free to re-open a closed issue and the Human-lampros will respond.