onmyway133/DeepDiff

Using HashValue as table key is unreliable

Closed this issue · 12 comments

I've been testing a variation of your Heckel implementation in a macOS CollectionView project and found an issue. Using hashValue as the key in the table is not reliable. I've made this same mistake myself in early versions of the data structures used in that collection view implementation. My basic solution for now is to replace with [T.Element:TableEntry] and the issue hasn't come up again.

From the Apple Docs

A hash value, provided by a type’s hashValue property, is an integer that is the same for any two instances that compare equally. That is, for two instances a and b of the same type, if a == b, then a.hashValue == b.hashValue. The reverse is not true: Two instances with equal hash values are not necessarily equal to each other.

@WCByrne Hi, thanks for the suggestion. I will check if I can make the changes. Besides, do you have a test that fails because of this?

I do not. I was running into this issue while running a demo app (in the Collection View project) which uses Core Data. When I tried to reproduce the issue in a test using mock objects (not NSManagedObject) but making the same changes to the arrays, it succeeded. I finally noticed that the edits produced from the diff were different given the same source and target arrays. I'm not sure what NSManagedObject uses for it's hashValue but it is apparently less unique that you might expect.

As I mentioned, I have made the same assumption and used hashValue for keys in some of my data structures and often ran into this unexpected behavior.

It's more likely that the usage of NSManagedObjects hashes is unreliable because if I remember correctly they're derived from the objectID. If you obtain permanent IDs prior to diffing then it's likely that the algorithm is sound.

Otherwise a weak to weak map table might work. Haven't really looked into the implementation.

@WCByrne Hi, sorry for late response. I've tried using the whole Hashable object as the key, but I don't see if it makes any difference. Dictionary will use hashValue of Hashable object as the key, so it performs the same as when we use just hashValue. I will change if there's any issues/falling tests

👍 hashValue is unique most of the time and will probably work for most applications but is not "correct" as explained by the description of hash value from the docs quoted above.

Hashable is not sufficient as a type constraint on elements in order to compute a diff between two collections. Two different objects with different properties may have the same hash value, as pointed out above.

In order to compute insertions, deletions, and moves, you will need at least an Equatable type constraint on the element type.

And if you want to detect (in-place) updates, you will also need some concept of identity. Maybe an "Identifiable" protocol, that uses some form of id to answer the question "are these two objects the same individual? (possibly with different properties)".

@ktraunmueller Hi, you're right. Maybe this issue may interest you #16

Thanks for the quick response. So, DeepDiff does not generally give correct results, because the requirements imposed on the element type are too weak, right?

@ktraunmueller It is based on Hashable and Equatable, so I think it works quite good. Do you have a test that fails?

Sorry, I forgot that a Hashable is an Equatable... 🤦‍♂️

@ktraunmueller No worried, I learned that the hard way too, you can read my story here onmyway133/blog#122 😅

ahti commented

I've tried using the whole Hashable object as the key, but I don't see if it makes any difference. Dictionary will use hashValue of Hashable object as the key, so it performs the same as when we use just hashValue. I will change if there's any issues/falling tests

This is incorrect. You can verify easily by throwing this code in a playground:

struct Lol: Hashable {
    let str: String

    var hashValue: Int {
        return 0
    }
}

var d: [Lol: String] = [:]

d[Lol(str: "hello")] = "asdf"
d[Lol(str: "bye")] = "fdsa"

assert(d.count == 2)

Equal hash values do not mean equal objects, and DeepDiff should work with diffs containing things with colliding hash values, which from what I read here (haven't looked at the code itself) is not the case right now.