clojure-rs/ClojureRS

Look into Hash / PartialEq implemented on Value

Tko1 opened this issue · 5 comments

Tko1 commented

If I recall, due to Value containing

enum Value { 
  ...
  IFn(Rc<dyn IFn>)
  ..
}

As well as some similar cases containing trait objects, we can not just use [derive(..PartialEq,Hash)] , but we do need these as our persistent maps will use Values as their keys.

Since a rough, basic persistent map was added at the last minute, so was a last minute implementation of Hash and PartialEq on Value. However, I'm pretty sure only Hash needs to be defined, and PartialEq can be defined in terms of Hash. Whatever the case, this code is an example of something that was something of a last minute hack, and has not been proofread yet, so I'd like to make sure this implemented as it should be.

Some rectifications here: PartialEq and Hash are completely independent. However, Eq, which is required to implement Hash, is directly derived from PartialEq. The difference between Eq and PartialEq is that a type which implements PartialEq but no Eq represents a partial equivalence relation (source), and having Eq implemented tells the compiler that this partial equivalence relation defined with the PartialEq implementation is in fact an equivalence relation (source).

In other words, if T implements PartialEq but not Eq, then there could be a of type T for which a != a. This is for example the case with floating-point values, in which NaN != NaN. On the other hand, having both PartialEq and Eq implemented for T means that for every a of T, we have a == a. This is the case for integers, for example.

Some kind of solution:

I think the best way to implement this is to slightly modify how Ifns are constructed. Currently, the IFn trait is implemented by some structures, such as Fn, AddFn, ConcatFn, DoFn, and so on. A great way to allow hashing would be to divide the problem as much as possible.

As far as I know, special forms are empty structures, so we would just have to add #[derive(Hash)] before declaration.

Regular forms (represented by Fn) are more tricky. A way to hash them would be to assign to each form an unique ID (perhaps wrapped in an usize), and when asked to hash the form, return the hash of the unique ID instead. This comes with a downside: we are limited in the number of Ifn we declare. To be exact, we would be limited to "only" usize::max Fns for each program execution.

Now we need to tell the compiler that every kind of structure which implements Ifn also implements Hash and Eq. We can modify the definition of the IFn trait, so that it is only implementable on types which also implement them. The following definition of Ifn could work:

pub trait IFn: Debug + DynClone + Hash + Eq {
    // ...
}

Note:

Please do not automatically derive Eq and Hash for Fn. This data structure contains a lot of data in it. It would slow dramatically the program to have to check for each field for equality, or to hash them.

Note (2):

It may be noted that I am a Rustacean who like LISPs, but never used them. As such, the implementation described in the paragraph before may be entirely wrong. I am making the assumption that two functions declared at two different places in the code, even if they arguments and bodies are the same, won't result in the same hash. Please correct me if I'm wrong.

Edit: fix typo.

Tko1 commented

I appreciate this, but you don't have to explain what a partial equivalence is, or the relationship between Eq and Hash; I know these things and nothing about my message was meaning to imply otherwise. Me mentioning PartialEq and Hash together is not implying that PartialEq is the trait bound on Hash, I realize that's Eq and that PartialEq is only a transitive dependence.
However, there is reason for me mentioning defining PartialEq in terms of Hash; as mentioned, I cannot derive Hash and PartialEq and Eq

[derive(...PartialEq,Eq,Hash)

, so I need to implement them manually as

impl PartialEq for Value { 
   ....
} 
impl Eq for Value {} 
impl Hash for Value {
   ...
}

The thing is, as we know, if a == b, then hash(a) == hash(b). So what I was wondering is whether or not I should be really just use the hashes of Values to determine equality , thereby really only defining equality once (ie,

impl Hash for Value { 
    ... Still defining this
}
// I'm just pulling this example function straight from the std::Hash docs 
fn calculate_hash<T: Hash>(t: &T) -> u64 {
    let mut s = DefaultHasher::new();
    t.hash(&mut s);
    s.finish()
impl PartialEq for Value { 
        fn eq(&self, other: &Value) -> bool {
             // Here we just reuse our hashes
             calculate_hash(self) == calculate_hash(other) 
        }
}
impl Eq for Value {} 

Another reason I was specifically thinking this is I believe Clojure proper works this way as well, although I have to check again; that it uses an efficient hashing algorithm on objects, and uses that directly for checking equality.

And indeed, two functions with the same definition are not equal / do not hash the same.
Finally,

Now we need to tell the compiler that every kind of structure which implements Ifn also implements Hash and Eq. We can modify the definition of the IFn trait, so that it is only implementable on types which also implement them. The following definition of Ifn could work:

pub trait IFn: Debug + DynClone + Hash + Eq {
    // ...
}

Again, unless I'm missing something, give me more credit ahaha (EDIT: not literally, I'm glad to hear suggestions. But I likewise wanted to do this rather than resorting to what's there now). You can't do this outright because IFn is used as a trait object, and both Eq and Hash will break object safety. Hash has a generic type parameter, and Eq uses Self as a type parameter.

TLDR: my previous comment was very wrong.

I failed to understand what you meant by implementing equality thanks to hash. Your idea is valid.

You're also right about trait object safety. Trait objects are things I try to avoid in my own code. As such, their rules are not always clear to me.

It may also be useful to quote my apologies from the Discord server:

I see that most of my comments where irrelevant and/or impossible to do.
Additionally, the tone I used to explain my point of view was not good. I apologise for that
Thank you for staying constructive despite my not very useful feedback.

I am quite sure that the Hashing algorithm used in the Rust standard library is not designed to be fast, but rather to generate at least hash collisions as possible. So this may decrease performances, which is fine since we are in early phases of development.

I think that it's the only thing I can add to the conversation right now. I definitely lack some Clojure knowledge.

First of all I just want to say that maybe I'm missing something as I have never written Hash and PartialEq manually.

Nevertheless if I have understood correctly, could implementing PartialEq in term of Hash not introduce problems in case of hash collision ? In this case it would be true that hash(a) == hash(b) but you would not want that a == b, or would you ?

Perhaps Hash could be used in PartialEq to speed up the cases where hash(a) != hash(b) but if hash(a) == hash(b) then it would have to check whether a and b are really equals. I cannot tell if it would be worth it or not.

Tko1 commented

could implementing PartialEq in term of Hash not introduce problems in case of hash collision

I've thought about that but I don't know yet. I suspect so, and its a case where I wanted to also look at the clojure base and see if that sort of thing's a problem, and how its handled, or see if I was mistaken about just how hashes are used there. I did forget about this when I wrote this issue, though, as I forgot I was considering dropping this idea altogether because of this. Regardless, I want to look into it