Look into Hash / PartialEq implemented on Value
Tko1 opened this issue · 5 comments
If I recall, due to Value
containing
enum Value {
...
IFn(Rc<dyn IFn>)
..
}
As well as some similar cases containing trait objects, we can not just use [derive(..PartialEq,Hash)]
, but we do need these as our persistent maps will use Values as their keys.
Since a rough, basic persistent map was added at the last minute, so was a last minute implementation of Hash
and PartialEq
on Value. However, I'm pretty sure only Hash needs to be defined, and PartialEq can be defined in terms of Hash. Whatever the case, this code is an example of something that was something of a last minute hack, and has not been proofread yet, so I'd like to make sure this implemented as it should be.
Some rectifications here: PartialEq
and Hash
are completely independent. However, Eq
, which is required to implement Hash
, is directly derived from PartialEq
. The difference between Eq
and PartialEq
is that a type which implements PartialEq
but no Eq
represents a partial equivalence relation (source), and having Eq
implemented tells the compiler that this partial equivalence relation defined with the PartialEq
implementation is in fact an equivalence relation (source).
In other words, if T
implements PartialEq
but not Eq
, then there could be a
of type T
for which a != a
. This is for example the case with floating-point values, in which NaN != NaN
. On the other hand, having both PartialEq
and Eq
implemented for T
means that for every a
of T
, we have a == a
. This is the case for integers, for example.
Some kind of solution:
I think the best way to implement this is to slightly modify how Ifn
s are constructed. Currently, the IFn
trait is implemented by some structures, such as Fn
, AddFn
, ConcatFn
, DoFn
, and so on. A great way to allow hashing would be to divide the problem as much as possible.
As far as I know, special forms are empty structures, so we would just have to add #[derive(Hash)]
before declaration.
Regular forms (represented by Fn
) are more tricky. A way to hash them would be to assign to each form an unique ID (perhaps wrapped in an usize
), and when asked to hash the form, return the hash of the unique ID instead. This comes with a downside: we are limited in the number of Ifn
we declare. To be exact, we would be limited to "only" usize::max
Fn
s for each program execution.
Now we need to tell the compiler that every kind of structure which implements Ifn
also implements Hash
and Eq
. We can modify the definition of the IFn
trait, so that it is only implementable on types which also implement them. The following definition of Ifn
could work:
pub trait IFn: Debug + DynClone + Hash + Eq {
// ...
}
Note:
Please do not automatically derive Eq
and Hash
for Fn
. This data structure contains a lot of data in it. It would slow dramatically the program to have to check for each field for equality, or to hash them.
Note (2):
It may be noted that I am a Rustacean who like LISPs, but never used them. As such, the implementation described in the paragraph before may be entirely wrong. I am making the assumption that two functions declared at two different places in the code, even if they arguments and bodies are the same, won't result in the same hash. Please correct me if I'm wrong.
Edit: fix typo.
I appreciate this, but you don't have to explain what a partial equivalence is, or the relationship between Eq
and Hash
; I know these things and nothing about my message was meaning to imply otherwise. Me mentioning PartialEq
and Hash
together is not implying that PartialEq
is the trait bound on Hash
, I realize that's Eq
and that PartialEq
is only a transitive dependence.
However, there is reason for me mentioning defining PartialEq
in terms of Hash
; as mentioned, I cannot derive Hash
and PartialEq
and Eq
[derive(...PartialEq,Eq,Hash)
, so I need to implement them manually as
impl PartialEq for Value {
....
}
impl Eq for Value {}
impl Hash for Value {
...
}
The thing is, as we know, if a == b
, then hash(a) == hash(b)
. So what I was wondering is whether or not I should be really just use the hashes of Values to determine equality , thereby really only defining equality once (ie,
impl Hash for Value {
... Still defining this
}
// I'm just pulling this example function straight from the std::Hash docs
fn calculate_hash<T: Hash>(t: &T) -> u64 {
let mut s = DefaultHasher::new();
t.hash(&mut s);
s.finish()
impl PartialEq for Value {
fn eq(&self, other: &Value) -> bool {
// Here we just reuse our hashes
calculate_hash(self) == calculate_hash(other)
}
}
impl Eq for Value {}
Another reason I was specifically thinking this is I believe Clojure proper works this way as well, although I have to check again; that it uses an efficient hashing algorithm on objects, and uses that directly for checking equality.
And indeed, two functions with the same definition are not equal / do not hash the same.
Finally,
Now we need to tell the compiler that every kind of structure which implements Ifn also implements Hash and Eq. We can modify the definition of the IFn trait, so that it is only implementable on types which also implement them. The following definition of Ifn could work:
pub trait IFn: Debug + DynClone + Hash + Eq {
// ...
}
Again, unless I'm missing something, give me more credit ahaha (EDIT: not literally, I'm glad to hear suggestions. But I likewise wanted to do this rather than resorting to what's there now). You can't do this outright because IFn
is used as a trait object, and both Eq
and Hash
will break object safety. Hash has a generic type parameter, and Eq uses Self as a type parameter.
TLDR: my previous comment was very wrong.
I failed to understand what you meant by implementing equality thanks to hash. Your idea is valid.
You're also right about trait object safety. Trait objects are things I try to avoid in my own code. As such, their rules are not always clear to me.
It may also be useful to quote my apologies from the Discord server:
I see that most of my comments where irrelevant and/or impossible to do.
Additionally, the tone I used to explain my point of view was not good. I apologise for that
Thank you for staying constructive despite my not very useful feedback.
I am quite sure that the Hashing algorithm used in the Rust standard library is not designed to be fast, but rather to generate at least hash collisions as possible. So this may decrease performances, which is fine since we are in early phases of development.
I think that it's the only thing I can add to the conversation right now. I definitely lack some Clojure knowledge.
First of all I just want to say that maybe I'm missing something as I have never written Hash
and PartialEq
manually.
Nevertheless if I have understood correctly, could implementing PartialEq
in term of Hash
not introduce problems in case of hash collision ? In this case it would be true that hash(a) == hash(b) but you would not want that a == b, or would you ?
Perhaps Hash
could be used in PartialEq
to speed up the cases where hash(a) != hash(b) but if hash(a) == hash(b) then it would have to check whether a and b are really equals. I cannot tell if it would be worth it or not.
could implementing PartialEq in term of Hash not introduce problems in case of hash collision
I've thought about that but I don't know yet. I suspect so, and its a case where I wanted to also look at the clojure base and see if that sort of thing's a problem, and how its handled, or see if I was mistaken about just how hashes are used there. I did forget about this when I wrote this issue, though, as I forgot I was considering dropping this idea altogether because of this. Regardless, I want to look into it