dart-lang/sdk

Dart should provide a way of combining hashes

blois opened this issue ยท 36 comments

Basically if one overloads operator == they should also overload hashCode and generate a 'good' hash, but Dart doesn't offer any utilities to do so. This means that many users will generate hashes with poor distributions.

The JS string class internally uses http://en.wikipedia.org/wiki/Jenkins_hash_function, it would be great if there were a public API for this.

lrhn commented

Removed Type-Defect label.
Added Type-Enhancement label.

lrhn commented

How about something like:

class JenkinsHasher {
  int _hash = 0;

  void add(int value) {
    _hash += value;
    _hash += _hash << 10;
    _hash ^= _hash >> 6;
  }
  int get hash {
    int hash = _hash;
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash;
  }
}

Then you can use it as:
  var hash = (new JenkinsHash()..add(foo.hashCode)..add(bar.hashCode)).hash;

?

That looks decent.

We currently have something close to this at:
https://code.google.com/p/dart/source/browse/branches/bleeding_edge/dart/sdk/lib/html/html_common/jenkins_smi_hash.dart

My understanding is that the masking in the above is needed for performance.


cc @rakudrama.

for the curious, the jenkins_smi_hash moved and is now at sdk/lib/math. Still private though.

This comment was originally written by @chalin


Any progress on this? I agree with blois's original concern, and the comment in the source file

class _JenkinsSmiHash {
// TODO(11617): This class should be optimized and standardized elsewhere.

This comment was originally written by davidm...@google.com


FYI it's now part of Quiver:

https://github.com/google/quiver-dart/blob/master/lib/src/core/hash.dart

This comment was originally written by @chalin


Nice. Then maybe this issue should be closed?

We have to have this code in dart: libraries for other uses, requiring that a package version be used just adds additional app bloat. Would prefer to keep this issue open, though the presence of Quiver probably lessens the criticality of it.

+1, we can't (currently) use quiver even in many of our packages (the ones in SVN)

@floitschG Thanks--I failed to notice the duplicate issue when I filed #25553.

Given that this issue is "priority-unassigned" and has been open for 2.5 years, what is the likelihood of it getting addressed soon? As I noted in #25553, an upcoming IntelliJ feature would benefit from it, so it would be nice if we could get to it soon.

It seems like it would be a pretty simple matter of renaming a class and adding some documentation. Would it be helpful if I submitted a CL for review?

It's not the actual work. Adding new classes to core-libraries have a very high bar to pass.
In general, the tendency is to remove things, not to add features. Especially, when the functionality fits nicely into a package (as is the case here).

We will discuss it, though.

Ok, thanks. In case the context is useful to the discussion, the IntelliJ feature @jwren is working on is automatic generation of a hashcode getter for a class (if the user requests it). It would be nice to be able to use JenkinsSmiHash to create the hashcode (since this would give a far higher quality hashcode than a naive approach like XORing together the hashes of the components). But if JenkinsSmiHash is in a package, the feature would be a lot more complex, since IntelliJ would have to check for the presence of the package in pubspec.yaml and add it if it's not already present. Also the feature wouldn't work properly in systems that don't make use of pubspec.yaml.

And if the package containing the function needed to be added, it would have to ensure that adding it wouldn't introduce any version constraint conflicts. And then it would need to run pub in order to add the package to the .packages file. All of which would make this a very expensive operation.

If it's in a package I expect that the feature just won't use it and we'll have lower quality code being generated.

a14n commented

BTW quiver has this kind of functions for a while (see https://github.com/google/quiver-dart/blob/master/lib/src/core/hash.dart - Commits on Nov 14, 2013). Is it worth adding quiver as dependency and using it ?

+1 to what @bwilkerson and @stereotype441 said: adding a dependency is a heavier weight operation than I'd like for the Dart IntelliJ Live Template support. Adding an import isn't sufficient as pub run has the be executed.

lrhn commented

If the code already exists in the platform libraries, just being private, then exposing it might be worth considering.
We don't necessarily want to add things, but exposing something that's already there is less of a concern to me - even if it means adding more public names.

What I can't decide is whether it belongs more in dart:core or dart:collection. :)

That said, I'm definitely more positive towards adding things than average :)

@floitschG Have you had a chance to discuss this?

Not yet. Still on my TODO list, though.

For the time being, IntelliJ now generates with XORs:

class Example {
  var a, b, c, d, e;

  @override
  bool operator ==(Object other) {
    if (identical(this, other)) {
      return true;
    }
    return other is Example &&
        this.a == other.a &&
        this.b == other.b &&
        this.c == other.c &&
        this.d == other.d &&
        this.e == other.e;
  }

  @override
  int get hashCode {
    return a.hashCode ^ b.hashCode ^ c.hashCode ^ d.hashCode ^ e.hashCode;
  }


}
bp74 commented

This may not add much to the discussion, but i have duplicates of the JenkinsHash class in my libraries too (StageXL, ...). I think one of the benefits of Dart over JavaScript is that i can built on top of strong core libraries, adding a feature to calculate good hash codes should be definitely in the core libraries!

lrhn commented

I'd put a hash function in package:collection.

We'd likely still have the functionality in dart:collection and dart:core, but not necessarily with as nice an interface.

Any update on this? Do we think this is something we can expose from either the SDK or package:collection?

lrhn commented

A canonical way to combine hashes is planned for Dart 2.0.

@lrhn - are you open to contributions for this?

If I create a patch with the interface described in #11617 (comment) do we see a path to getting it landed?

Personally I think something like this would be more usable for folks:

class Object {
  static int hash(List<Object> any) {
    // ...
  }
}

... but even just JenkinsHasher is better than copy-pasting code everywhere.

We could also have jenkinsHasher.addAll(Iterable<Object>).

Similarly I'd be in favor of

class Object {
  static int hash(Object a, Object b, [Object c, ...]);
  static int hashAll(List<Object> objects);
}
lrhn commented

I'm already working on it (so let me upload the CL instead of letting it sit in my repository while I figure out why some tools won't allow it), but I agree with the design here.

I'd love to add a proper generalized Hasher interface at some point, but for now we'll just use the same hasher everywhere. It's not like you actually need more than one, if it's good enough.

@lrhn any updates on this?

Looking at the difficulty in writing a hashCode for a 'value'-Map implementation where == means the same keys and values independent of insertion order, I would suggest that corelib also supply a way of generating order-independent hashes (suitable for Sets and Maps)

class Object {
  static int hash(Object a, Object b, [Object c, ...]);
  static int hashAll(List<Object> objects);
  static int hashAllUnordered(Iterable<Object> objects);
}

I would suggest that corelib also supply a way of generating order-independent hashes (suitable for Sets and Maps)

IMO that would be a better fit in package:collection.

@natebosch While hashing of collections is a natural fit for package:collection, I think unordered hashing has other applications (e.g. geometric invariance), and there is a discoverability tax to having the hashing methods in several places.

I did an investigation of what would be required for a general Object.hash to be as efficient as the hand-rolled hashCode methods and the methods did need to be known to the compiler (i.e. treated specially) to get there. The main problem with all of these utility methods is that they call o.hashCode with static type of o being Object, which is the most complex dispatch. There are patterns of specialization that would hit a sweet spot in between slow dispatch and excessive inlining, but the exact pattern of specialization depends on the compiler (dart2js would be very different to AOT).

+1 Stephen's point. In addition, package:collection can't be used from our SDK impls, which means we may have duplicate code in the SDK. Lastly, there are proposed Dart language features that may require the compilers to automatically implement hashing/equality for value types (e.g. dart-lang/language#125).

I am new to Dart and, based on this thread, wondering if giving it a chance is a mistake.

Is something this simple (and necessary) truly still an open issue after almost seven years of discussion?

Please tell me I'm wrong. I am hoping an enhancement has already shipped and that someone simply forgot to close this issue. If so, please post the new technique / method.

If the issue truly is still open, please consider promoting its priority. Thank you.

Given the ease of implementation and testing, this sure seems like low-hanging fruit.

https://dart-review.googlesource.com/c/sdk/+/73360 Is the CL, unfortunately it's gotten held up by some other langauge design considerations.

@dnfield thank you for the pointer. Its a shame something this easy is held up. Though, in this particular case, it benefited me. Working around its absence forced me to recognize a better design in an unrelated part of my project :)

To recap, looks like now there's a Object.hash(...) function, and a few similar. Yay!