bluss/maplit

Explore possibilities for `hashmap!` and `hashset!` to be generic over hashers

hcpl opened this issue · 6 comments

hcpl commented

Motivation

Hash{Map,Set}::with_capacity limit us to the default hashing algorithm (SipHash).

Solutions

I have 2 ideas on how to implement this suggestion.

Extend capabilities of hashmap! and hashset!

I tried to trivially modify hashmap! like this:

 macro_rules! hashmap {
     (@single $($x:tt)*) => (());
     (@count $($rest:expr),*) => (<[()]>::len(&[$(hashmap!(@single $rest)),*]));

     ($($key:expr => $value:expr,)+) => { hashmap!($($key => $value),+) };
     ($($key:expr => $value:expr),*) => {
         {
             let _cap = hashmap!(@count $($key),*);
-            let mut _map = ::std::collections::HashMap::with_capacity(_cap);
+            let mut _map = ::std::collections::HashMap::with_capacity_and_hasher(_cap, Default::default());
             $(
                 let _ = _map.insert($key, $value);
             )*
             _map
         }
     };
 }

For hashset! changes are analogous.

Unfortunately, this change messes with type inference:

error[E0282]: unable to infer enough type information about `S`
 --> src/main.rs:5:15
  |
5 |       let map = hashmap! {
  |  _______________^ starting here...
6 | |         "a" => 1,
7 | |         "b" => 3,
8 | |         "c" => 2,
9 | |     };
  | |_____^ ...ending here: cannot infer type for `S`
  |
  = note: type annotations or generic parameter binding required
  = note: this error originates in a macro outside of the current crate

Aside from the fact that this is a breaking change (I bet most of the code that uses these macros relies on type inference to remove boilerplate), this change also requires users to always write down the exact type which is unergonomic.

If anyone has a better idea how to approach this problem without creating new macros, please comment!

Add new macros

  • hashmap_with_hasher! --- create a HashMap with a provided BuildHasher
  • hashmap_with_default_hasher! --- create a HashMap with a default BuildHasher
  • hashset_with_hasher! --- create a HashSet with a provided BuildHasher
  • hashset_with_default_hasher! --- create a HashSet with a default BuildHasher

hash{map,set}_with_default_hasher! work as hash{map,set}! but use S::default() where S: BuildHasher + Default instead of RandomState::default().

Syntax of hash{map,set}_with_hasher! are subject for bikeshedding:

// 1
hashmap_with_hasher!(FnvHasher::with_key(0), {
    "a" => 1,
    "b" => 2,
});

hashset_with_hasher!(FnvHasher::with_key(0), {"a", "b"});

// 2
hashmap_with_hasher! {
    hasher = FnvHasher::with_key(0),
    data = {
        "a" => 1,
        "b" => 2,
    },
};

hashset_with_hasher! {
    hasher = FnvHasher::with_key(0),
    data = {"a", "b"},
};

// Additional syntaxes are welcome!
hcpl commented

There is also a third solution which is an alternative interpretation and combination of the first two solutions. It is inspired by #16.

If #16 is ever accepted, the number of possible combinations between optional features (custom hasher, strict mode etc.) will make maintaining macros very hard. As such hashmap! and hashset! can be extended in a backwards-compatible way like so:

hashmap! {
    hasher = FnvHasher::with_key(0),  // optional
    strict = true,                    // optional
    data = {                          // required
        "a" => 1,
        "b" => 2,
    },
}
bluss commented

@hcpl I agree with that, something composable is the only way. Rust has come a long way since this, maybe it's time to look at proc macros? Can we make a proc macro crate for this that doesn't have a significant compile time (I'm thinking below 5 seconds is ideal for such a small purpose crate).

hcpl commented

For function-like proc macros that can be used in expressions there are currently 2 ways to implement that I know of. And yeah, they are both hacks, because "native" function-like proc macros stabilized in Rust 1.30, but can only be used in item positions.

  • "Hacked" token-based macros using proc_macro_hack 0.5 crate (requires Rust >= 1.30):
    • Pro: clear error messages --- the macro has span support to point at the exact place where error occurred.
    • Con (mostly): limits possible userbase of maplit to those with Rust 1.30 or later --- a major blow for wide adoption of the crate, especially for those who use Rust from system package managers on a stable distribution, like Debian jessie and even buster at the time of writing (which is testing!).
    • Con: requires uploading 2 crates to crates.io --- a maintenance burden (major con) and an abstraction leak (minor con).
  • "Hacked" string-based macros using proc_macro_hack 0.4 crate (requires Rust >= 1.15):
    • Pro (mostly): only requires Rust 1.15 to work --- easy to satisfy thanks to the prolific use of serde_derive and other custom #[derive] macros and many try to keep up with the latest stable or nightly Rust anyway.
    • Con: error messages span over the whole macro invocation --- tracking down what caused the error is difficult.
    • Con: requires uploading 2 crates to crates.io --- same as above.

Unfortunately, even string-based proc macros will bump minimum supported Rust version (MSRV) of maplit when it already entered the v1.0 stage. Under the safest and most conservative way to manage breaking changes, this will also need bumping the crate version to 2.0 which in turn requires manual updates across the whole ecosystem. Doing so will call for a lot of effort --- there are quite many reverse dependencies out there.

More lenient strategies include incrementing minor version for every MSRV bump and advising to use maplit = "~1.x" in Cargo.toml to avoid sudden breakages. 1.14 -> 1.15 should have minimal impact on the ecosystem for reasons I described above which makes it a viable solution.

bluss commented

I'm not that concerned about MSRV bumps or even releasing a 2.0 if the features are compelling enough.

I wonder if this can't just be solved by adding optional parameters to existing macros, even if I don't see an elegant way of doing so. If it were elegant I'd like to support hasher and capacity options in arbitrary order (both optional), maybe like this or any other syntax that's found to be better?

hashmap! { 1 => 2 }; // still valid
hashmap! { options: hasher=FnvHasher::with_key(0), capacity=12, 1 => 2 };
hashmap! { options: generic_hasher, capacity=1024, 1 => 2 };

I created a macro like the one you described, @bluss. See gist. It accepts options at the beginning in any order. I can open a PR if you're interested.

hashmap! {
    capacity: 100,
    S: RandomState,
    // or
    hasher: RandomState::default(),
    1 => 2,
}

S refers to the HashMap type parameter that you might use in HashMap::default. Better name for this, anyone?

@bluss did you find time to take a look? What do you think? I'm currently forking maplit because of this issue, it would be great if I didn't have to. Thank you for your time and work on this crate!