Explore possibilities for `hashmap!` and `hashset!` to be generic over hashers
hcpl opened this issue · 6 comments
Motivation
Hash{Map,Set}::with_capacity
limit us to the default hashing algorithm (SipHash).
Solutions
I have 2 ideas on how to implement this suggestion.
Extend capabilities of hashmap!
and hashset!
I tried to trivially modify hashmap!
like this:
macro_rules! hashmap {
(@single $($x:tt)*) => (());
(@count $($rest:expr),*) => (<[()]>::len(&[$(hashmap!(@single $rest)),*]));
($($key:expr => $value:expr,)+) => { hashmap!($($key => $value),+) };
($($key:expr => $value:expr),*) => {
{
let _cap = hashmap!(@count $($key),*);
- let mut _map = ::std::collections::HashMap::with_capacity(_cap);
+ let mut _map = ::std::collections::HashMap::with_capacity_and_hasher(_cap, Default::default());
$(
let _ = _map.insert($key, $value);
)*
_map
}
};
}
For hashset!
changes are analogous.
Unfortunately, this change messes with type inference:
error[E0282]: unable to infer enough type information about `S`
--> src/main.rs:5:15
|
5 | let map = hashmap! {
| _______________^ starting here...
6 | | "a" => 1,
7 | | "b" => 3,
8 | | "c" => 2,
9 | | };
| |_____^ ...ending here: cannot infer type for `S`
|
= note: type annotations or generic parameter binding required
= note: this error originates in a macro outside of the current crate
Aside from the fact that this is a breaking change (I bet most of the code that uses these macros relies on type inference to remove boilerplate), this change also requires users to always write down the exact type which is unergonomic.
If anyone has a better idea how to approach this problem without creating new macros, please comment!
Add new macros
hashmap_with_hasher!
--- create aHashMap
with a providedBuildHasher
hashmap_with_default_hasher!
--- create aHashMap
with a defaultBuildHasher
hashset_with_hasher!
--- create aHashSet
with a providedBuildHasher
hashset_with_default_hasher!
--- create aHashSet
with a defaultBuildHasher
hash{map,set}_with_default_hasher!
work as hash{map,set}!
but use S::default()
where S: BuildHasher + Default
instead of RandomState::default()
.
Syntax of hash{map,set}_with_hasher!
are subject for bikeshedding:
// 1
hashmap_with_hasher!(FnvHasher::with_key(0), {
"a" => 1,
"b" => 2,
});
hashset_with_hasher!(FnvHasher::with_key(0), {"a", "b"});
// 2
hashmap_with_hasher! {
hasher = FnvHasher::with_key(0),
data = {
"a" => 1,
"b" => 2,
},
};
hashset_with_hasher! {
hasher = FnvHasher::with_key(0),
data = {"a", "b"},
};
// Additional syntaxes are welcome!
There is also a third solution which is an alternative interpretation and combination of the first two solutions. It is inspired by #16.
If #16 is ever accepted, the number of possible combinations between optional features (custom hasher, strict mode etc.) will make maintaining macros very hard. As such hashmap!
and hashset!
can be extended in a backwards-compatible way like so:
hashmap! {
hasher = FnvHasher::with_key(0), // optional
strict = true, // optional
data = { // required
"a" => 1,
"b" => 2,
},
}
@hcpl I agree with that, something composable is the only way. Rust has come a long way since this, maybe it's time to look at proc macros? Can we make a proc macro crate for this that doesn't have a significant compile time (I'm thinking below 5 seconds is ideal for such a small purpose crate).
For function-like proc macros that can be used in expressions there are currently 2 ways to implement that I know of. And yeah, they are both hacks, because "native" function-like proc macros stabilized in Rust 1.30, but can only be used in item positions.
- "Hacked" token-based macros using
proc_macro_hack 0.5
crate (requires Rust >= 1.30):- Pro: clear error messages --- the macro has span support to point at the exact place where error occurred.
- Con (mostly): limits possible userbase of
maplit
to those with Rust 1.30 or later --- a major blow for wide adoption of the crate, especially for those who use Rust from system package managers on a stable distribution, like Debian jessie and even buster at the time of writing (which is testing!). - Con: requires uploading 2 crates to crates.io --- a maintenance burden (major con) and an abstraction leak (minor con).
- "Hacked" string-based macros using
proc_macro_hack 0.4
crate (requires Rust >= 1.15):- Pro (mostly): only requires Rust 1.15 to work --- easy to satisfy thanks to the prolific use of
serde_derive
and other custom#[derive]
macros and many try to keep up with the latest stable or nightly Rust anyway. - Con: error messages span over the whole macro invocation --- tracking down what caused the error is difficult.
- Con: requires uploading 2 crates to crates.io --- same as above.
- Pro (mostly): only requires Rust 1.15 to work --- easy to satisfy thanks to the prolific use of
Unfortunately, even string-based proc macros will bump minimum supported Rust version (MSRV) of maplit
when it already entered the v1.0 stage. Under the safest and most conservative way to manage breaking changes, this will also need bumping the crate version to 2.0 which in turn requires manual updates across the whole ecosystem. Doing so will call for a lot of effort --- there are quite many reverse dependencies out there.
More lenient strategies include incrementing minor version for every MSRV bump and advising to use maplit = "~1.x"
in Cargo.toml to avoid sudden breakages. 1.14 -> 1.15 should have minimal impact on the ecosystem for reasons I described above which makes it a viable solution.
I'm not that concerned about MSRV bumps or even releasing a 2.0 if the features are compelling enough.
I wonder if this can't just be solved by adding optional parameters to existing macros, even if I don't see an elegant way of doing so. If it were elegant I'd like to support hasher and capacity options in arbitrary order (both optional), maybe like this or any other syntax that's found to be better?
hashmap! { 1 => 2 }; // still valid
hashmap! { options: hasher=FnvHasher::with_key(0), capacity=12, 1 => 2 };
hashmap! { options: generic_hasher, capacity=1024, 1 => 2 };
I created a macro like the one you described, @bluss. See gist. It accepts options at the beginning in any order. I can open a PR if you're interested.
hashmap! {
capacity: 100,
S: RandomState,
// or
hasher: RandomState::default(),
1 => 2,
}
S
refers to the HashMap
type parameter that you might use in HashMap::default
. Better name for this, anyone?