Tests and benchmarks a handful of ways to split a string in Rust.
I wanted a Rust function that trims a string up to and including the first instance of a specified character.
So fn remove_through_first_char("word1 word2 word3", ' ')
would return "word2 word3"
. If the specified character isn't in the given string, return the string as inputted.
My first working version of the function was this.
fn remove_through_first_char(l: &str, ch: char) -> String {
if l.contains(ch) {
let mut word_vec = l.split(ch).collect::<Vec<&str>>();
word_vec.remove(0);
return word_vec.join(&ch.to_string());
} else {
l.to_string()
}
}
Basically we check if the character ch
is present with a contains
. If it is, we split on the given character ch
into a vector (word_vec
), remove
the first element, then finally join
the vector into a fresh new String.
While this version works in that it passes the tests in src/lib.rs
, it is inefficient, mostly because it performs unnecessary allocations and copies strings. We can do better!
Sergey Bugaev contributed not 1 but ultimately 4 other implementations of the function that all perform this same task more efficiently. They're all located in src/lib.rs
.
Using the Rust benchmarking library Criterion, we benchmarked each of the functions with varying input data sizes. You can see this code in benches/splitters.rs
.
You should be able to run the benchmarks yourself by running cargo bench
, but spoilers below.
(After running cargo bench
, graphs like this one are printed to /target/criterion/Splitters/report/index.html
.)
The clear winner, "V4", is this version, which uses the memchr library:
pub fn remove_through_first_char_variant_4(s: &str, ch: char) -> &str {
match memchr(ch as u8, s.as_bytes()) {
None => s, // not found => return the whole string
Some(pos) => &s[pos + 1..],
}
}
I'm now using this V4 implementation in a tool called Tidy.
Do you have another way to perform the task described above that you think might be more efficient? Improvements to the existing functions? Feel free to create a pull request!