Much slower than sha1sum
glandium opened this issue · 7 comments
The speed difference between the sha1 crate and the system sha1sum executable is striking. The latter runs at more than 500MB/s on my machine, while the sha1 crate manages less than half that.
With all the claims that Rust can compile fast code, I think it's worth investigating why it doesn't do better.
Here is a small testcase I came up with:
extern crate sha1;
extern crate ring;

use std::io::{Read, Write};
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

// Run `f` once and print its wall-clock time, formatted by `fmt`.
fn time<F, FMT>(desc: &str, f: F, fmt: FMT)
where
    F: Fn(),
    FMT: Fn(Duration) -> String,
{
    let start = Instant::now();
    f();
    let duration = start.elapsed();
    println!("{}: {}", desc, fmt(duration));
}

fn main() {
    // Read all of stdin up front so file I/O is not part of the timings.
    let mut out = Vec::<u8>::new();
    std::io::stdin().read_to_end(&mut out).unwrap();

    // Convert a duration into MB/s (1 MB = 1,000,000 bytes) for the input size.
    let throughput = |duration: Duration| {
        let s = duration.as_secs() as f64;
        let ns = duration.subsec_nanos() as f64 / 1000000000.0;
        format!("{:.2} MB/s", out.len() as f64 / (s + ns) / 1000000.0)
    };

    time("sha1sum program", || {
        let mut child = Command::new("sha1sum")
            .stdin(Stdio::piped())
            .spawn().unwrap();
        if let Some(ref mut stdin) = child.stdin {
            // write_all, because a bare write() may send only part of the buffer.
            stdin.write_all(&out).unwrap();
        }
        // wait() closes the child's stdin first, so sha1sum sees EOF.
        child.wait().unwrap();
    }, &throughput);

    time("sha1 crate", || {
        let mut sha1 = sha1::Sha1::new();
        sha1.update(&out);
        println!("{}", sha1.digest());
    }, &throughput);

    time("ring crate", || {
        let digest = ring::digest::digest(&ring::digest::SHA1, &out);
        println!("{:?}", digest);
    }, &throughput);
}
Create a binary crate with sha1 and ring dependencies (I added ring for a comparison), and run:
$ cargo run --release < some_big_file
f2138526fd32840e8e097f4e56ef923e4414c504 -
sha1sum program: 515.96 MB/s
f2138526fd32840e8e097f4e56ef923e4414c504
sha1 crate: 228.63 MB/s
SHA-1:f2138526fd32840e8e097f4e56ef923e4414c504
ring crate: 276.03 MB/s
(Note that ring is better, but not that much better)
And I'm not even sure sha1sum is close to the most performant SHA1 implementations out there.
(Edit: fixed units)
(FWIW, it falls to 6.28MB/s without --release)
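For anyone who wants to reproduce the numbers, here is a rough sketch of a slightly less noisy way to measure: take the best of several runs and compute MB/s with `Duration::as_secs_f64`. The helper names `throughput_mb_s` and `best_of` are made up for this example and are not part of the testcase above or of any crate.

```rust
use std::time::{Duration, Instant};

/// Throughput in MB/s (1 MB = 1,000,000 bytes) for `bytes` processed in `elapsed`.
fn throughput_mb_s(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / elapsed.as_secs_f64() / 1_000_000.0
}

/// Hypothetical helper: run `f` `runs` times and return the fastest run.
/// Returns a zero duration if `runs` is 0.
fn best_of<F: FnMut()>(runs: u32, mut f: F) -> Duration {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .min()
        .unwrap_or_default()
}
```

Usage would be along the lines of `throughput_mb_s(out.len(), best_of(5, || { /* hash out */ }))`. Taking the minimum is only one possible choice; it hides warm-up effects but also hides variance.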
At the very least the loop should be unrolled. I don't think you can force that via attributes, so it might have to be done by hand, something like this: https://www.nayuki.io/res/fast-sha1-hash-implementation-in-x86-assembly/sha1-naive.c
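To make the unrolling idea concrete, here is a minimal sketch in pure Rust. It is not this crate's code and not a port of the linked C file; it just expands all 80 rounds with macros and rotates the variable names instead of shuffling values each round. All names (`compress`, `sha1`, `to_hex`, `sched!`, `round!`, `five!`) are invented for the example, and whether it actually beats a plain loop depends on the compiler and target, so it would need measuring.

```rust
// SHA-1 round functions (Ch, Parity, Maj in FIPS 180-4 terms).
#[inline(always)]
fn ch(b: u32, c: u32, d: u32) -> u32 { d ^ (b & (c ^ d)) }
#[inline(always)]
fn parity(b: u32, c: u32, d: u32) -> u32 { b ^ c ^ d }
#[inline(always)]
fn maj(b: u32, c: u32, d: u32) -> u32 { (b & c) | (d & (b | c)) }

/// Compress one 64-byte block into `state`, with all 80 rounds written out.
fn compress(state: &mut [u32; 5], block: &[u8]) {
    assert_eq!(block.len(), 64);
    // The message schedule lives in a 16-word circular buffer.
    let mut w = [0u32; 16];
    for (i, chunk) in block.chunks_exact(4).enumerate() {
        w[i] = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    let [mut a, mut b, mut c, mut d, mut e] = *state;

    // Schedule word for round $i; rounds >= 16 expand the buffer in place.
    macro_rules! sched {
        ($i:expr) => {{
            if $i >= 16 {
                let x = w[($i + 13) & 15] ^ w[($i + 8) & 15] ^ w[($i + 2) & 15] ^ w[$i & 15];
                w[$i & 15] = x.rotate_left(1);
            }
            w[$i & 15]
        }};
    }
    // One round: the new "a" is accumulated into $e, and $b is rotated.
    macro_rules! round {
        ($f:ident, $k:expr, $a:ident, $b:ident, $c:ident, $d:ident, $e:ident, $i:expr) => {
            $e = $e
                .wrapping_add($a.rotate_left(5))
                .wrapping_add($f($b, $c, $d))
                .wrapping_add($k)
                .wrapping_add(sched!($i));
            $b = $b.rotate_left(30);
        };
    }
    // Five rounds with rotated names, so the names line up again afterwards.
    macro_rules! five {
        ($f:ident, $k:expr, $i:expr) => {
            round!($f, $k, a, b, c, d, e, $i);
            round!($f, $k, e, a, b, c, d, $i + 1);
            round!($f, $k, d, e, a, b, c, $i + 2);
            round!($f, $k, c, d, e, a, b, $i + 3);
            round!($f, $k, b, c, d, e, a, $i + 4);
        };
    }

    five!(ch, 0x5A827999, 0); five!(ch, 0x5A827999, 5);
    five!(ch, 0x5A827999, 10); five!(ch, 0x5A827999, 15);
    five!(parity, 0x6ED9EBA1, 20); five!(parity, 0x6ED9EBA1, 25);
    five!(parity, 0x6ED9EBA1, 30); five!(parity, 0x6ED9EBA1, 35);
    five!(maj, 0x8F1BBCDC, 40); five!(maj, 0x8F1BBCDC, 45);
    five!(maj, 0x8F1BBCDC, 50); five!(maj, 0x8F1BBCDC, 55);
    five!(parity, 0xCA62C1D6, 60); five!(parity, 0xCA62C1D6, 65);
    five!(parity, 0xCA62C1D6, 70); five!(parity, 0xCA62C1D6, 75);

    state[0] = state[0].wrapping_add(a);
    state[1] = state[1].wrapping_add(b);
    state[2] = state[2].wrapping_add(c);
    state[3] = state[3].wrapping_add(d);
    state[4] = state[4].wrapping_add(e);
}

/// One-shot SHA-1 over `data` (hypothetical API, for illustration only).
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut state: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    let mut chunks = data.chunks_exact(64);
    for block in &mut chunks {
        compress(&mut state, block);
    }
    // Padding: 0x80, zeros, then the message length in bits (big-endian u64).
    let rem = chunks.remainder();
    let mut tail = [0u8; 128];
    tail[..rem.len()].copy_from_slice(rem);
    tail[rem.len()] = 0x80;
    let tail_len = if rem.len() < 56 { 64 } else { 128 };
    let bit_len = (data.len() as u64).wrapping_mul(8);
    tail[tail_len - 8..tail_len].copy_from_slice(&bit_len.to_be_bytes());
    for block in tail[..tail_len].chunks_exact(64) {
        compress(&mut state, block);
    }
    let mut out = [0u8; 20];
    for (i, word) in state.iter().enumerate() {
        out[i * 4..i * 4 + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Lowercase hex encoding of a digest.
fn to_hex(digest: &[u8]) -> String {
    digest.iter().map(|b| format!("{:02x}", b)).collect()
}
```

The sketch only uses integer operations and fixed-size arrays, apart from `to_hex`, which allocates a `String` and is there purely for checking results.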
I would happily accept patches here.
By far the fastest implementation of SHA-1 I have seen in Rust is in DaGenix/rust-crypto. It closely models the LLVM intrinsics and uses a set of SIMD-sized types, but is pure Rust code. It performs quite well even using the MSVC compiler: on my machine it is more than twice as fast as this crate and quite comparable to sha1sum.
I'm not sure if it would be possible, but re-implementing their SHA-1 algorithm here seems worth considering: both crates appear to use similar licensing and almost exactly the same interfaces, so the same optimizations should be applicable. As rust-crypto is seemingly abandoned, I would love to see their code make it into a maintained library, as it is quite efficient and has stood the test of time.
I'm happy to accept patches. The main requirement is that it stays compatible with the current interface and #![no_std].
Should be closed now with 6957f7e in the codebase
This can be closed now. Current numbers on my ancient mac:
3e694a5b7afa45e6da3a598ec26e7dffdb81bf72 -
sha1sum program: 393.86 MB/s
3e694a5b7afa45e6da3a598ec26e7dffdb81bf72
sha1 crate: 453.18 MB/s
SHA1:3e694a5b7afa45e6da3a598ec26e7dffdb81bf72
ring crate: 180.17 MB/s