Introduce data-parallelism with Rayon
gbenattar opened this issue · 15 comments
The purpose of this task is to introduce data-parallelism with Rayon, starting with the zero-knowledge proof (refactored to use iterators).
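Refactoring index-based loops into iterator chains is what makes the later switch to Rayon cheap: once the code has the iter().map(...).collect() shape, Rayon's par_iter() can replace iter() and the rest stays the same. A minimal sketch of that refactor, assuming plain u64 arithmetic in place of BigInt/Mpz; mod_pow, powers_loop and powers_iter are hypothetical names, with mod_pow only standing in for BigInt::modpow:

```rust
/// Hypothetical stand-in for BigInt::modpow: square-and-multiply on u64,
/// with u128 intermediates so the products cannot overflow.
fn mod_pow(base: u64, exp: u64, modulus: u64) -> u64 {
    let m = modulus as u128;
    let mut result: u128 = 1 % m;
    let mut b = base as u128 % m;
    let mut e = exp;
    while e > 0 {
        if e & 1 == 1 {
            result = result * b % m;
        }
        b = b * b % m;
        e >>= 1;
    }
    result as u64
}

/// Loop-based version: builds the output vector by hand.
fn powers_loop(ys: &[u64], n: u64) -> Vec<u64> {
    let mut xs = Vec::with_capacity(ys.len());
    for i in 0..ys.len() {
        xs.push(mod_pow(ys[i], n, n));
    }
    xs
}

/// Iterator version: same result, but in the map/collect shape
/// that a parallel iterator can mirror.
fn powers_iter(ys: &[u64], n: u64) -> Vec<u64> {
    ys.iter().map(|&y| mod_pow(y, n, n)).collect()
}

fn main() {
    let n = 3233; // toy modulus 61 * 53
    let ys: Vec<u64> = (2..20).collect();
    assert_eq!(powers_loop(&ys, n), powers_iter(&ys, n));
    println!("{:?}", powers_iter(&ys, n));
}
```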
Comparison benches will be posted.
I made a quick attempt but ran into issues that looked like they had something to do with passing around references to Mpz objects, even though rust-gmp seems to implement the needed traits.
I ran into the same issue; here are the additions:
- Cargo.toml
rust-gmp = { version="0.4", optional=true }
# [...]
rayon = "1.0.1"
- src/lib.rs
#[cfg(feature="proofs")]
extern crate rayon;
- src/proof/mod.rs
use rayon::prelude::*; // brings par_iter() into scope
let x: Vec<_> = y.par_iter()
    .map(|yi| BigInt::modpow(yi, &ek.n, &ek.n))
    .collect();
cargo build
error[E0599]: no method named `par_iter` found for type `std::vec::Vec<arithimpl::gmpimpl::gmp::mpz::Mpz>` in the current scope
--> src/proof/mod.rs:95:27
|
95 | let x: Vec<_> = y.par_iter()
| ^^^^^^^^
|
= note: the method `par_iter` exists but the following trait bounds were not satisfied:
`std::vec::Vec<arithimpl::gmpimpl::gmp::mpz::Mpz> : rayon::iter::IntoParallelRefIterator`
`[arithimpl::gmpimpl::gmp::mpz::Mpz] : rayon::iter::IntoParallelRefIterator`
error: aborting due to previous error
FYI: https://github.com/mortendahl/rust-paillier/blob/dev/src/arithimpl/gmpimpl.rs.
Updating to rust-gmp 0.5.0 fixes the issue: that release introduces a Sync implementation for Mpz, which rayon's par_iter() requires of the element type.
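The error above comes down to a trait bound: par_iter() on a &Vec<T> needs T: Sync, because several worker threads read the elements through shared references. A type wrapping a C struct that holds a raw limb pointer (as a GMP integer wrapper does) is not Sync automatically, since raw pointers are neither Send nor Sync. The sketch below uses a hypothetical FakeMpz to show the shape of the fix; the unsafe impl is only sound if the wrapped data really tolerates concurrent reads, which is the guarantee the crate author has to make. It uses std::thread::scope instead of Rayon so it runs without external crates.

```rust
use std::thread;

/// Hypothetical stand-in for a GMP-backed integer: the raw pointer
/// makes the type !Send and !Sync by default, like a wrapped mpz_t.
struct FakeMpz {
    limbs: *mut u64,
    len: usize,
}

impl FakeMpz {
    fn new(values: Vec<u64>) -> Self {
        let boxed = values.into_boxed_slice();
        let len = boxed.len();
        FakeMpz { limbs: Box::into_raw(boxed) as *mut u64, len }
    }

    fn limb_sum(&self) -> u64 {
        // SAFETY: `limbs` points to `len` initialized u64s owned by self.
        let limbs = unsafe { std::slice::from_raw_parts(self.limbs, self.len) };
        limbs.iter().sum()
    }
}

impl Drop for FakeMpz {
    fn drop(&mut self) {
        // SAFETY: rebuilds the Box<[u64]> created in `new`, exactly once.
        unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(self.limbs, self.len))) };
    }
}

// Without this line, sharing &FakeMpz across threads does not compile.
// SAFETY: FakeMpz never mutates through &self, so concurrent reads are fine.
unsafe impl Sync for FakeMpz {}

/// Compile-time check that T satisfies the bound par_iter() needs.
fn assert_sync<T: Sync>() {}

/// Reads the elements from two threads at once via shared references,
/// the same access pattern a parallel iterator over a slice uses.
fn sum_in_two_threads(items: &[FakeMpz]) -> u64 {
    let (left, right) = items.split_at(items.len() / 2);
    thread::scope(|s| {
        let handle = s.spawn(|| left.iter().map(FakeMpz::limb_sum).sum::<u64>());
        let right_sum: u64 = right.iter().map(FakeMpz::limb_sum).sum();
        handle.join().unwrap() + right_sum
    })
}

fn main() {
    assert_sync::<FakeMpz>();
    let items = vec![FakeMpz::new(vec![1, 2]), FakeMpz::new(vec![3]), FakeMpz::new(vec![4, 5, 6])];
    println!("{}", sum_in_two_threads(&items));
}
```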
cargo outdated is a great command, by the way, in case you didn't know it already ;)
Initial results without Rayon:
test self::bench_zk_proof_challenge_1024 ... bench: 284,917,530 ns/iter (+/- 58,277,547)
test self::bench_zk_proof_prove_1024 ... bench: 301,056,360 ns/iter (+/- 32,855,725)
test self::bench_zk_proof_prove_and_verify_1024 ... bench: 298,151,884 ns/iter (+/- 38,319,745)
I suppose that's okay (~0.3 s)? Did you try to bench without Rayon as well?
It is still WIP; I will post the results here.
Ah, sorry, I misread. Looking forward to it!
Initial results with Rayon:
test self::bench_zk_proof_challenge_1024 ... bench: 133,593,920 ns/iter (+/- 3,243,637)
test self::bench_zk_proof_prove_1024 ... bench: 159,987,877 ns/iter (+/- 4,681,477)
test self::bench_zk_proof_prove_and_verify_1024 ... bench: 159,776,794 ns/iter (+/- 4,358,748)
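For a quick comparison, the speedup is the ratio of the mean ns/iter figures from the two runs. A small helper doing that arithmetic on the numbers posted above; it uses the means only and ignores the +/- spread, which was large for the sequential runs:

```rust
/// Ratio of sequential to parallel mean time; a value > 1.0 means the
/// parallel version was faster.
fn speedup(sequential_ns: u64, parallel_ns: u64) -> f64 {
    sequential_ns as f64 / parallel_ns as f64
}

fn main() {
    // (bench name, ns/iter without Rayon, ns/iter with Rayon), from the runs above.
    let runs = [
        ("challenge_1024", 284_917_530_u64, 133_593_920_u64),
        ("prove_1024", 301_056_360, 159_987_877),
        ("prove_and_verify_1024", 298_151_884, 159_776_794),
    ];
    for (name, seq, par) in runs.iter() {
        println!("{}: {:.2}x", name, speedup(*seq, *par));
    }
}
```

On these figures the three benches come out at roughly 1.9x to 2.1x.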
Nice! Does bencher play nicely with threads, or is there some uncertainty in the numbers due to that?
No, it is pretty much stable. Here is a run that also includes a 2048-bit key size:
running 5 tests
test self::bench_zk_proof_challenge_1024 ... bench: 163,783,130 ns/iter (+/- 57,689,537)
test self::bench_zk_proof_prove_1024 ... bench: 160,326,807 ns/iter (+/- 13,572,992)
test self::bench_zk_proof_prove_all_1024 ... bench: 291,581,102 ns/iter (+/- 6,282,444)
test self::bench_zk_proof_prove_all_2048 ... bench: 2,078,006,591 ns/iter (+/- 23,361,374)
test self::bench_zk_proof_prove_and_verify_1024 ... bench: 157,755,876 ns/iter (+/- 3,513,484)
test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
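The two prove_all runs give a rough feel for how the cost grows with key size. As a back-of-the-envelope check, assuming the proof is dominated by modular exponentiations: with schoolbook multiplication, an exponentiation with a k-bit exponent and modulus costs on the order of k^3 bit operations, so doubling the key size would predict about an 8x slowdown, and GMP's faster multiplication algorithms may bring that lower. The cubic model is an assumption, not something measured in this thread; the helpers below (hypothetical names) just put the observed ratio next to it:

```rust
/// Growth factor between two timings of the same benchmark.
fn growth(small_ns: u64, large_ns: u64) -> f64 {
    large_ns as f64 / small_ns as f64
}

/// Slowdown predicted by a cost model proportional to bits^exponent.
fn predicted_growth(small_bits: u32, large_bits: u32, exponent: i32) -> f64 {
    (large_bits as f64 / small_bits as f64).powi(exponent)
}

fn main() {
    // prove_all timings (ns/iter) from the run above, both with Rayon.
    let observed = growth(291_581_102, 2_078_006_591);
    let cubic = predicted_growth(1024, 2048, 3);
    println!("observed {:.2}x, cubic model {:.1}x", observed, cubic);
}
```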
@gbenattar, out of the 5 tests, only number 4 was for 2048 bits?
Can you please add benchmarks for 2048 bits without Rayon?
Hi @gbenattar, sorry that wasn't clear; what I was wondering was whether the multi-threaded nature of bencher interfered with Rayon's performance. From a quick skim, bencher might run the tests in parallel (using one thread per core), meaning Rayon might be penalised. Would you mind trying the benches without concurrency, just for fun? :)
So here is a quick test for a single bench:
- When reverting the code to use iter():
test self::bench_zk_proof_challenge_1024 ... bench: 289,669,699 ns/iter (+/- 45,791,433)
- When using par_iter() with RUST_TEST_THREADS=1 cargo bench:
test self::bench_zk_proof_challenge_1024 ... bench: 142,414,039 ns/iter (+/- 18,410,610)
- When using par_iter() with cargo bench (after running unset RUST_TEST_THREADS):
test self::bench_zk_proof_challenge_1024 ... bench: 134,116,663 ns/iter (+/- 22,447,834)
Not sure why the last two give similar results; do you have any idea?
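One possible explanation, worth confirming against the libtest source for the toolchain in use: libtest runs benchmarks serially after the regular tests, so RUST_TEST_THREADS controls how many tests run concurrently but not benches, and both par_iter() runs would have used Rayon's full default pool. To get a single-threaded baseline without reverting to iter(), Rayon reads the RAYON_NUM_THREADS environment variable (e.g. RAYON_NUM_THREADS=1 cargo bench), and rayon::ThreadPoolBuilder can set the pool size in code. The sketch below illustrates the same knob with an explicit worker count; par_map_with is a hypothetical helper built on std::thread::scope rather than Rayon, so it runs without external crates:

```rust
use std::thread;

/// Hypothetical helper: applies `f` to every item using `workers` scoped
/// threads, each handling one contiguous chunk, and keeps the input order.
fn par_map_with<T, R, F>(items: &[T], workers: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let workers = workers.max(1);
    // Ceiling division so every item lands in some chunk; chunks(0) would panic.
    let chunk_len = ((items.len() + workers - 1) / workers).max(1);
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order preserves the original item order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let ys: Vec<u64> = (1..=10).collect();
    let one = par_map_with(&ys, 1, |y| y * y);
    let four = par_map_with(&ys, 4, |y| y * y);
    assert_eq!(one, four);
    println!("{:?}", four);
}
```

Timing the same workload with workers set to 1 and to the number of logical CPUs gives the sequential and parallel figures from a single build.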
Side note: I am running this on the following machine:
sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
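For context when reading the speedups: to my knowledge the i5-4278U is a dual-core part with Hyper-Threading (4 logical CPUs), and Rayon's global pool defaults to one worker per logical CPU, so a speedup of about 2x is roughly what two physical cores allow. A quick std-only way to see how many threads a machine reports (std::thread::available_parallelism, stable since Rust 1.59); logical_cpus is a hypothetical helper name:

```rust
use std::thread;

/// Number of threads the OS suggests for parallel work; falls back to 1
/// if the value cannot be determined.
fn logical_cpus() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    println!("logical CPUs reported: {}", logical_cpus());
}
```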