Benchmark against other EVMs
lightclient opened this issue · 13 comments
It would be cool to benchmark against other EVM implementations, especially evmone
which AFAIK is currently the fastest EVM interpreter.
This would probably be a good benchmark for arithmetic: ethereum/evmone#320
This will be very useful, thank you lightclient!
On my laptop I am getting around 210-220ms; I didn't expect it to be that high. I will need to run perf to see if I can spot anything.
It is a little bit faster now; I am getting around ~110-120ms. The memory and stack implementations that I took from sputnik were not optimized, and the signed operations could probably be done better.
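As a rough illustration of what "done better" could mean for the signed opcodes (this is not revm's actual code, and it uses 128-bit integers instead of 256-bit for brevity), SDIV can be reduced to a single unsigned division plus a sign fix:

```rust
// Minimal sketch only: signed division built from one unsigned division of
// the magnitudes, keeping the EVM convention that division by zero yields 0.
fn sdiv_via_udiv(a: i128, b: i128) -> i128 {
    if b == 0 {
        return 0;
    }
    let negative = (a < 0) ^ (b < 0);
    // unsigned_abs avoids overflow for i128::MIN
    let q = a.unsigned_abs() / b.unsigned_abs(); // the only division performed
    // casting back wraps for MIN / -1, matching the EVM's two's-complement result
    let q = q as i128;
    if negative { q.wrapping_neg() } else { q }
}
```

The point is that only one multi-word unsigned division is needed; the sign handling around it is cheap.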
and now it is more in the range of ~85-95ms
~75-80ms now, on my laptop.
This is where the story becomes interesting, and evmone is really great. I added a few more optimizations: static gas is precalculated per gas_block and applied when needed, plus some other small tweaks, but div is still a big performance hit.
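For reference, here is a rough sketch of the gas_block idea; the names (GasBlock, analyze, static_cost) and the exact block boundaries are made up for illustration and are not revm's actual implementation:

```rust
// Walk the bytecode once and pre-sum each block's constant gas, so the
// interpreter charges it with a single subtraction instead of once per opcode.
struct GasBlock {
    start: usize,    // pc of the first instruction in the block
    static_gas: u64, // sum of the constant gas of every opcode in the block
}

fn analyze(code: &[u8], static_cost: &[u64; 256]) -> Vec<GasBlock> {
    let mut blocks = Vec::new();
    let mut current = GasBlock { start: 0, static_gas: 0 };
    let mut pc = 0;
    while pc < code.len() {
        let op = code[pc];
        // A JUMPDEST is a potential jump target, so it has to open a new block;
        // otherwise a jump into it would skip the gas accumulated before it.
        if op == 0x5b && pc != current.start {
            blocks.push(current);
            current = GasBlock { start: pc, static_gas: 0 };
        }
        current.static_gas += static_cost[op as usize];
        pc += 1 + push_data_len(op); // skip PUSH immediates
        // JUMP/JUMPI end a block because control flow leaves it.
        if matches!(op, 0x56 | 0x57) || pc >= code.len() {
            blocks.push(current);
            current = GasBlock { start: pc, static_gas: 0 };
        }
    }
    blocks
}

// PUSH1..PUSH32 (0x60..=0x7f) carry immediate bytes that must be skipped.
fn push_data_len(op: u8) -> usize {
    if (0x60..=0x7f).contains(&op) { (op - 0x5f) as usize } else { 0 }
}
```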
It seems there is a big difference between running on Windows and Linux: Windows is usually faster by ~8-10ms, and I am still unsure what part of the code is responsible for that. All measurements above were done on Windows.
The measurements below differ only in the implementation used for the div and sdiv opcodes: here
With Parity u256 div I am getting around ~68-72ms on Windows and ~77-80ms on Linux, and the graph looks like:
while with zkp_u256 I got a boost and was getting around ~58-61ms on Windows and ~67-68ms on Linux.
zkp u256 uses __udivti3 to divide a 2-by-1 word here. It is a lot faster even with an unneeded Option unwrap; I will remove it and measure again a bit later.
parity u256 uses its own custom 2-by-1 div and it is even slower; from the flamegraph it seems all the time is spent in this function: here
Parity u256 should probably just switch to u128 and would likely gain better performance.
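A sketch of what that switch could look like (an assumption about the shape of the code, not the actual uint crate internals): the 2-by-1 division step expressed with native u128 arithmetic, which the compiler lowers to __udivti3 on most 64-bit targets, instead of a hand-rolled bit-by-bit loop:

```rust
// Divides the double word (hi, lo) by `d`, returning (quotient, remainder).
// The caller must guarantee hi < d so the quotient fits in a single word.
fn div_mod_word(hi: u64, lo: u64, d: u64) -> (u64, u64) {
    debug_assert!(hi < d, "quotient would overflow a single word");
    let n = ((hi as u128) << 64) | lo as u128;
    let d = d as u128;
    ((n / d) as u64, (n % d) as u64)
}
```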
evmone uses an optimized version that seems even faster than the built-in __udivti3, so there are even more improvements that can be done. The amazing Pawel gave us info on its speed: https://groups.google.com/g/llvm-dev/c/5PqUC4nB_DQ/m/DaCBItw4AAAJ
running on: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
Flamegraphs as SVG, if somebody wants to look at them in detail:
flamegraphs.zip
I feel like there are a lot of small improvements that can be done to optimize things, but we will see how big an impact they have.
Switching parity u256's div_mod_word to the zkp_u256 version gives me a good boost, ~64-66ms on Linux, which is even better than using zkp_u256 itself: link
The same result was obtained by just using the uncommented code in parity u256's div_mod_word.
~56-58ms on Windows with the improved parity u256.
The test is found in bin/revm-test/ and is executed with cargo run --release.
My test was called only once per execution, and I would run the binary multiple times to get a range of timings. I changed that and introduced a loop, so the test is now called 50 times per execution. After a few iterations I am getting better times than on Windows:
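The loop is nothing fancy; a hypothetical sketch of its shape (the real code lives in bin/revm-test, and run_snailtracer is a placeholder name):

```rust
// Run the same transaction repeatedly inside one process so caches and the
// allocator warm up, printing each iteration's wall-clock time.
use std::time::Instant;

fn main() {
    for i in 0..50 {
        let start = Instant::now();
        run_snailtracer();
        println!("{}: {:?}", i, start.elapsed());
    }
}

fn run_snailtracer() {
    // stand-in for executing the snailtracer contract through revm
}
```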
elapsed: 53.666179ms
0: 65.588152ms
1: 63.255175ms
2: 57.723127ms
3: 56.212264ms
4: 53.734064ms
5: 53.121586ms
6: 53.089055ms
7: 53.133512ms
8: 53.082209ms
9: 53.090587ms
10: 53.045255ms
11: 53.880638ms
12: 53.16134ms
13: 52.969316ms
14: 53.033339ms
15: 53.167286ms
16: 53.091371ms
17: 53.054458ms
18: 53.067683ms
19: 53.243839ms
20: 53.085979ms
21: 53.122794ms
22: 53.06014ms
23: 53.123104ms
24: 53.072308ms
25: 53.119213ms
26: 53.072579ms
27: 53.094516ms
28: 53.139832ms
29: 53.038691ms
30: 53.094649ms
31: 53.293706ms
32: 52.844196ms
33: 51.876471ms
34: 52.991977ms
35: 53.015948ms
36: 53.241124ms
37: 52.784502ms
38: 52.94318ms
39: 52.920714ms
40: 52.792951ms
41: 53.023354ms
42: 53.096627ms
43: 53.086917ms
44: 52.479412ms
45: 52.817731ms
46: 53.05368ms
47: 52.982625ms
48: 53.16019ms
49: 53.135602ms
And I am getting close to evmone:
advanced/total/snailtracer/benchmark 51468 us 51466 us 13 gas_rate=4.47271G/s gas_used=230.193M
baseline/total/snailtracer/benchmark 46800 us 46762 us 15 gas_rate=4.92267G/s gas_used=230.193M
After binding intx directly, I am getting even better results that are comparable with evmone (the changes are on the intx branch):
mean: 48.905952ms
median: 48.82769ms
0: 49.88344ms
1: 50.16717ms
2: 47.413608ms
3: 48.678762ms
4: 48.776993ms
5: 48.747ms
6: 48.434196ms
7: 48.795624ms
8: 49.002815ms
9: 48.859757ms
10: 48.972574ms
11: 48.752764ms
12: 48.724995ms
13: 48.790919ms
14: 48.897968ms
15: 48.52337ms
16: 49.149537ms
17: 49.326058ms
18: 48.927653ms
19: 49.293851ms
And the flamegraph with that change looks like this (zipped SVG file: flamegraph.zip):
I will not merge intx to the main branch; the proper way would be to reimplement it in Rust. There are two issues tracking that for future improvements: #22 and #23
I feel this is okay to close; revm got very close to evmone and the timings look good. There is probably some optimization that can be done on the Host side: evmone uses a MockedHost for testing while revm has only the standard host impl plus a mock Database (you can see from the flamegraph that sload takes a lot of time), but I will leave this for later. It was a fun ride.
Is there any clear documentation comparing the performance of REVM with other EVMs, especially parity EVM which is also based on Rust?
If you find one, please forward it to me.
In general, this issue is comparing revm with evmone, and there is this comparison with sputnikvm here: cassc/rust-evm-bench#2
Thanks, more info:
https://github.com/ziyadedher/evm-bench