An investigation of the performance regression we noticed when upgrading from OpenSSL v1.1.1w to v3.0.2, mainly via curl.
- This repo is optimized for step debugging different versions of OpenSSL simultaneously, hence the multiple submodules at different versions.
- Build deps required for curl and OpenSSL are not included here.
- Our use case is single-threaded, so we're only measuring and testing such.
- A test cert is included for convenience.
$ # Compile OpenSSL, curl (HEAD), curl-test at v1.1.1w, v3.0.2, v3.1.3
$ ./compile.sh
...
$
$ # Ensure test programs are linked properly
$ ldd curl-test-openssl* | grep -e curl-test-openssl -e libcurl -e libcrypto
curl-test-opensslv111w:
libcurl.so.4 => /home/adam/curl-openssl-perf/install-curl-opensslv111w/lib/libcurl.so.4 (0x00007f353aa20000)
libcrypto.so.1.1 => /home/adam/curl-openssl-perf/install-opensslv111w/lib/libcrypto.so.1.1 (0x00007f353a400000)
curl-test-opensslv302:
libcurl.so.4 => /home/adam/curl-openssl-perf/install-curl-opensslv302/lib/libcurl.so.4 (0x00007f1a5bb4b000)
libcrypto.so.3 => /home/adam/curl-openssl-perf/install-opensslv302/lib/libcrypto.so.3 (0x00007f1a5b200000)
curl-test-opensslv313:
libcurl.so.4 => /home/adam/curl-openssl-perf/install-curl-opensslv313/lib/libcurl.so.4 (0x00007f1ae3d1e000)
libcrypto.so.3 => /home/adam/curl-openssl-perf/install-opensslv313/lib/libcrypto.so.3 (0x00007f1ae3400000)
$
$ # Start simple httpd
$ ./httpd.sh &
[1] 2791487
$
$ # Run test programs
$ ./curl-test-opensslv111w 30 | tee >(cut -d' ' -f2 | descstat pretransfer) >(cut -d' ' -f1 | descstat ______total) | cat
0.0504270 0.0084580
0.0556320 0.0116330
0.0637960 0.0194580
0.0637410 0.0196290
0.0637390 0.0195760
0.0556930 0.0116120
0.0597930 0.0191160
0.0597400 0.0189390
0.0637730 0.0208000
0.0597700 0.0191660
0.0597360 0.0189990
0.0637570 0.0191210
0.0637570 0.0192750
0.0676990 0.0193090
0.0637260 0.0195690
0.0637660 0.0198740
0.0636830 0.0192150
0.0557130 0.0114920
0.0597700 0.0191520
0.0637270 0.0191080
0.0637080 0.0195310
0.0597760 0.0190440
0.0637110 0.0193940
0.0637850 0.0198100
0.0636630 0.0195210
0.0597110 0.0190620
0.0637250 0.0192510
0.0598120 0.0190200
0.0637170 0.0196070
0.0557290 0.0113820
______total: n=30 min=0.050 max=0.068 mean=0.061 median=0.064 p95=0.064 p99=0.064 stdev=0.004 sum=1.839
pretransfer: n=30 min=0.008 max=0.021 mean=0.018 median=0.019 p95=0.020 p99=0.020 stdev=0.003 sum=0.539
$ ./curl-test-opensslv302 30 | tee >(cut -d' ' -f2 | descstat pretransfer) >(cut -d' ' -f1 | descstat ______total) | cat
0.0546710 0.0138870
0.0596690 0.0155460
0.0637430 0.0225710
0.0717300 0.0309830
0.0717270 0.0297990
0.0717350 0.0309960
0.0715420 0.0309340
0.0758620 0.0319520
0.0717380 0.0297390
0.0757110 0.0314220
0.0756970 0.0312450
0.0717350 0.0309260
0.0717350 0.0306430
0.0756290 0.0311530
0.0717310 0.0310230
0.0716760 0.0307870
0.0717420 0.0306390
0.0716650 0.0308620
0.0716730 0.0305630
0.0718310 0.0304160
0.0716660 0.0303250
0.0717560 0.0304360
0.0718260 0.0293320
0.0716320 0.0305980
0.0717160 0.0303230
0.0757150 0.0311390
0.0717130 0.0305160
0.0756640 0.0317180
0.0756580 0.0312780
0.0757600 0.0316260
______total: n=30 min=0.055 max=0.076 mean=0.072 median=0.072 p95=0.076 p99=0.076 stdev=0.005 sum=2.146
pretransfer: n=30 min=0.014 max=0.032 mean=0.029 median=0.031 p95=0.032 p99=0.032 stdev=0.004 sum=0.883
$ ./curl-test-opensslv313 30 | tee >(cut -d' ' -f2 | descstat pretransfer) >(cut -d' ' -f1 | descstat ______total) | cat
0.0580230 0.0170500
0.0556350 0.0119370
0.0636670 0.0201160
0.0636500 0.0205430
0.0598250 0.0117070
0.0636560 0.0200220
0.0637360 0.0200840
0.0636840 0.0198300
0.0637490 0.0195250
0.0636550 0.0195450
0.0637520 0.0204030
0.0516030 0.0111920
0.0637700 0.0199610
0.0677220 0.0195340
0.0636750 0.0199290
0.0636780 0.0199180
0.0636770 0.0198130
0.0596810 0.0182130
0.0637540 0.0196090
0.0637290 0.0196550
0.0636790 0.0200690
0.0637010 0.0201600
0.0637170 0.0200610
0.0637050 0.0199020
0.0637230 0.0197060
0.0637170 0.0198060
0.0636800 0.0196640
0.0636570 0.0200040
0.0637850 0.0197620
0.0676740 0.0205340
______total: n=30 min=0.052 max=0.068 mean=0.063 median=0.064 p95=0.064 p99=0.068 stdev=0.003 sum=1.885
pretransfer: n=30 min=0.011 max=0.021 mean=0.019 median=0.020 p95=0.020 p99=0.021 stdev=0.003 sum=0.568
Median pretransfer time (CURLINFO_PRETRANSFER_TIME
) on v3.0.2 is ~10ms slower
than v1.1.1w -- (31 vs 19ms). v3.1.3 appears to have fixed this issue, being
just a smidge slower overall.
Taking a perf record
while running ./curl-test-opensslv302 9999
shows the
big slow down is in libcrypto's sa_doall
which is used to map various
functions against large arrays:
Samples: 38K of event 'cycles', Event count (approx.): 10745290398
Children Self Command Shared Object Symbol
...
+ 9.77% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter ◆
+ 9.77% 0.18% swapper [kernel.kallsyms] [k] cpuidle_enter_state ▒
+ 9.66% 0.01% curl-test-opens libcrypto.so.3 [.] construct_evp_method ▒
+ 9.32% 0.00% curl-test-opens libssl.so.3 [.] ossl_statem_connect ▒
+ 9.31% 0.00% curl-test-opens libssl.so.3 [.] state_machine ▒
+ 9.10% 0.00% curl-test-opens libcurl.so.4.8.0 [.] ossl_connect_step2 ▒
+ 9.09% 0.00% curl-test-opens libssl.so.3 [.] SSL_connect ▒
+ 9.09% 0.00% curl-test-opens libssl.so.3 [.] SSL_do_handshake ▒
+ 7.67% 0.00% curl-test-opens libssl.so.3 [.] read_state_machine ▒
+ 7.22% 0.00% curl-test-opens libcrypto.so.3 [.] ossl_sa_ALGORITHM_doall▒
+ 7.22% 0.00% curl-test-opens libcrypto.so.3 [.] ossl_sa_doall_arg ▒
- 7.21% 6.15% curl-test-opens libcrypto.so.3 [.] sa_doall ▒
- 6.15% __libc_start_call_main ▒
main ▒
curl_easy_perform ▒
easy_perform ▒
easy_transfer ▒
curl_multi_perform ▒
multi_runsingle ▒
Curl_conn_connect ▒
cf_hc_connect ▒
cf_hc_baller_connect ▒
Curl_conn_cf_connect ▒
cf_setup_connect ▒
Curl_conn_cf_connect ▒
ssl_cf_connect ▒
ssl_connect_nonblocking ▒
ossl_connect_nonblocking ▒
- ossl_connect_common ▒
- 3.82% ossl_connect_step1 ▒
SSL_CTX_new ▒
SSL_CTX_new_ex ▒
+ ssl_load_ciphers ▒
- 2.33% ossl_connect_step2 ▒
SSL_connect ▒
SSL_do_handshake ▒
ossl_statem_connect ▒
state_machine ▒
- read_state_machine ▒
- 1.73% ossl_statem_client_process_message ▒
+ 0.61% tls_process_server_certificate ▒
+ 0.56% tls_process_server_hello ▒
+ 0.55% tls_process_cert_verify ▒
+ 0.60% tls_get_message_header ▒
- 1.06% sa_doall ▒
- 0.71% impl_cache_flush_one_alg ▒
- 0.65% lh_QUERY_doall_IMPL_CACHE_FLUSH ▒
- OPENSSL_LH_doall_arg ▒
doall_util_fn ▒
+ 6.18% 0.14% curl-test-opens libcrypto.so.3 [.] evp_cipher_from_algorit▒
+ 6.16% 0.01% curl-test-opens libcrypto.so.3 [.] evp_names_do_all ▒
+ 6.00% 0.01% curl-test-opens libcrypto.so.3 [.] ossl_namemap_doall_name▒
+ 5.71% 0.00% curl-test-opens libcrypto.so.3 [.] ossl_method_store_cache▒
+ 5.68% 0.00% curl-test-opens libcrypto.so.3 [.] ossl_method_cache_flush▒
...
Re-using SSL_CTX
across HTTPS requests avoids some but not all of these
expensive calls. A quick and dirty (and likely broken) curl patch is included
which demonstrates this. The patched version is still about ~5ms slower than
v1.1.1w.
For our use case, upgrading to v3.1.3 seems like a sensible option. According to OpenSSL's versioning policy, it should be ABI compatible with v3.0.2.