Feature Request: encryption primitives for devices without AES cpu instructions
DavyLandman opened this issue · 79 comments
Hi @rfjakob,
Thank you for this great application! The reverse mode is what really sets it apart from other options.
I checked the issues, and it doesn't seem to be discussed yet, but what do you think about adding support for a different collection of encryption primitives that are better suited for more low-end devices?
I'm running gocryptfs on a few ARMv6/7 based NAS machines. They are nice: low energy and quite fast. But they lack native AES instructions: my fastest ARM device (Odroid XU4) maxes out at 40MB/s, while, for example, the Raspberry Pis and friends are quite a bit slower (rpi1 is at 15MB/s).
Maybe Google Adiantum (also added to Linux kernel 5.0 for cryptfs) is a nice fit. Adiantum is based on XChaCha12 and Poly1305 and is roughly 5x quicker than AES-XTS on devices without AES instructions.
For the reverse mode maybe something based on ChaCha20Poly1305?
Just for comparison, on my Odroid XU4, ChaCha20Poly1305 runs at 320MB/s, on my RPi1 it gets close to 40MB/s.
So I'm just wondering what your view is on this topic.
Cheers,
Davy
Hi, would you mind running gocryptfs -speed on your ARM machines and posting the result? (and cat /proc/cpuinfo | grep -E "model name|flags" | head -2).
I'd like to add it to our CPU zoo at ( https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks )
I've run it on all the different kinds of ARM devices I have:
Odroid XU4 (Exynos 5422 - ARM Cortex-A15 - 2 GHz)
model name : ARMv7 Processor rev 3 (v7l)
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
$ gocryptfs -speed
AES-GCM-256-OpenSSL 34.26 MB/s (selected in auto mode)
AES-GCM-256-Go 17.24 MB/s
AES-SIV-512-Go 17.58 MB/s
$ openssl speed -evp chacha20-poly1305 && openssl speed -evp aes-256-gcm
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
chacha20-poly1305 64066.72k 130153.44k 275532.80k 306572.84k 320018.56k 307903.74k
aes-256-gcm 40323.87k 49980.74k 64734.47k 70323.03k 71862.66k 71786.19k
Raspberry Pi 3 B rev 1.2 (BCM2835 - ARM Cortex-A53 - 1.2Ghz)
model name : ARMv7 Processor rev 4 (v7l)
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
$ gocryptfs -speed
AES-GCM-256-OpenSSL 17.13 MB/s (selected in auto mode)
AES-GCM-256-Go 5.27 MB/s
AES-SIV-512-Go 4.31 MB/s
$ openssl speed -evp chacha20-poly1305 && openssl speed -evp aes-256-gcm
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
chacha20-poly1305 30020.39k 63560.13k 77169.32k 82019.33k 83536.55k 83645.78k
aes-256-gcm 16137.38k 19500.97k 20668.33k 20986.20k 21127.17k 21135.36k
Raspberry Pi B rev 2 (BCM2835 - ARM 11 - 700Mhz)
model name : ARMv6-compatible processor rev 7 (v6l)
Features : half thumb fastmult vfp edsp java tls
$ gocryptfs -speed
AES-GCM-256-OpenSSL 4.80 MB/s (selected in auto mode)
AES-GCM-256-Go 1.85 MB/s
AES-SIV-512-Go 1.50 MB/s
$ openssl speed -evp chacha20-poly1305 && openssl speed -evp aes-256-gcm
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
chacha20-poly1305 8090.97k 18202.65k 23222.03k 24960.34k 25666.44k 24958.29k
aes-256-gcm 4525.91k 6268.65k 6972.36k 7141.38k 7230.33k 7150.88k
Awesome, thanks! Added to https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks .
I have added an XChaCha20-Poly1305 benchmark to gocryptfs -speed in the xchacha20 branch. On my PC, the results look very promising, with xchacha20 being almost as fast as hardware-accelerated AES-GCM:
$ gocryptfs -speed
AES-GCM-256-OpenSSL 585.92 MB/s
AES-GCM-256-Go 899.28 MB/s (selected in auto mode)
AES-SIV-512-Go 164.05 MB/s
XChaCha20-Poly1305-Go 773.27 MB/s
HOWEVER, looking at https://github.com/golang/crypto/tree/master/chacha20poly1305 , there only seems to be an optimized assembly version for amd64 (xxx_amd64.s).
Could you run gocryptfs -speed from the xchacha20 branch on one of your ARM devices, to see how fast the Go implementation is there?
EDIT: But there is a chacha_arm64.s here: https://github.com/golang/crypto/tree/master/chacha20
I have compiled that branch for Armv7, binary: gocryptfs.xchacha20.armv7.tar.gz
Thanks for the binary:
on the Odroid XU4:
$ ./gocryptfs.xchacha20.armv7 --speed
AES-GCM-256-OpenSSL N/A
AES-GCM-256-Go 17.04 MB/s (selected in auto mode)
AES-SIV-512-Go 14.79 MB/s
XChaCha20-Poly1305-Go 23.37 MB/s
$ gocryptfs --speed
AES-GCM-256-OpenSSL 41.12 MB/s (selected in auto mode)
AES-GCM-256-Go 16.92 MB/s
AES-SIV-512-Go 19.10 MB/s
The other ARM devices I have to try later.
Pity golang has not added asm chacha versions yet. Maybe use the same openssl bridge for speed?
I had the same idea, unfortunately, openssl does not have xchacha20 yet: openssl/openssl#5523
They do have chacha20, but this cannot be used with random nonces (too high risk of collisions)
that's a shame, could you add an option to also bench chacha20 case?
Just to get a sense of the impact of non-asm version, it might be that chacha20 is faster than xchacha20?
I'm reading a bit, and the size & message restrictions on chacha20 are not that bad right?
https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20.html
https://libsodium.gitbook.io/doc/advanced/stream_ciphers/chacha20
The table on https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20.html is very nice!
The problem with ChaCha20: Max 200 000 messages. In gocryptfs, one "message" is a 4 KiB data block, so that's a limit of roughly 800 MB of data written over the lifetime of the filesystem!
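The arithmetic behind that limit can be checked quickly. This is only a minimal sketch, assuming the 200 000-message figure from the pycryptodome table and one nonce per 4 KiB block; the function name lifetimeLimitBytes is made up for illustration and is not part of gocryptfs.

```go
package main

import "fmt"

// lifetimeLimitBytes returns how many bytes can be written before the
// message budget of a cipher is used up, when every block written
// consumes one message (one nonce).
// Hypothetical helper for illustration, not part of gocryptfs.
func lifetimeLimitBytes(maxMessages, blockSize uint64) uint64 {
	return maxMessages * blockSize
}

func main() {
	const maxMessages = 200_000 // figure quoted from the pycryptodome table
	const blockSize = 4096      // one gocryptfs data block (4 KiB)

	limit := lifetimeLimitBytes(maxMessages, blockSize)
	fmt.Printf("%d bytes = %.1f MB = %.1f MiB\n",
		limit, float64(limit)/1e6, float64(limit)/(1<<20))
}
```

That works out to about 819 MB, far too little for a filesystem that is written to regularly.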
The normal one in go (and I think also openssl) is the second row in that table.
Hi, I previously ported Gocryptfs to use wolfSSL. Does the code below allow the use of a random nonce with ChaCha20?
https://github.com/wolfSSL/wolfssl/blob/master/wolfcrypt/src/chacha.c#L111
@DavyLandman I see, 96 bit nonces, that's less bad. gocryptfs used 96 bit nonces in earlier versions. I moved to 128 bits because 96 bits is too little for very large filesystems. I have the calculations saved in #17 (comment) .
And also, https://pkg.go.dev/golang.org/x/crypto/chacha20poly1305 says,
XChaCha20-Poly1305 is a ChaCha20-Poly1305 variant that takes a longer nonce, suitable to be generated randomly without risk of collisions. It should be preferred when nonce uniqueness cannot be trivially ensured, or whenever nonces are randomly generated.
so I'd rather not go with ChaCha20.
@lechner Yes it does, but only 96 bits according to the function comment
this version uses the typical AEAD 96 bit nonce
@DavyLandman I see, 96 bit nonces, that's less bad. gocryptfs used 96 bit nonces in earlier versions. I moved to 128 bits because 96 bits is too little for very large filesystems. I have the calculations saved in #17 (comment) .
I was just reading RFC 7539, and it specifically notes that a random nonce is not needed as long as it is unique; a simple counter is just as secure.
The most important security consideration in implementing this
document is the uniqueness of the nonce used in ChaCha20. Counters
and LFSRs are both acceptable ways of generating unique nonces
Also discussed on Crypto SE.
Assuming 4KiB sectors, you would have to write (2^96 * 4 KiB) bytes before this counter overflows. Which is after 324.518.554 yottabytes. That should be good enough right ? ;)
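For anyone who wants to verify that number, here is a small sketch using math/big. It assumes a 96-bit counter nonce and one nonce per 4 KiB block; bytesBeforeWrap is a made-up helper name, not something from gocryptfs.

```go
package main

import (
	"fmt"
	"math/big"
)

// bytesBeforeWrap returns blockSize * 2^nonceBits: the number of bytes
// that can be written before a counter used as nonce wraps around,
// assuming one nonce per block. Hypothetical helper for illustration.
func bytesBeforeWrap(nonceBits uint, blockSize int64) *big.Int {
	nonces := new(big.Int).Lsh(big.NewInt(1), nonceBits) // 2^nonceBits
	return nonces.Mul(nonces, big.NewInt(blockSize))
}

func main() {
	total := bytesBeforeWrap(96, 4096)

	// 1 yottabyte = 10^24 bytes
	yotta := new(big.Int).Exp(big.NewInt(10), big.NewInt(24), nil)
	whole := new(big.Int).Quo(total, yotta)

	fmt.Println("bytes:     ", total)
	fmt.Println("yottabytes:", whole, "(rounded down)")
}
```

The total is 2^108 bytes, which is the roughly 324.5 million yottabytes quoted above.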
Was reading SE and per chance a relevant question popped up: https://crypto.stackexchange.com/questions/77982/how-to-generate-a-nonce-for-chacha20-poly1305
Using a counter as the nonce would be nice, unfortunately, I don't think we can. There may be multiple gocryptfs processes writing to the folder at the same time (use case: encrypted folder on shared network drive).
I have added the gocryptfs.xchacha20.armv7 results to https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks .
I'm afraid using XChaCha20-Poly1305-Go does not make sense, as it is slower than AES-GCM-256-OpenSSL.
We can revisit when openssl gets XChaCha20.
Actually, on a Raspberry Pi 4 with Ubuntu 64 bit, things look different:
$ ./gocryptfs -speed
AES-GCM-256-OpenSSL 21.50 MB/s (selected in auto mode)
AES-GCM-256-Go 21.75 MB/s
AES-SIV-512-Go 17.64 MB/s
XChaCha20-Poly1305-Go 109.78 MB/s
I just ran it on my rpi3:
$ ./gocryptfs.xchacha20.armv7 --speed
AES-GCM-256-OpenSSL N/A
AES-GCM-256-Go 4.86 MB/s (selected in auto mode)
AES-SIV-512-Go 4.53 MB/s
XChaCha20-Poly1305-Go 9.26 MB/s
$ gocryptfs --speed
AES-GCM-256-OpenSSL 16.83 MB/s (selected in auto mode)
AES-GCM-256-Go 5.24 MB/s
AES-SIV-512-Go 4.20 MB/s
Needs a 64 bit gocryptfs to be fast. Go has optimized xchacha assembly for arm64.
Ah, yes, okay so it's for the zoo then ;)
With quite some work you could link/cgo these asm versions: https://github.com/floodyberry/chacha-opt/tree/master/app/extensions/chacha
arm64 binary:
gocryptfs.xchacha20.arm64.tar.gz
It might be nice to get an armv6/v7 native version into golang. That would open up quite a range of devices.
Relevant issue: golang/go#22809
A comparison of two versions to characterize performance on the same arm64 hardware. The device under test is pre-production hardware from a product that didn't make it to market, so the interesting thing is not its performance, but the relative perf of two versions.
Ubuntu 18.04 LTS with apt install gocryptfs
AES-GCM-256-OpenSSL 241.04 MB/s (selected in auto mode)
AES-GCM-256-Go 38.06 MB/s
AES-SIV-512-Go 28.61 MB/s
gocryptfs 1.4.3; go-fuse 0.0~git20171124.0.14c3015; 2018-02-05 go1.9.3
Same hardware with a fresh build of gocryptfs on a fresh copy of Go
AES-GCM-256-OpenSSL 216.25 MB/s (selected in auto mode)
AES-GCM-256-Go 450.68 MB/s
AES-SIV-512-Go 100.51 MB/s
gocryptfs v1.7.1-37-g75f1677; go-fuse v2.0.3; 2020-04-13 go1.14.2 linux/arm64
I would hope that whoever is packaging gocryptfs for Ubuntu 20.04 LTS is using a sufficiently modern version to pick up all of Go's perf improvements.
@vielmetti Hi, you can see from your output that gocryptfs in Ubuntu 18.04 LTS was built with the much older version 1.9 of golang-go. They probably improved encryption speeds when AES is not available. That shows up in 1.14.
This does not look like a packaging issue to me. Please file a bug against your gocryptfs package if you think otherwise.
I maintain gocryptfs in Debian, which is where I believe Ubuntu gets the package.
Thanks @lechner - do you happen to know which Go version will land in 20.04 LTS? Hoping that it's new enough to pick up a bunch of improvements.
@vielmetti For a definitive answer, you would have to ask the Ubuntu release team. It looks like 1.13.
You can always see what's in Debian here.
@vielmetti interesting, thanks for the numbers. Pretty fast GCM, looks like it's hardware-accelerated.
Could you "git pull" and run gocryptfs -speed again? I have enabled the xchacha20 benchmark in the master branch now.
root@q1:~/go/src/github.com/rfjakob/gocryptfs# ./gocryptfs -speed
gocryptfs v1.7.1-46-g73436d9; go-fuse v1.0.1-0.20190319092520-161a16484456; 2020-04-13 go1.14.2 linux/arm64
AES-GCM-256-OpenSSL 212.30 MB/s (selected in auto mode)
AES-GCM-256-Go 452.30 MB/s
AES-SIV-512-Go 100.25 MB/s
XChaCha20-Poly1305-Go 137.35 MB/s
Thanks, very interesting. Looks like
(1) the ARMv8 crypto extensions beat the socks off XChaCha20-Poly1305
(2) gocryptfs needs to learn to prefer AES-GCM-256-Go when the CPU has it
I just want to bring back a single point: I proposed chacha20-poly1305 for devices that do not have crypto extensions, so armv8 devices are not part of that bunch.
But there are a lot of armv6 & armv7 devices out there that might still benefit from either asm/Go tuned versions of chacha20 or a link to a native openssl version of it.
Btw, I run gocryptfs on an OrangePi One (32-bit, sun8i, 4-core, 1 GHz, AllWinner H3 SoC) and it would be nice to speed it up a bit: at best it gets to 12 MB/s, which is IMHO similar to a Raspberry Pi 2. But when I tested it with the chacha20-poly1305 binary, it got only half that speed, which is not what everyone else here is reporting:
... kernel ... 5.4.45-sunxi ...
model name : ARMv7 Processor rev 5 (v7l)
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
...
$ gocryptfs -speed
AES-GCM-256-OpenSSL 12.65 MB/s (selected in auto mode)
AES-GCM-256-Go 3.64 MB/s
AES-SIV-512-Go 3.08 MB/s
compared to custom binary:
$ ./gocryptfs.xchacha20.armv7 -speed
AES-GCM-256-OpenSSL N/A
AES-GCM-256-Go 3.26 MB/s (selected in auto mode)
AES-SIV-512-Go 3.06 MB/s
XChaCha20-Poly1305-Go 6.80 MB/s
my (old) intel pc could really speed up with XChaCha20-Poly1305-Go:
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority dtherm
$gocryptfs -speed
gocryptfs v1.8.0-35-g274e0d2; go-fuse v2.0.3; 2020-06-01 go1.14.3 linux/amd64
AES-GCM-256-OpenSSL 94.85 MB/s (selected in auto mode)
AES-GCM-256-Go 40.20 MB/s
AES-SIV-512-Go 32.05 MB/s
XChaCha20-Poly1305-Go 328.54 MB/s
~$ lscpu | grep -E 'Arch|Model |Flags'
Architecture: x86_64
Model name: Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
~$ gocryptfs -speed
gocryptfs v1.8.0.HEAD; go-fuse v1.0.1-0.20190319092520-161a16484456; 2020-07-13 go1.14.4 linux/amd64
AES-GCM-256-OpenSSL 132.23 MB/s (selected in auto mode)
AES-GCM-256-Go 40.79 MB/s
AES-SIV-512-Go 31.38 MB/s
XChaCha20-Poly1305-Go 412.11 MB/s
~$ lscpu | grep -E 'Arch|Model |Flags'
Architecture: aarch64
Model name: Cortex-A53
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
~$ gocryptfs -speed
gocryptfs v1.8.0.HEAD; go-fuse v1.0.1-0.20190319092520-161a16484456; 2020-07-13 go1.13.7 linux/arm64
AES-GCM-256-OpenSSL 32.41 MB/s
AES-GCM-256-Go 507.32 MB/s (selected in auto mode)
AES-SIV-512-Go 54.46 MB/s
XChaCha20-Poly1305-Go 141.12 MB/s
FWIW, here are results from an Intel Atom N2800 (common in Kimsufi low-end dedicated servers)
~$ lscpu | grep -E 'Arch|Model |Flags'
Architecture: x86_64
Model name: Intel(R) Atom(TM) CPU N2800 @ 1.86GHz
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm arat
~$ ./gocryptfs -speed
gocryptfs v1.8.0-39-g3b61244; go-fuse v2.0.3; 2020-07-29 go1.14 linux/amd64
AES-GCM-256-OpenSSL 15.53 MB/s (selected in auto mode)
AES-GCM-256-Go 10.58 MB/s
AES-SIV-512-Go 7.39 MB/s
XChaCha20-Poly1305-Go 78.46 MB/s
Clear win for XChaCha20-Poly1305-Go
Edit: I set up dm-crypt/luks instead on the same machine with same filesystem - getting 66MB/s from serpent xts 512, and takes ~1 min/~15 secs vs ~10 min/~80 secs to respectively untar and then delete linux kernel source. Real world perf for my use case is a similar 5-10x speedup. It seems like chacha support could go a long way to closing this performance gap
Hi there, is there any option to force gocryptfs to use a specific cipher? I ran the benchmark on my machine and the result is
gocryptfs v2.0-beta3; go-fuse v2.1.1-0.20210423170155-a90e1f463c3f; 2021-04-28 go1.16.3 linux/amd64
AES-GCM-256-OpenSSL 28.06 MB/s (selected in auto mode)
AES-GCM-256-Go 22.37 MB/s
AES-SIV-512-Go 18.76 MB/s
XChaCha20-Poly1305-Go 143.61 MB/s
Despite XChaCha20 being the fastest, it still selects the OpenSSL one, and I'm not sure why. What do I do if I want to force it to use XChaCha20?
Thanks
Never mind, I found a similar question and got the answer. Ta
Is there a timeline for when this fast XChaCha20-Poly1305-Go option will be available?
I saw that there is a draft of the RFC, so it will get implemented in OpenSSL once it is final, but I have no clue how long such things take. (It was mentioned earlier here that this item gets revisited once it is in there.)
I would like to go with the fastest option on a Raspberry Pi 4.
@sunshine69 I have added a note to the output now that XChaCha20 is only implemented for the benchmark at the moment:
$ ./gocryptfs -speed
gocryptfs v2.0-beta4-dirty; go-fuse v2.1.1-0.20210423170155-a90e1f463c3f => github.com/rfjakob/go-fuse/v2 v2.1.1-0.20210508151621-62c5aa1919a7; 2021-05-18 go1.16.2 linux/amd64
AES-GCM-256-OpenSSL 539.12 MB/s
AES-GCM-256-Go 827.10 MB/s (selected in auto mode)
AES-SIV-512-Go 154.94 MB/s
XChaCha20-Poly1305-Go 701.98 MB/s (benchmark only, not selectable yet)
Is this viable for bounty, or does that not make sense since it has major external dependency?
It is viable, and it does not have an external dependency anymore (it's in Go stdlib now)!
I have pushed XChaCha20-Poly1305 support to master. Please test! For example via
./benchmark.bash -xchacha
Or manually
gocryptfs -init -xchacha
Results for my raspberry pi4: https://gist.github.com/rfjakob/b28383f4c84263ac7c5388ccc262e38b
Tested on RPI4 running Ubuntu
me@pi:~/gocryptfs$ ./benchmark.bash
Testing gocryptfs at /tmp/benchmark.bash.MTU: gocryptfs v2.1-35-g61ef6b0 without_openssl; go-fuse v2.1.1-0.20210825070001-74a933d6e856; 2021-08-25 go1.17 linux/arm64
/tmp/benchmark.bash.MTU.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 9.8869 s, 26.5 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 7.01108 s, 37.4 MB/s
UNTAR: 76.747
MD5: 39.081
LS: 13.652
RM: 15.176
me@pi:~/gocryptfs$ ./benchmark.bash -xchacha
Testing gocryptfs -xchacha at /tmp/benchmark.bash.eVU: gocryptfs v2.1-35-g61ef6b0 without_openssl; go-fuse v2.1.1-0.20210825070001-74a933d6e856; 2021-08-25 go1.17 linux/arm64
/tmp/benchmark.bash.eVU.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 4.67989 s, 56.0 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 2.13098 s, 123 MB/s
UNTAR: 54.788
MD5: 24.462
LS: 14.134
RM: 15.098
me@pi:~/gocryptfs$ gocryptfs -speed
gocryptfs v2.1-35-g61ef6b0 without_openssl; go-fuse v2.1.1-0.20210825070001-74a933d6e856; 2021-08-25 go1.17 linux/arm64
AES-GCM-256-OpenSSL N/A
AES-GCM-256-Go 22.34 MB/s (selected in auto mode)
AES-SIV-512-Go 18.67 MB/s
XChaCha20-Poly1305-Go 112.64 MB/s (use via -xchacha flag)
I also compared "real world" performance on the same raspberry pi with a ZFS mirror (2x HDD on USB 3.0 adaptors with UASP), using rsync to pull backup ISOs from another NAS over ethernet.
- With a local rsync target of a ZFS native encrypted data set on the pi, rsync averaged about 20 MBps, with CPU maxed out.
- Replacing the ZFS native encryption with gocryptfs (with xchacha), i.e an unencrypted ZFS dataset, rsync averaged about 31 MBps with some CPU to spare.
Impressive. Would you consider this relatively safe for use?
@giraffe2k the second measurement roughly 30MBps sounds like you might be hitting an USB 2.0 limit? could you check the raw speed without any encryption for the same operation?
still, nice results. I'll run the benchmarks on some of my machines as well.
@DavyLandman it's silly use-case to test - ZFS mirror on an RPI4 - though I find it a handy backup. I verified it was using USB 3.0:
me@pi:~$ lsusb
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 002: ID 14b0:0200 StarTech.com Ltd. ASM135x
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
[...]
me@pi:~$ lsusb -t
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
|__ Port 2: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
[...]
Then tried as you suggested; a plain rsync pull without any encryption (ZFS or gocryptfs). The results were 45 MBps.
Next I wondered if ZFS was slowing it down, so I tried just a plain old ext4 HDD on the same StarTech USB 3.0 adapter. Without encryption was 45 MBps, with gocryptfs xchacha it was 28 MBps.
Not sure I can put a sensible narrative on all that. But I'm very happy with the xchacha performance boost.
Most likely the benchmark is a bit influenced by the caches of the HDD. Either try on an in-memory filesystem, or otherwise on an SSD. But still, nice results showing an acceptable overhead.
Where did you get the binary from? Just compile from source?
arm64 binary for convenience: gocryptfs.gz
Version info: gocryptfs v2.1-37-g91d3b30 without_openssl; go-fuse v2.1.1-0.20210825070001-74a933d6e856; 2021-08-27 go1.17 linux/arm64
Would you consider this relatively safe for use?
Yes. It passes the gocryptfs test suite and fsstress, and now also has an independent Python implementation that did not uncover any problems.
arm64 binary for convenience: gocryptfs.gz
Could you also provide an armv7 binary for convenience? Then I'll be quicker in running some benchmarks in between other stuff.
armv7: gocryptfs.gz
Version info: gocryptfs v2.1-37-g91d3b30 without_openssl; go-fuse v2.1.1-0.20210825070001-74a933d6e856; 2021-08-27 go1.17 linux/arm
Here's the results of the benchmark script on an old intel CPU (an i3 CPU 540 that has no AES-NI):
Testing gocryptfs at /dev/shm//benchmark.bash.eUE: gocryptfs v2.1-38-ge69a857; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-29 go1.16.6 linux/amd64
/dev/shm//benchmark.bash.eUE.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 4.4813 s, 58.5 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 3.79049 s, 69.2 MB/s
UNTAR: 34.258
MD5: 20.922
LS: 4.256
RM: 6.174
versus:
Testing gocryptfs -xchacha at /dev/shm//benchmark.bash.ubh: gocryptfs v2.1-38-ge69a857; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-29 go1.16.6 linux/amd64
/dev/shm//benchmark.bash.ubh.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 1.7571 s, 149 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 1.02403 s, 256 MB/s
UNTAR: 23.168
MD5: 13.308
LS: 4.335
RM: 6.146
(/dev/shm/ is tmpfs)
So -xchacha is a clear winner here too.
Also:
gocryptfs v2.1-38-ge69a857; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-29 go1.16.6 linux/amd64
AES-GCM-256-OpenSSL 43.77 MB/s (selected in auto mode)
AES-GCM-256-Go 22.20 MB/s
AES-SIV-512-Go 18.06 MB/s
XChaCha20-Poly1305-Go 196.78 MB/s (use via -xchacha flag)
xu4 (armv7, running on tmpfs)
normal:
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 14.4193 s, 18.2 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 9.40416 s, 27.9 MB/s
UNTAR: 108.232
MD5: 66.366
LS: 13.074
RM: 17.661
xchacha:
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 9.28123 s, 28.2 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 6.03132 s, 43.5 MB/s
UNTAR: 87.695
MD5: 46.046
LS: 12.658
RM: 18.049
Just tested it on armv7l (Orange Pi One). It is now the fastest of the Go implementations, but what is a bit interesting is that OpenSSL is still the fastest overall. I built the binary myself from the git sources. Btw, for some reason the compiled binary was named "v2" instead of "gocryptfs", but I haven't figured out why; maybe the old Go version? Anyway, this is somewhat outdated hardware now, so no miracles are expected.
gocryptfs v2.1-44-g4e3b770-dirty; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.11.6 linux/arm
AES-GCM-256-OpenSSL 14.76 MB/s (selected in auto mode)
AES-GCM-256-Go 4.11 MB/s
AES-SIV-512-Go 3.51 MB/s
XChaCha20-Poly1305-Go 8.61 MB/s (use via -xchacha flag)
the benchmark won't fit into tmpfs, so it runs from sd card (-xchacha):
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 62.7842 s, 4.2 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 29.1741 s, 9.0 MB/s
UNTAR: 476.098
MD5: 145.932
LS: 24.879
RM: 26.344
The old Go compiler may also hurt your performance. Can you see if the binary I posted above gives better results?
I thought the same, so I've now tested using Go 1.15 and the results seem better:
$ ./gocryptfs -speed
gocryptfs v2.1-44-g4e3b770; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.15.9 linux/arm
AES-GCM-256-OpenSSL 14.42 MB/s (selected in auto mode)
AES-GCM-256-Go 3.72 MB/s
AES-SIV-512-Go 3.46 MB/s
XChaCha20-Poly1305-Go 11.39 MB/s (use via -xchacha flag)
and this is using downloaded binary (go 1.17):
$ ./gocryptfs -speed
gocryptfs v2.1-37-g91d3b30 without_openssl; go-fuse v2.1.1-0.20210825070001-74a933d6e856; 2021-08-27 go1.17 linux/arm
AES-GCM-256-OpenSSL N/A
AES-GCM-256-Go 4.29 MB/s (selected in auto mode)
AES-SIV-512-Go 3.71 MB/s
XChaCha20-Poly1305-Go 11.43 MB/s (use via -xchacha flag)
@DavyLandman did the "normal" run have OpenSSL support? (the binaries I posted do not). Also, revisiting this:
I just want to bring back a single point: I proposed chacha20-poly1305 for devices that do not have crypto extensions, so armv8 devices are not part of that bunch.
I read through the benchmarks in this ticket and in the wiki again, and, unfortunately, 32-bit ARM (armv7) devices don't gain anything with this iteration of xchacha support. On 32-bit ARM, AES-GCM-256-OpenSSL is faster than XChaCha20-Poly1305-Go, because OpenSSL has optimized assembly, and Go does not.
Using -xchacha now makes sense on:
- amd64 (=Intel/AMD 64 bit) CPUs that lack AES acceleration. These are mostly older and low power CPUs.
- arm64 (=ARM 64 bit) CPUs that lack AES acceleration. A prominent example is the Raspberry Pi 4 (see the benchmarks above).
On these, however, something else will be faster:
- amd64 with AES acceleration: AES-GCM-256-Go
- armv7: AES-GCM-256-OpenSSL
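The summary above can be written down as a small lookup. This is only a sketch of the benchmark results posted in this thread up to this point, not gocryptfs's actual auto-selection code. The function name recommend and its parameters are hypothetical, and the arm64-with-AES case is taken from the -speed outputs earlier in the thread rather than from the list itself.

```go
package main

import "fmt"

// recommend returns the backend that came out fastest in the benchmarks
// posted in this thread for a given GOARCH value and AES capability.
// Hypothetical helper: this is NOT gocryptfs's auto-selection logic.
func recommend(goarch string, hasAES bool) string {
	switch goarch {
	case "amd64", "arm64":
		if hasAES {
			return "AES-GCM-256-Go"
		}
		return "XChaCha20-Poly1305-Go" // enabled via -xchacha
	case "arm":
		// 32-bit ARM: OpenSSL has optimized assembly, Go does not.
		return "AES-GCM-256-OpenSSL"
	default:
		return "unknown, run gocryptfs -speed"
	}
}

func main() {
	fmt.Println(recommend("arm64", false)) // e.g. Raspberry Pi 4
	fmt.Println(recommend("amd64", true))
	fmt.Println(recommend("arm", false)) // e.g. Odroid XU4
}
```

When in doubt, running gocryptfs -speed on the target machine is the more reliable guide.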
So, I just got out my Odroid N2 (which has a very beefy arm64 with crypto extensions):
odroidn2:~:# cat /proc/cpuinfo
processor : 0
model name : ARMv8 Processor rev 4 (v8l)
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
./gocryptfs -speed
gocryptfs v2.1-45-gc505e73; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.16.2 linux/arm64
AES-GCM-256-OpenSSL 282.90 MB/s
AES-GCM-256-Go 580.28 MB/s (selected in auto mode)
AES-SIV-512-Go 88.85 MB/s
XChaCha20-Poly1305-Go 188.07 MB/s (use via -xchacha flag)
benchmark:
Testing gocryptfs at /tmp/benchmark.bash.cJW: gocryptfs v2.1-45-gc505e73; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.16.2 linux/arm64
/tmp/benchmark.bash.cJW.mnt is a mountpoint
Downloading linux-3.0.tar.gz
/tmp/linux-3.0.tar.gz 100%[===================================================================>] 92.20M 24.1MB/s in 4.2s
2021-08-30 20:44:52 URL:https://cdn.kernel.org/pub/linux/kernel/v3.0/linux-3.0.tar.gz [96675825/96675825] -> "/tmp/linux-3.0.tar.gz" [1]
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 1.64059 s, 160 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 0.711058 s, 369 MB/s
UNTAR: 23.326
MD5: 8.565
LS: 3.315
RM: 4.802
root@odroidn2:/tmp/gocryptfs# ./benchmark.bash -xchacha
Testing gocryptfs -xchacha at /tmp/benchmark.bash.P0M: gocryptfs v2.1-45-gc505e73; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.16.2 linux/arm64
/tmp/benchmark.bash.P0M.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 2.26622 s, 116 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 1.32793 s, 197 MB/s
UNTAR: 25.113
MD5: 11.183
LS: 3.027
@DavyLandman did the "normal" run have OpenSSL support? (the binaries I posted do not). Also, revisiting this:
here is the one that came with the distro (and has openssl enabled).
Testing gocryptfs at /tmp/benchmark.bash.O7d: gocryptfs 1.6.1; go-fuse 0.0~git20190214.58dcd77; 2019-03-11 go1.11.5
/tmp/benchmark.bash.O7d.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 7.2748 s, 36.0 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 4.49824 s, 58.3 MB/s
UNTAR: 82.685
MD5: 43.736
LS: 5.387
RM: 17.843
so indeed, for armv7, not an improvement.
Dear armv7 users, I have something brewing in the "stupidchacha" branch. Could somebody build it on armv7:
git clone https://github.com/rfjakob/gocryptfs.git
cd gocryptfs
git checkout stupidchacha
./build.bash # yes, must be with openssl
And then run
./gocryptfs -speed
?
Nice one 👏🏼 @rfjakob looks like openssl contains arm optimized xchacha indeed:
gocryptfs v2.1-57-g54e56ab.stupidchacha; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-09-02 go1.16.7 linux/arm
AES-GCM-256-OpenSSL 41.85 MB/s (selected in auto mode)
AES-GCM-256-Go 15.87 MB/s
AES-SIV-512-Go 16.52 MB/s
XChaCha20-Poly1305-Go 33.77 MB/s (use via -xchacha flag)
XChaCha20-Poly1305-OpenSSL 75.68 MB/s
(on the odroid xu4)
Ok, not bad! Thanks!
However, the "openssl speed" number you posted for the xu4 show 306MB/s for blocksize 1024 (gocryptfs uses 4k blocks, 1k should be comparable).
In other words, we lose a factor of 4 somewhere?
Sorry for the confusion, that was the n2 (armv8 with AES extensions).
the xu4 reported this in the benchmark:
Testing gocryptfs at /tmp/benchmark.bash.O7d: gocryptfs 1.6.1; go-fuse 0.0~git20190214.58dcd77; 2019-03-11 go1.11.5
/tmp/benchmark.bash.O7d.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 7.2748 s, 36.0 MB/s
READ: 262144000 bytes (262 MB, 250 MiB) copied, 4.49824 s, 58.3 MB/s
UNTAR: 82.685
MD5: 43.736
LS: 5.387
RM: 17.843
(update scratch this comment, I'm mixing stuff)
PS: openssl does not have xchacha. In "XChaCha20-Poly1305-OpenSSL" , the "X" is from the Go crypto library and "ChaCha20-Poly1305" is from openssl. So it's expected to be somewhat slower than straight openssl chacha20-poly1305.
ah, so you run the first block manually? and then give it over to openssl to continue?
The 306 MB/s was from #452 (comment)
ah, true, just ran it again, and indeed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
chacha20-poly1305 66557.18k 125680.49k 275370.18k 302992.35k 302139.10k 310047.96k
Is this marshalling overhead for cgo? If I remember correctly, there are some very specific ways to use C libraries in Go to avoid memory copying? But my cgo is a bit rusty currently.
@rfjakob if you make a version on the branch that is purely piping chacha20-poly1305 from openssl (so removing the X part), we could check what happens there? I'd be happy to compile and run the -speed on the xu4 again.
I've got some troubles building it/running against openssl 1.1.1d, but once updated to 1.1.1k it went fine, and the speed benefit is clearly visible:
$ ./gocryptfs -speed
gocryptfs v2.1-57-g54e56ab.stupidchacha; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-09-03 go1.15.9 linux/arm
AES-GCM-256-OpenSSL 12.99 MB/s (selected in auto mode)
AES-GCM-256-Go 3.97 MB/s
AES-SIV-512-Go 3.44 MB/s
XChaCha20-Poly1305-Go 11.33 MB/s (use via -xchacha flag)
XChaCha20-Poly1305-OpenSSL 36.76 MB/s
If you "git pull" now, you should see double-digit % improvements
gocryptfs v2.1-68-gedf9d4c.stupidchacha; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-09-04 go1.16.7 linux/arm
AES-GCM-256-OpenSSL 56.84 MB/s (selected in auto mode)
AES-GCM-256-Go 16.61 MB/s
AES-SIV-512-Go 16.49 MB/s
XChaCha20-Poly1305-Go 39.08 MB/s (use via -xchacha flag)
XChaCha20-Poly1305-OpenSSL 141.82 MB/s
still no 300MB/s but quite an improvement indeed. Looking at the commits, it's all about cgo overhead? :( although these insights might also improve the AES-GCM via OpenSSL performance?
in case you are interested:
/app/internal/speed # go test -bench .
goos: linux
goarch: arm
pkg: github.com/rfjakob/gocryptfs/v2/internal/speed
BenchmarkStupidGCM-8 17583 71483 ns/op 57.30 MB/s
BenchmarkStupidGCMDecrypt-8 17916 66884 ns/op 61.24 MB/s
BenchmarkGoGCM-8 5727 215568 ns/op 19.00 MB/s
BenchmarkGoGCMDecrypt-8 5780 205670 ns/op 19.92 MB/s
BenchmarkAESSIV-8 5294 236888 ns/op 17.29 MB/s
BenchmarkAESSIVDecrypt-8 5380 226851 ns/op 18.06 MB/s
BenchmarkXchacha-8 10000 101972 ns/op 40.17 MB/s
BenchmarkXchachaDecrypt-8 11912 99798 ns/op 41.04 MB/s
BenchmarkStupidXchacha-8 43904 30586 ns/op 133.92 MB/s
BenchmarkStupidXchachaDecrypt-8 38722 27773 ns/op 147.48 MB/s
BenchmarkStupidChacha-8 49257 26900 ns/op 152.27 MB/s
BenchmarkStupidChachaDecrypt-8 51540 25752 ns/op 159.05 MB/s
PASS
ok github.com/rfjakob/gocryptfs/v2/internal/speed 20.293s
all on the trusty old xu4 ;_
You could also consider either porting the arm specific asm from openssl, or trying to get the golang team to take up the assembly versions of chacha20-poly1305?
Here is the source: https://github.com/openssl/openssl/blob/master/crypto/chacha/asm/chacha-armv4.pl
interestingly it works for armv4+.
it's all about cgo overhead? :( Although these insights might also improve the AES-GCM via OpenSSL performance?
Yes, it's mostly C call overhead ( https://www.cockroachlabs.com/blog/the-cost-and-complexity-of-cgo/ ). And the improvement is to call only once into C and do all needed openssl calls there ( b3e5ed8 ).
Yes, AES-GCM sees an improvement as well ( commit 275ebc1 ):
I managed to get a 32-bit ARM docker container running on my rpi4, branch stupidchacha (currently at edf9d4c):
root@f13b37d6334c:~/gocryptfs/internal/speed# go test -bench .
goos: linux
goarch: arm
pkg: github.com/rfjakob/gocryptfs/v2/internal/speed
BenchmarkStupidGCM-4 14812 80181 ns/op 51.08 MB/s
BenchmarkStupidGCMDecrypt-4 14978 79943 ns/op 51.24 MB/s
BenchmarkGoGCM-4 4616 233316 ns/op 17.56 MB/s
BenchmarkGoGCMDecrypt-4 4884 232717 ns/op 17.60 MB/s
BenchmarkAESSIV-4 4827 242162 ns/op 16.91 MB/s
BenchmarkAESSIVDecrypt-4 4678 241086 ns/op 16.99 MB/s
BenchmarkXchacha-4 10000 108352 ns/op 37.80 MB/s
BenchmarkXchachaDecrypt-4 10000 108356 ns/op 37.80 MB/s
BenchmarkStupidXchacha-4 49172 23936 ns/op 171.13 MB/s
BenchmarkStupidXchachaDecrypt-4 49736 24128 ns/op 169.76 MB/s
BenchmarkStupidChacha-4 57219 20778 ns/op 197.13 MB/s
BenchmarkStupidChachaDecrypt-4 57183 20882 ns/op 196.15 MB/s
PASS
ok github.com/rfjakob/gocryptfs/v2/internal/speed 16.650s
Current master without the changes:
root@f13b37d6334c:~/gocryptfs/internal/speed# git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
root@f13b37d6334c:~/gocryptfs/internal/speed# go test -bench .
goos: linux
goarch: arm
pkg: github.com/rfjakob/gocryptfs/v2/internal/speed
BenchmarkStupidGCM-4 9729 109484 ns/op 37.41 MB/s
BenchmarkGoGCM-4 4522 239585 ns/op 17.10 MB/s
BenchmarkAESSIV-4 4574 250865 ns/op 16.33 MB/s
PASS
ok github.com/rfjakob/gocryptfs/v2/internal/speed 3.395s
I'll also attach the cpu profiles for later reference.
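For reading the benchmark tables in this thread: the MB/s column follows directly from ns/op once the number of bytes processed per operation is known. The posted figures are consistent with 4096-byte blocks and 1 MB = 10^6 bytes; that block size is inferred from the numbers, it is not stated anywhere here. A minimal Go sketch (the helper names are made up for illustration) that recomputes the BenchmarkStupidGCM-4 results for master and for the stupidchacha branch:

```go
package main

import "fmt"

// blockSize is an assumption inferred from the posted numbers,
// not a value quoted in this thread.
const blockSize = 4096

// mbPerSec converts a ns/op figure into MB/s (1 MB = 1e6 bytes).
// bytes per nanosecond equals GB/s, hence the factor 1e3.
func mbPerSec(bytesPerOp int, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// speedup returns how many times faster "after" is than "before".
func speedup(nsBefore, nsAfter float64) float64 {
	return nsBefore / nsAfter
}

func main() {
	master := 109484.0 // BenchmarkStupidGCM-4 on master, ns/op
	branch := 80181.0  // BenchmarkStupidGCM-4 on stupidchacha, ns/op

	fmt.Printf("master:  %.2f MB/s\n", mbPerSec(blockSize, master))
	fmt.Printf("branch:  %.2f MB/s\n", mbPerSec(blockSize, branch))
	fmt.Printf("speedup: %.2fx\n", speedup(master, branch))
}
```

Under that assumption the sketch lands on the same 37.41 and 51.08 MB/s shown in the tables, which is a gain of roughly a third for AES-GCM via OpenSSL on this machine.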
You could also consider either porting the arm specific asm from openssl, or trying to get the golang team to take up the assembly versions of chacha20-poly1305?
Now that I am used to Go, writing C code already feels like juggling chainsaws. I will not touch asm :)
But looking at the xchacha.pdf cpu profile, the Go part runs really fast and does not seem to slow us down (HChaCha20).
BTW how XChaCha20-Poly1305-OpenSSL works is this: The HChaCha20 function (from Go stdlib) mixes key and nonce to get a new key for each encryption, which is normal ChaCha20-Poly1305, so we can call OpenSSL at this point:
key, nonce -> Go HChaCha20 -> key2, nonce2 -> OpenSSL ChaCha20-Poly1305
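The derivation step in that pipeline can be sketched in plain Go. This is only an illustration, not the code from the branch: hChaCha20 below is a hand-written stand-in so that the example has no external dependencies (real code should use the HChaCha20 function from golang.org/x/crypto/chacha20, which as far as I know is where the exported version lives), and deriveSubkeyAndNonce is a hypothetical helper name. The nonce split (first 16 bytes go into HChaCha20, last 8 bytes become the tail of a 12-byte nonce with a zero prefix) follows the usual XChaCha20 construction. The resulting key2 and nonce2 are what would be handed to a regular ChaCha20-Poly1305 implementation such as OpenSSL's.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
	"math/bits"
)

// quarterRound is the ChaCha quarter round on four state words.
func quarterRound(s *[16]uint32, a, b, c, d int) {
	s[a] += s[b]
	s[d] = bits.RotateLeft32(s[d]^s[a], 16)
	s[c] += s[d]
	s[b] = bits.RotateLeft32(s[b]^s[c], 12)
	s[a] += s[b]
	s[d] = bits.RotateLeft32(s[d]^s[a], 8)
	s[c] += s[d]
	s[b] = bits.RotateLeft32(s[b]^s[c], 7)
}

// hChaCha20 is an illustrative stand-in: 20 ChaCha rounds over
// constants | key | 16-byte nonce, without the final state addition.
// The output is state words 0..3 and 12..15.
func hChaCha20(key, nonce16 []byte) []byte {
	s := [16]uint32{0x61707865, 0x3320646e, 0x79622d32, 0x6b206574}
	for i := 0; i < 8; i++ {
		s[4+i] = binary.LittleEndian.Uint32(key[4*i:])
	}
	for i := 0; i < 4; i++ {
		s[12+i] = binary.LittleEndian.Uint32(nonce16[4*i:])
	}
	for i := 0; i < 10; i++ { // 10 double rounds = 20 rounds
		quarterRound(&s, 0, 4, 8, 12) // column rounds
		quarterRound(&s, 1, 5, 9, 13)
		quarterRound(&s, 2, 6, 10, 14)
		quarterRound(&s, 3, 7, 11, 15)
		quarterRound(&s, 0, 5, 10, 15) // diagonal rounds
		quarterRound(&s, 1, 6, 11, 12)
		quarterRound(&s, 2, 7, 8, 13)
		quarterRound(&s, 3, 4, 9, 14)
	}
	out := make([]byte, 32)
	for i := 0; i < 4; i++ {
		binary.LittleEndian.PutUint32(out[4*i:], s[i])
		binary.LittleEndian.PutUint32(out[16+4*i:], s[12+i])
	}
	return out
}

// deriveSubkeyAndNonce (hypothetical name) maps a 32-byte key and a
// 24-byte XChaCha nonce to the 32-byte key2 and 12-byte nonce2 that
// a plain ChaCha20-Poly1305 implementation expects.
func deriveSubkeyAndNonce(key, nonce24 []byte) (key2, nonce2 []byte, err error) {
	if len(key) != 32 || len(nonce24) != 24 {
		return nil, nil, errors.New("need a 32-byte key and a 24-byte nonce")
	}
	key2 = hChaCha20(key, nonce24[:16])
	nonce2 = make([]byte, 12) // first 4 bytes stay zero
	copy(nonce2[4:], nonce24[16:])
	return key2, nonce2, nil
}

func main() {
	key := make([]byte, 32)
	nonce := make([]byte, 24)
	for i := range nonce {
		nonce[i] = byte(i) // demo values only; real nonces must be random
	}
	key2, nonce2, err := deriveSubkeyAndNonce(key, nonce)
	if err != nil {
		panic(err)
	}
	fmt.Println("key2:  ", hex.EncodeToString(key2))
	fmt.Println("nonce2:", hex.EncodeToString(nonce2))
}
```

Since only this small derivation runs in Go and the bulk encryption happens in one call to the C library, the split matches what the cpu profile suggests: the Go side is cheap.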
Now that I am used to Go, writing C code already feels like juggling chainsaws. I will not touch asm :)
That seems wise
But looking at the xchacha.pdf cpu profile, the Go part runs really fast and does not seem to slow us down (HChaCha20).
Indeed, quite optimal. A pity about the overhead for cgo, but still much better than where we started.
BTW how XChaCha20-Poly1305-OpenSSL works is this: The HChaCha20 function (from Go stdlib) mixes key and nonce to get a new key for each encryption, which is normal ChaCha20-Poly1305, so we can call OpenSSL at this point:
key, nonce -> Go HChaCha20 -> key2, nonce2 -> OpenSSL ChaCha20-Poly1305
Thanks for the refresher 👍🏼 (and also, creative solution 👏🏼 )
gocryptfs v2.2.0 has been released, this is done.