Fast implementation of seekable AES-CTR in Go
Inspired by experiments in https://github.com/mmcloughlin/aesnix
This implementation supports passing arbitrary offset, which is useful to make IO in the middle of a file.
Implemented in ASM for amd64 and arm64. Speedup is based on running multiple AES instructions in a row to handle multiple blocks. I think, the speedup is caused by instruction pipelining.
Other architectures use slow implementation based on crypto/aes.
On my machines slow implementation provides ~500 megabytes per second and fast implementation — ~5000 megabytes.
The implementation is compatible with crypto/cipher.NewCTR. (They produce the same stream of bytes.) This is checked in the tests.
CPU | std.CTR | std.GCM/Seal | aesctrat.slow | aesctrat.fast | Speedup (std.CTR -> aesctrat.fast) |
---|---|---|---|---|---|
amd64-epyc | 421.19 | 2829.11 | 470.17 | 5443.96 | 12.9x |
amd64-ryzen5 | 906.12 | 4302.75 | 649.95 | 6119.56 | 6.8x |
arm64-ec2-t4g-small | 865.65 | 1698.73 | 341.89 | 2313.64 | 2.7x |
arm64-darwin-m1 | 1929.33 | 6546.48 | 768.86 | 7285.88 | 3.8x |
Raw output of go test -bench .
and go test -bench . crypto/aes crypto/cipher
in results dir.
import "github.com/starius/aesctrat"
key := make([]byte, 16) // Or 24 or 32.
ctr := aesctrat.NewAesCtr(key)
iv := make([]byte, 16)
offset := uint64(5) // Skip 5 bytes.
plaintext := make([]byte, 1000)
file.ReadAt(plaintext, offset)
ciphertext := make([]byte, 1000) // Must be of the same length as plaintext.
ctr.XORKeyStreamAt(ciphertext, plaintext, iv, offset)
Follow https://wiki.debian.org/QemuUserEmulation to install QEMU User Emulation on your Debian machine.
for arch in $(go tool dist list | grep linux | sed 's@linux/@@'); do
if GOARCH=$arch go test &> /tmp/$arch.log; then
echo PASS $arch;
else
echo FAIL $arch;
fi;
done
My results:
PASS 386
PASS amd64
PASS arm
PASS arm64
FAIL mips
PASS mips64
PASS mips64le
FAIL mipsle
PASS ppc64
PASS ppc64le
PASS riscv64
FAIL s390x
Failures:
head -2 /tmp/mips.log /tmp/mipsle.log /tmp/s390x.log
==> /tmp/mips.log <==
fatal error: float64nan
runtime: panic before malloc heap initialized
==> /tmp/mipsle.log <==
fatal error: float64nan
runtime: panic before malloc heap initialized
==> /tmp/s390x.log <==
signal: segmentation fault
FAIL github.com/starius/aesctrat 0.015s
Those architectures crashed with go test math/rand
so
I assume that those failures are not related to the package.