brainstorm: A C++ repository from blu

Description

This is an optimising interpreter for the language Brainfuck with a small twist: brainstorm's output operation can be configured to print only numbers - the numeric value of the memory cell. In other words, brainstorm is oriented toward complex computational problems.

Other than that brainstorm does two very basic optimisations on the code: homogeneous data- and instruction-pointer-move instruction sequences are collapsed to one instruction. Instructions themselves are stored as 16-bit words, containing an optional immediate-operand field.

How to Build

Running the build.sh script in the source directiory builds the executable, if you happen to have clang++-3.6 (hardcoded in the script -- substitute for your preferred compiler).

There are three versions of the interpreter -- vanilla, alt and alt-alt. They differ by minor tweaks to the interpeter loop, which can benefit some combinations of microarchitectures and compilers. The vanilla version is built by default; to build the alt version, pass _alt as an arg to the build script, and to build the alt-alt version, pass _alt_alt, respectively.

Benchmarks

Erik Bosman's mandelbrot generator (times include printout; 'alt' = alt version, 'alt^2' = alt-alt version):

CPU	compiler	time (real)
AMD C-60 (Bobcat) @ 1.33GHz	clang++-3.5.2	0m36.497s ¹
AMD C-60 (Bobcat) @ 1.33GHz	g++-4.8.4	0m34.668s ¹
AMD A8-7600 (Steamroller) @ 2.4GHz	clang++-3.5.2	0m16.720s
AMD Ryzen 1700 (Zen) @ 3.0GHz	g++-6.3.0	0m9.54s
AMD Ryzen 1700 (Zen) @ 3.168GHz	g++-12.3.1	0m7.08s
AMD Ryzen 1700X (Zen) @ 2.2GHz	g++-4.9.2	0m11.81s ²
Intel Core2 Quad Q6600 (Kentsfield) @ 3.2GHz	clang++-3.7.x	0m12.650s
Intel Core2 Quad Q6600 (Kentsfield) @ 3.2GHz	g++-4.9.2	0m16.430s
Intel Core2 Duo P8600 (Penryn) @ 2.4GHz (alt^2)	apple clang++-8.1.0	0m13.383s
Intel i7-2600 (Sandy Bridge) @ 3.8GHz	clang++-3.7.1	0m9.200s
Intel i7-2600 (Sandy Bridge) @ 3.8GHz	g++-5.3.0	0m8.350s
Intel E5-2687W (Sandy Bridge) @ 3.1GHz	clang++-3.6.2	0m9.390s
Intel E5-2687W (Sandy Bridge) @ 3.1GHz	g++-4.7.2	0m9.946s
Intel i5-3470S (Ivy Bridge) @ 1.6GHz	g++-4.8.2	0m17.087s
Intel i5-3570 (Ivy Bridge) @ 1.6GHz	g++-4.8.4	0m17.086s
Intel i5-3570 (Ivy Bridge) @ 3.8GHz	g++-4.8.4	0m7.200s
Intel E3-1270v2 (Ivy Bridge) @ 1.6GHz	clang++-3.5.0	0m17.189s
Intel E3-1270v2 (Ivy Bridge) @ 1.6GHz	g++-4.8.2	0m17.088s
Intel E3-1270v2 (Ivy Bridge) @ 3.9GHz	g++-4.8.2	0m7.033s
Intel i7-5820K (Haswell) @ 3.6GHz	clang++-3.6.x	0m7.276s
Intel i7-5820K (Haswell) @ 3.6GHz	g++-4.8.x	0m6.905s
Freescale iMX53 (Cortex-A8 r2p5) @ 1.0GHz	clang++-3.6.2	1m0.941s
Freescale iMX53 (Cortex-A8 r2p5) @ 1.0GHz	g++-4.9.2	0m55.673s
Samsung Exynos 5422 (Cortex-A15 r2p3) @ 2.0GHz	clang++-3.6.2	0m28.754s
Samsung Exynos 5422 (Cortex-A15 r2p3) @ 2.0GHz	g++-6.3.0	0m20.007s
Allwinner A64 (Cortex-A53 r0p4) @ 1.152GHz (alt)	clang++-3.6.2	0m40.832s
AppliedMicro X-Gene 1 @ 2.4GHz (alt)	clang++-3.5.0	0m19.623s
AppliedMicro X-Gene 1 @ 2.4GHz (alt)	g++-4.9.1	0m19.608s
Rockchip RK3368 (Cortex-A53 r0p3) A32 @ 1.51GHz (alt^2)	g++-4.9.3	0m32.623s
Rockchip RK3368 (Cortex-A53 r0p3) A64 @ 1.51GHz (alt^2)	clang++-3.6.0	0m30.224s
Rockchip RK3399 (Cortex-A72 r0p2) A64 @ 1.8GHz (alt^2)	g++-7.4.0	0m16.394s ³
MediaTek MT8163A (Cortex-A53 r0p3) A32 @ 1.5GHz (alt^2)	g++-4.9.2	0m33.430s ⁴
MediaTek MT8163A (Cortex-A53 r0p3) A64 @ 1.5GHz (alt^2)	clang++-3.6.2	0m30.587s ⁴
MediaTek MT8173C (Cortex-A72 r0p0) A64 @ 2.1GHz	clang++-5.0.1	0m19.083s ³
MediaTek MT8173C (Cortex-A72 r0p0) A64 @ 2.1GHz (alt^2)	g++-7.2.1	0m14.011s ³
MediaTek MT8173C (Cortex-A72 r0p0) A64 @ 2.1GHz (alt^2)	g++-8.3.0	0m13.433s
Marvell ARMADA 8040 (Cortex-A72 r0p1) A64 @ 1.3GHz (alt^2)	g++-7.2.1	0m22.703s ³
Marvell ARMADA 8040 (Cortex-A72 r0p1) A64 @ 1.3GHz	clang++-3.5.2	0m30.836s ³
Marvell ARMADA 8040 (Cortex-A72 r0p1) A64 @ 2.0GHz	clang++-3.5.2	0m20.035s ³
NXP LX2160A (Cortex-A72 r0p3) A64 @ 2.0GHz (alt^2)	g++-7.4.0	0m14.748s
NXP LX2160A (Cortex-A72 r0p3) A64 @ 2.0GHz (alt^2)	clang++-8.0.0	0m19.590s ³
AWS Graviton (Cortex-A72 r0p3) A64 @ 2.28GHz (alt^2)	g++-7.3.0	0m12.881s
AWS Graviton (Cortex-A72 r0p3) A64 @ 2.28GHz	clang++-6.0.0	0m16.924s
AWS Graviton2 (Cortex-A76 r3p0) A64 @ 2.5GHz (alt^2)	g++-7.5.0	0m8.115s
AWS Graviton2 (Cortex-A76 r3p0) A64 @ 2.5GHz (alt^2)	clang++-9.0.1	0m7.740s
Baikal-T1 (MIPS P5600) @ 1.2GHz (alt)	g++-6.3.0	0m37.212s
Baikal-T1 (MIPS P5600) @ 1.2GHz (alt^2)	g++-7.3.0	0m31.822s
Amlogic S922X (Cortex-A73 r0p2) A64 @ 1.8GHz (alt^2)	g++-7.3.0	0m15.459s
NVIDIA armv8.2 (Carmel) A64 @ 1.907GHz (alt^2)	g++-8.4.0	0m14.680s
NVIDIA armv8.2 (Carmel) A64 @ 2.265GHz (alt^2)	g++-8.4.0	0m10.990s
NVIDIA Orin (Cortex-A78AE) A64 @ 2.2GHz (alt^2)	g++-8.4.0	0m6.51s
Fujitsu armv8.2 (A64fx) A64 @ 2.2GHz (alt^2)	clang++-7.0.0	0m14.652s
Snapdragon 835 (Cortex-A73 r?p?) A64 @ 2.55GHz (alt^2)	clang++-9.0.0	0m12.117s
Snapdragon 835 (Cortex-A73 r?p?) A64 @ 2.55GHz (alt^2)	g++-7.5.0	0m11.208s
Snapdragon 835 (Cortex-A73 r?p?) A64 @ 2.55GHz (alt^2)	g++-8.3.0	0m10.915s
Snapdragon SQ1 (Cortex-A76 r?p?) A64 @ 3.0GHz (alt^2)	clang++-9.0.1	0m7.043s
Snapdragon SQ1 (Cortex-A76 r?p?) A64 @ 3.0GHz (alt^2)	clang++-10.0.0	0m6.690s
Apple armv8.4 (Firestorm) @ 3.2GHz (alt^2)	apple clang++ 12.0.0	0m6.510s
Apple armv8.4 (Firestorm) @ 3.2GHz (alt^2)	g++-11.0.0	0m4.350s
Apple armv8.5 (Avalanche) @ 3.49GHz (alt^2)	apple clang++ 14.0.0	0m5.880s
Apple armv8.5 (Avalanche) @ 3.49GHz	g++-12.2.0	0m4.142s
Apple armv8.5 (Avalanche) @ 3.49GHz	g++-13.1.0	0m3.930s
Intel i5-1035G4 (Ice Lake) @ 3.5 GHz (alt^2)	clang++-9.0.0	0m4.930s
AMD Ryzen 9 7950x (Zen4) @ 4.5 GHz	g++-11.2.1	0m3.90s
AMD Ryzen 9 7950x (Zen4) @ full boost (~5.7 GHz)	g++-11.2.1	0m3.11s
Intel i5-5257U (Broadwell) @ 3.1 GHz	apple clang++ 11.0.3	0m6.70s
Intel i5-5257U (Broadwell) @ 3.1 GHz (alt^2)	apple clang++ 11.0.3	0m6.00s
Intel i9-13900k (Raptor Cove) @ full boost (~5.50-5.56 GHZ)	clang-13.0 ⁵	0m2.676s
Intel i7-8700 (Coffee Lake) @ full boost (~4.28-4.60 GHZ)	clang-13.0 ⁵	0m4.551s
Intel i7-8700 (Coffee Lake) @ full boost (~4.28-4.60 GHZ) (alt^2)	clang-13.0 ⁵	0m4.089s
Intel i7-8700 (Coffee Lake) @ 3.19 GHZ	clang-13.0 ⁵	0m6.25s
Intel i7-8700 (Coffee Lake) @ 3.19 GHZ (alt^2)	clang-13.0 ⁵	0m5.62s
Intel w9-3475X (Sapphire Rapids) @ 4.80GHz (alt^2)	clang-13.0 ⁵	0m3.469s
Intel w9-3475X (Sapphire Rapids) @ 2.21GHz (alt^2)	clang-13.0 ⁵	0m6.766s

Note: There are two compiler snafus in all A64 alt-alt entries built by clang. First, the interpereter loop does not get aligned to a multiple-of-16 address, so one has to inject nops before the loop to get optimal loop alignment. Second, the code generated for the loop could be better:

400f00:       787a7aa8        ldrh    w8, [x21,x26,lsl #1]
400f04:       12000909        and     w9, w8, #0x7
400f08:       71001d3f        cmp     w9, #0x7            // since we just AND'd its source register with 7, w9 could not be larger than 7,
400f0c:       54000468        b.hi    400f98              // so this pair of ops is essentially nops in the innermost loop
...
400f98:       f94007e8        ldr     x8, [sp,#8]
400f9c:       9100075a        add     x26, x26, #0x1
400fa0:       eb08035f        cmp     x26, x8
400fa4:       54fffae3        b.cc    400f00

Normalized performance from the above as ticks = duration x CPU_GHz (lower is better):

CPU	compiler	ticks
Freescale iMX53 (Cortex-A8 r2p5)	g++-4.9.2	55.67
AppliedMicro X-Gene 1 (alt)	g++-4.9.1	47.06
Allwinner A64 (Cortex-A53 r0p4) (alt)	clang++-3.6.2	47.04
AMD C-60 (Bobcat)	g++-4.8.4	46.11
MediaTek MT8163A (Cortex-A53 r0p3) (alt^2)	clang++-3.6.2	45.88
Rockchip RK3368 (Cortex-A53 r0p3) (alt^2)	clang++-3.6.0	45.64
Intel Core2 Quad Q6600 (Kentsfield)	clang++-3.7.x	40.48
AMD A8-7600 (Steamroller)	clang++-3.5.2	40.13
Samsung Exynos 5422 (Cortex-A15 r2p3)	g++-6.3.0	40.01
Baikal-T1 (MIPS P5600) (alt^2)	g++-7.3.0	38.19
Fujitsu armv8.2 (A64fx) (alt^2)	clang++-7.0.0	32.23
Intel Core2 Duo P8600 (Penryn) (alt^2)	apple clang++-8.1.0	32.12
Intel i7-2600 (Sandy Bridge)	g++-5.3.0	31.73
Rockchip RK3399 (Cortex-A72 r0p2) (alt^2)	g++-7.4.0	29.51
Marvell ARMADA 8040 (Cortex-A72 r0p1) (alt^2)	g++-7.2.1	29.51
NXP LX2160A (Cortex-A72 r0p3) (alt^2)	g++-7.4.0	29.50
MediaTek MT8173C (Cortex-A72 r0p0) (alt^2)	g++-7.2.1	29.42
AWS Graviton (Cortex-A72 r0p3) (alt^2)	g++-7.3.0	29.37
Intel E5-2687W (Sandy Bridge)	clang++-3.6.2	29.11
MediaTek MT8173C (Cortex-A72 r0p0) (alt^2)	g++-8.3.0	28.21
Snapdragon 835 (Cortex-A73 r?p?) (alt^2)	g++-8.3.0	27.83
Amlogic S922X (Cortex-A73 r0p2) (alt^2)	g++-7.3.0	27.83
Intel E3-1270v2 (Ivy Bridge)	g++-4.8.2	27.34
AMD Ryzen 1700X (Zen)	g++-4.9.2	25.98
NVIDIA armv8.2 (Carmel) A64 (alt^2)	g++-8.4.0	24.89
Intel i7-5820K (Haswell)	g++-4.8.x	24.86
AMD Ryzen 1700 (Zen) @ 3.168GHz	g++-12.3.1	22.43
Snapdragon SQ1 (Cortex-A76 r?p?) (alt^2)	clang++-9.0.1	21.13
Apple armv8.4 (Firestorm) (alt^2)	apple clang++ 12.0.0	20.83
Intel i5-5257U (Broadwell) @ 3.1 GHz	apple clang++ 11.0.3	20.77
Apple armv8.5 (Avalanche) (alt^2)	apple clang++-14.0.0	20.52
Snapdragon SQ1 (Cortex-A76 r?p?) (alt^2)	clang++-10.0.0	20.07
AWS Graviton2 (Cortex-A76 r3p0) (alt^2)	clang++-9.0.1	19.35
Intel i5-5257U (Broadwell) @ 3.1 GHz (alt^2)	apple clang++ 11.0.3	18.60
Intel i7-8700 (Coffee Lake) @ 3.19 GHZ (alt^2)	clang-13.0 ⁵	17.93
AMD Ryzen 9 7950x (Zen4) @ 4.5 GHz	g++-11.2.1	17.55
Intel i5-1035G4 (Ice Lake) (alt^2)	clang++-9.0.0	17.26
Intel w9-3475X (Sapphire Rapids) @ 4.80GHz (alt^2)	clang-13.0 ⁵	16.65
Intel w9-3475X (Sapphire Rapids) @ 2.21GHz (alt^2)	clang-13.0 ⁵	14.95
Intel i9-13900k (Raptor Cove) @ full boost (~5.50-5.56 GHZ)	clang-13.0 ⁵	14.88
Apple armv8.5 (Avalanche)	g++-12.2.0	14.45
NVIDIA Orin (Cortex-A78AE) A64 (alt^2)	g++-8.4.0	14.32
Apple armv8.4 (Firestorm) (alt^2)	g++-11.0.0	13.92
Apple armv8.5 (Avalanche)	g++-13.1.0	13.72

Generic compiler tuning; native tuning does not pose any speed advantage ↩ ↩²
Non-native compiler tuning -march=corei7 ↩
Non-native compiler tuning -mcpu=cortex-a57 ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷
This MT8163A resides in a BQ M10 Ubuntu tablet, and as such is subject to the following performance detriments: ↩ ↩²
Windows build

(1) Power management causes cores to pop in and out of existence, rather than just scaling them by frequency. (2) There is an entire (albeit minimal) Android running in a lxc container on that tablet. ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹

blu/brainstorm

Description

How to Build

Benchmarks

Footnotes