In-memory monster group

This code defines a memory-aligned 196,883 x 196,833 matrix of complex numbers on Linux (577.6 GiB):

real = static_cast<double*>(aligned_alloc(64, 196883 * 196883 * sizeof(double)));
imag = static_cast<double*>(aligned_alloc(64, 196883 * 196883 * sizeof(double)));

The matrix represents a linear transformation associated with an element of the monster group in its 196,883-dimensional representation. This code currently generates two of these random matrices and multiplies them using an array of CPU optimizations.

Hardware requirements

OS: Ubuntu Server 22.04 LTS - Gen2, Windows 10/11/Server
Architecture: x64 (w/ optional AVX-512)
Memory: >1,734 GiB
Cores: max available

Compile and run

Linux

sudo apt update && sudo apt install g++ make -y
git clone https://github.com/MattMcL4475/MonsterGroup.git
cd MonsterGroup/MonsterGroup
make
./monster

Windows

git clone https://github.com/MattMcL4475/MonsterGroup.git
cd MonsterGroup
"C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Current\Bin\MSBuild.exe" MonsterGroup.sln /p:Configuration=Release
.\Release\monster.exe

Matrix multiplication optimizations

Memory Alignment
64-byte aligned memory allocations for real and imaginary arrays optimized for Intel® AVX-512.
SIMD Vectorization (with AVX-512)
Processes eight doubles at a time using AVX-512 instructions in the inner loop, including aligned data load/store and fused multiply-add (FMA) operations
Multithreading
Uses std::thread to maximize concurrency on all available hyperthreads.
Loop Unrolling
In the AVX-512 code section, the k-loop is unrolled in groups of four to maximize parallelism.
Prefetching
Includes pre-fetch hints so the CPU pre-fetches data into the cache before it's accessed.
Cache Blocking / Tiling (in non-AVX-512 code)
The non-AVX-512 code organizes loops into blocks or "tiles" to fit the working set of data into the CPU cache. The three nested loops are divided into blocks of size tileSize, which is dynamically calculated based on the cache line size of the hardware.
Conditional platform compilation
Platform-specific implementations for both Linux and Windows.

Reference

"It's got too many properties for it to be an accident. It's never been explained why it's even there" -John Horton Conway FRS (26 December 1937 – 11 April 2020)

Monster group: https://en.wikipedia.org/wiki/Monster_group

Monsterous moonshine: https://en.wikipedia.org/wiki/Monstrous_moonshine