ruapu

Detect CPU ISA features with a single header file

CPU
✅ x86, x86-64
✅ arm, aarch64
✅ mips
✅ powerpc
✅ s390x
✅ loongarch
✅ risc-v
✅ openrisc

OS
✅ Windows
✅ Linux
✅ macOS
✅ Android
✅ iOS
✅ FreeBSD
✅ NetBSD
✅ OpenBSD

Compiler
✅ GCC
✅ Clang
✅ MSVC
✅ MinGW

#include <stdio.h>

#define RUAPU_IMPLEMENTATION
#include "ruapu.h"

int main()
{
    // initialize ruapu once
    ruapu_init();

    // now, tell me if this cpu has avx2
    int has_avx2 = ruapu_supports("avx2");
    fprintf(stderr, "avx2 = %d\n", has_avx2);

    // loop over all supported features (NULL-terminated list)
    const char* const* supported = ruapu_rua();
    while (*supported)
    {
        fprintf(stderr, "%s\n", *supported);
        supported++;
    }

    return 0;
}

Best practice for using ruapu.h in multiple compilation units

  1. Create one ruapu.c for your project.
  2. ruapu.c contains ONLY #define RUAPU_IMPLEMENTATION and #include "ruapu.h".
  3. All other sources #include "ruapu.h" WITHOUT defining RUAPU_IMPLEMENTATION.
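
The reason for this rule can be illustrated with a stand-in library. The sketch below assumes ruapu.h follows the usual single-header pattern that the RUAPU_IMPLEMENTATION macro suggests. The names tiny.h, TINY_IMPLEMENTATION and tiny_answer are hypothetical, and the header is inlined so the sketch compiles on its own.

```c
/* In a real project this define would live in exactly ONE .c file. */
#define TINY_IMPLEMENTATION

/* ---- begin hypothetical tiny.h ---- */
#ifndef TINY_H
#define TINY_H
/* Declaration: visible to every file that includes the header. */
int tiny_answer(void);
#endif /* TINY_H */

#ifdef TINY_IMPLEMENTATION
/* Definition: compiled only where the macro is defined. */
int tiny_answer(void)
{
    return 42;
}
#endif /* TINY_IMPLEMENTATION */
/* ---- end hypothetical tiny.h ---- */
```

If two source files both defined the implementation macro, the function body would be compiled twice and the linker would typically report a duplicate symbol.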

Let's ruapu

ruapu with C

Compile the ruapu test program

# GCC / MinGW
gcc main.c -o ruapu
# Clang
clang main.c -o ruapu
# MSVC
cl.exe /Fe: ruapu.exe main.c

Run ruapu on the command line

./ruapu
mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 1
xop = 0
... more lines omitted ...

ruapu with Python

Compile and install the ruapu library

# from pypi
pip3 install ruapu
# from source code
pip3 install ./python

Use ruapu in Python

import ruapu

ruapu.supports("avx2")
# True

ruapu.supports(isa="avx2")
# True

ruapu with Rust

Compile the ruapu library

# from source code
cd rust
cargo build --release

Use ruapu in Rust

extern crate ruapu;

fn main() {
    println!("supports neon: {}", ruapu::supports("neon").unwrap());
    println!("supports avx2: {}", ruapu::supports("avx2").unwrap());
    println!("rua: {:?}", ruapu::rua());
}

GitHub-hosted runner result (Linux)

mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 1
xop = 0
avx = 1
f16c = 1
fma = 1
avx2 = 1
avx512f = 0
avx512bw = 0
avx512cd = 0
avx512dq = 0
avx512vl = 0
avx512vnni = 0
avx512bf16 = 0
avx512ifma = 0
avx512vbmi = 0
avx512vbmi2 = 0
avx512fp16 = 0
avxvnni = 0
avxvnniint8 = 0
avxifma = 0

GitHub-hosted runner result (macOS)

mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 0
xop = 0
avx = 1
f16c = 1
fma = 1
avx2 = 1
avx512f = 0
avx512bw = 0
avx512cd = 0
avx512dq = 0
avx512vl = 0
avx512vnni = 0
avx512bf16 = 0
avx512ifma = 0
avx512vbmi = 0
avx512vbmi2 = 0
avx512fp16 = 0
avxvnni = 0
avxvnniint8 = 0
avxifma = 0

GitHub-hosted runner result (macOS M1)

neon = 1
vfpv4 = 1
cpuid = 0
asimdhp = 1
asimddp = 1
asimdfhm = 1
bf16 = 0
i8mm = 0
sve = 0
sve2 = 0
svebf16 = 0
svei8mm = 0
svef32mm = 0

GitHub-hosted runner result (Windows)

mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 1
xop = 0
avx = 1
f16c = 1
fma = 1
avx2 = 1
avx512f = 0
avx512bw = 0
avx512cd = 0
avx512dq = 0
avx512vl = 0
avx512vnni = 0
avx512bf16 = 0
avx512ifma = 0
avx512vbmi = 0
avx512vbmi2 = 0
avx512fp16 = 0
avxvnni = 0
avxvnniint8 = 0
avxifma = 0

FreeBSD/NetBSD/OpenBSD VM result (x86_64)

mmx = 1
sse = 1
sse2 = 1
sse3 = 1
ssse3 = 1
sse41 = 1
sse42 = 1
sse4a = 1
xop = 0
avx = 1
f16c = 1
fma = 1
fma4 = 0
avx2 = 1
avx512f = 0
avx512bw = 0
avx512cd = 0
avx512dq = 0
avx512vl = 0
avx512vnni = 0
avx512bf16 = 0
avx512ifma = 0
avx512vbmi = 0
avx512vbmi2 = 0
avx512fp16 = 0
avxvnni = 0
avxvnniint8 = 0
avxifma = 0

Features

  • Detect CPU ISA with a single file: sse2, avx, avx512f, neon, etc.
  • Detect vendor extended ISA: apple amx, risc-v vendor ISA, etc.
  • Detect richer ISA on Windows ARM: IsProcessorFeaturePresent() returns little ISA information
  • Detect x86-avx512 on macOS correctly: macOS hides it in cpuid
  • Detect new CPU's ISA on old systems: they are usually not exposed in auxv or MISA
  • Detect CPU hidden ISA: fma4 on zen1, ISA in hypervisor, etc.

Supported ISA (more are coming ... :)

CPU         ISA
x86         mmx sse sse2 sse3 ssse3 sse41 sse42 sse4a xop avx f16c fma fma4 avx2 avx512f avx512bw avx512cd avx512dq avx512vl avx512vnni avx512bf16 avx512ifma avx512vbmi avx512vbmi2 avx512fp16 avxvnni avxvnniint8 avxifma
arm         edsp neon vfpv4 idiv
aarch64     neon vfpv4 cpuid asimdrdm asimdhp asimddp asimdfhm bf16 i8mm mte sve sve2 svebf16 svei8mm svef32mm pmull crc32 aes sha1 sha2 sha3 sha512 sm3 sm4 amx
mips        msa
powerpc     vsx
s390x       zvector
loongarch   lsx lasx
risc-v      i m a f d c zfa zfh zfhmin zicsr zifencei zmmul
openrisc    orbis32 orbis64 orfpx32 orfpx64 orvdx64

Techniques inside ruapu

ruapu is implemented in C to ensure the widest possible portability.

ruapu determines whether the CPU supports an instruction set by trying to execute one of its instructions and checking whether an illegal-instruction exception occurs. It does not rely on cpuid or other architecture-specific feature registers, nor on MISA information or operating system calls. This yields more detailed CPU ISA information than those sources expose.
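
The try-and-catch idea can be sketched on a POSIX system as follows. This is a minimal illustration, not ruapu's actual code: the names probe_instruction, on_sigill, probe_nop and probe_fake_illegal are invented for this example, and the "illegal" probe merely calls raise(SIGILL) so that the sketch behaves the same on any architecture. A real detector would execute the encoded bytes of the instruction under test instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target used to escape from the SIGILL handler. */
static sigjmp_buf probe_env;

/* SIGILL handler: abandon the probe and jump back to probe_instruction(). */
static void on_sigill(int signo)
{
    (void)signo;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if probe() ran to completion, 0 if it raised SIGILL. */
int probe_instruction(void (*probe)(void))
{
    struct sigaction sa, old_sa;
    volatile int supported = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old_sa);

    /* savemask = 1 so the signal mask is restored after siglongjmp */
    if (sigsetjmp(probe_env, 1) == 0)
    {
        probe();        /* may trigger SIGILL */
        supported = 1;  /* reached only if no signal was raised */
    }

    /* put the previous SIGILL disposition back */
    sigaction(SIGILL, &old_sa, NULL);
    return supported;
}

/* A probe that always succeeds. */
void probe_nop(void)
{
}

/* Stand-in for an unsupported instruction (assumption: simulated with raise). */
void probe_fake_illegal(void)
{
    raise(SIGILL);
}
```

With these definitions, probe_instruction(probe_nop) returns 1 and probe_instruction(probe_fake_illegal) returns 0. Restoring the previous handler after each probe keeps the sketch from interfering with any SIGILL handling the host program already has.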

FAQ

Why is the project named ruapu?

 ruapu is short for rua-cpu, which means using various extended instructions to harass and amuse the CPU (rua!). Whether the CPU reacts violently (throws an illegal instruction exception) tells us whether it supports a given extended instruction set.

Why is the ruapu API designed like this?

 We consider the gcc builtin functions, namely __builtin_cpu_init() and __builtin_cpu_supports(), to be good practice. ruapu follows this design, so it can serve as a 1:1 replacement for the gcc functions while supporting more operating systems and compilers, which gives it better portability.
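
The parallel between the two APIs can be sketched with a pair of wrapper macros. This is an illustration only: USE_RUAPU, cpu_init, cpu_supports and has_avx2 are hypothetical names introduced for this example, and the builtin branch assumes GCC or Clang targeting x86.

```c
/* Define USE_RUAPU (hypothetical switch) to route the calls through ruapu. */
#if defined(USE_RUAPU)
#define RUAPU_IMPLEMENTATION
#include "ruapu.h"
#define cpu_init()        ruapu_init()
#define cpu_supports(isa) ruapu_supports(isa)
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC/Clang builtins; the argument must be a string literal. */
#define cpu_init()        __builtin_cpu_init()
#define cpu_supports(isa) __builtin_cpu_supports(isa)
#else
/* No detection available in this sketch: report "not supported". */
#define cpu_init()        ((void)0)
#define cpu_supports(isa) 0
#endif

/* Returns 1 if avx2 is reported as supported, 0 otherwise. */
int has_avx2(void)
{
    cpu_init();
    return cpu_supports("avx2") != 0;
}
```

Because both APIs take an init call followed by a query keyed on the feature name, switching between them only changes the two macro definitions and leaves the calling code untouched.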

Why does SIGILL occur when executing in a debugger or simulator, such as gdb, lldb, qemu-user, sde, etc.?

 Debuggers and simulators intercept the signal by default and stop the program before ruapu's signal handler runs. You can simply continue execution at that point, or configure the tool to pass the signal through, for example with handle SIGILL nostop in gdb. ruapu technically cannot prevent programs from stopping in debuggers and simulators.

How to add detection capabilities for new instructions to ruapu?

Assume that the new extended instruction set is named rua:

  1. Add RUAPU_INSTCODE(rua, rua-inst-hex) // rua r0,r0 and RUAPU_ISAENTRY(rua) in ruapu.h
  2. Add PRINT_ISA_SUPPORT(rua) in main.c to print the detection result
  3. Add entries about rua in README.md
  4. Create a pull request!

https://godbolt.org/ is a handy tool for viewing the machine code that an instruction compiles to.

Repos that use ruapu

  • ncnn: High-performance neural network inference framework
  • libllm: Efficient inference of large language models

License

MIT License