gpuinfo - NVIDIA device and system information reporting

== Overview ==

gpuinfo lists attributes of the NVIDIA devices on the system and can benchmark
some of their performance characteristics.

Reports generated by gpuinfo look like:

    Attributes for device 0 (NVIDIA RTX A5000):
     Max threads per block:..................... 1024
     Max block dimension X:..................... 1024
     Max block dimension Y:..................... 1024
     Max block dimension Z:..................... 64
     Max grid dimension X:...................... 2147483647
     Max grid dimension Y:...................... 65535
     Max grid dimension Z:...................... 65535
     Max shared memory per block:............... 49152
     Constant memory size:...................... 65536
     Warp size:................................. 32
     Max registers per block:................... 65536
     Peak clock frequency (kHz):................ 1695000
     Device can overlap memory copy and compute: 1
     Number of multiprocessors:................. 64
     Device can map host memory:................ 1
     Device can execute kernels concurrently:... 1
     ECC support enabled:....................... 0
     Peak memory clock frequency (kHz):......... 8001000
     Global memory bus width (bits):............ 384
     L2 cache size:............................. 6291456
     Max resident threads per multiprocessor:... 1536
     Number of asynchronous engines:............ 2
     Device shares unified addressing with host: 1
     Major compute capability version number:... 8
     Minor compute capability version number:... 6
     Device supports stream priorities:......... 1
     Device supports caching globals in L1:..... 1
     Device supports caching locals in L1:...... 1
     Max shared memory per multiprocessor:...... 102400
     Max 32-bit registers per multiprocessor:... 65536
     Device can allocate managed memory:........ 1
     Max number of blocks per multiprocessor:... 16

    Memory benchmarking:
     Host sequential write:.............. 24.75Gb/s (stddev ±27us)
     Host parallel write (2 threads):.... 27.05Gb/s (stddev ±2041us)
     Host sequential read:............... 38.84Gb/s (stddev ±259us)
     Host parallel read (2 threads):..... 57.65Gb/s (stddev ±76us)
     Pinned host memory to device:....... 24.99Gb/s (stddev ±10us)
     Device to pinned host memory:....... 24.54Gb/s (stddev ±2us)
     Device read throughput:............. 642.26Gb/s (stddev ±13us)
     Device write throughput:............ 649.35Gb/s (stddev ±3us)

== Options ==

--device N
    Device to list properties for and benchmark. Defaults to 0.

--detail {0,1,2,3}
    At detail level 0, no device attributes are listed. The most important
    attributes are listed at detail level 1, and each subsequent level adds
    more detailed attributes. Defaults to 1.

--mem_bench
    Launch memory benchmarking after the attributes are reported. Disabled by
    default.

== Build ==

Your system must have a usable CUDA toolkit and a compatible C++ compiler.
Bazel 6.4.0 was used during development.

For accurate benchmarking, compile with optimizations enabled:

    bazel build -c opt :gpuinfo