gpuinfo - NVIDIA device and system information reporting
== Overview ==
gpuinfo lists attributes of the NVIDIA devices on the
system and can benchmark some performance characteristics,
such as host and device memory throughput.
A report generated by gpuinfo looks like this:
Attributes for device 0 (NVIDIA RTX A5000):
Max threads per block:..................... 1024
Max block dimension X:..................... 1024
Max block dimension Y:..................... 1024
Max block dimension Z:..................... 64
Max grid dimension X:...................... 2147483647
Max grid dimension Y:...................... 65535
Max grid dimension Z:...................... 65535
Max shared memory per block:............... 49152
Constant memory size:...................... 65536
Warp size:................................. 32
Max registers per block:................... 65536
Peak clock frequency (kHz):................ 1695000
Device can overlap memory copy and compute: 1
Number of multiprocessors:................. 64
Device can map host memory:................ 1
Device can execute kernels concurrently:... 1
ECC support enabled:....................... 0
Peak memory clock frequency (kHz):......... 8001000
Global memory bus width (bits):............ 384
L2 cache size:............................. 6291456
Max resident threads per multiprocessor:... 1536
Number of asynchronous engines:............ 2
Device shares unified addressing with host: 1
Major compute capability version number:... 8
Minor compute capability version number:... 6
Device supports stream priorities:......... 1
Device supports caching globals in L1:..... 1
Device supports caching locals in L1:...... 1
Max shared memory per multiprocessor:...... 102400
Max 32-bit registers per multiprocessor:... 65536
Device can allocate managed memory:........ 1
Max number of blocks per multiprocessor:... 16
Memory benchmarking:
Host sequential write:.............. 24.75Gb/s (stddev ±27us)
Host parallel write (2 threads):.... 27.05Gb/s (stddev ±2041us)
Host sequential read:............... 38.84Gb/s (stddev ±259us)
Host parallel read (2 threads):..... 57.65Gb/s (stddev ±76us)
Pinned host memory to device:....... 24.99Gb/s (stddev ±10us)
Device to pinned host memory:....... 24.54Gb/s (stddev ±2us)
Device read throughput:............. 642.26Gb/s (stddev ±13us)
Device write throughput:............ 649.35Gb/s (stddev ±3us)
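The report lines above follow a regular "name:....value" layout, so a captured report can be post-processed with a short script. The Python sketch below is not part of gpuinfo. It assumes that layout, and the helper names (parse_attributes, compute_capability, peak_bandwidth_gbs) are made up for illustration. It estimates the theoretical peak memory bandwidth with the common double-data-rate formula (2 x memory clock x bus width in bytes), and it assumes the benchmark figures labelled "Gb/s" are gigabytes per second.

```python
import re

# Excerpt of the sample report above. In practice this would be the
# captured output of gpuinfo.
REPORT = """\
Attributes for device 0 (NVIDIA RTX A5000):
Warp size:................................. 32
Peak memory clock frequency (kHz):......... 8001000
Global memory bus width (bits):............ 384
Max resident threads per multiprocessor:... 1536
Major compute capability version number:... 8
Minor compute capability version number:... 6
Memory benchmarking:
Device read throughput:............. 642.26Gb/s (stddev ±13us)
"""

# Assumed line layout: "<name>:<optional dot leaders> <value>".
LINE_RE = re.compile(r"^(?P<name>.+?):\.* (?P<value>\S+)")


def parse_attributes(report):
    """Map each attribute name to its raw value string.

    Header lines such as "Memory benchmarking:" carry no value
    and are skipped.
    """
    attrs = {}
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match:
            attrs[match["name"]] = match["value"]
    return attrs


def compute_capability(attrs):
    """Join the major and minor version numbers, e.g. "8.6"."""
    major = attrs["Major compute capability version number"]
    minor = attrs["Minor compute capability version number"]
    return f"{major}.{minor}"


def peak_bandwidth_gbs(attrs):
    """Estimate theoretical peak memory bandwidth in GB/s.

    Assumes two transfers per memory clock (double data rate).
    """
    clock_hz = int(attrs["Peak memory clock frequency (kHz)"]) * 1e3
    bus_bytes = int(attrs["Global memory bus width (bits)"]) / 8
    return 2 * clock_hz * bus_bytes / 1e9


if __name__ == "__main__":
    attrs = parse_attributes(REPORT)
    peak = peak_bandwidth_gbs(attrs)
    # Strip the unit suffix from the measured value.
    measured = float(attrs["Device read throughput"].split("Gb/s")[0])
    warps = int(attrs["Max resident threads per multiprocessor"]) // int(
        attrs["Warp size"]
    )
    print(f"Compute capability: {compute_capability(attrs)}")
    print(f"Max resident warps per multiprocessor: {warps}")
    print(f"Theoretical peak bandwidth: {peak:.1f} GB/s")
    print(f"Measured read / peak: {measured / peak:.1%}")
```

Under these assumptions, the sample report corresponds to a theoretical peak of about 768 GB/s, so the measured device read throughput is roughly 84% of that figure.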
== Options ==
--device N
Index of the device to list attributes for and benchmark.
Defaults to 0.
--detail {0,1,2,3}
At detail level 0, no device attributes are listed.
Detail level 1 lists the most important attributes, and
each higher level adds more detailed ones.
Defaults to 1.
--mem_bench
Run the memory benchmarks after the attributes are
reported.
Disabled by default.
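To illustrate how the options combine, the Python sketch below assembles a gpuinfo command line. It is not part of gpuinfo. The helper name gpuinfo_args and the binary path bazel-bin/gpuinfo are assumptions (the actual path depends on where the target lives in the workspace), and the space-separated "--flag value" form simply follows the option synopsis above.

```python
def gpuinfo_args(device=0, detail=1, mem_bench=False,
                 binary="bazel-bin/gpuinfo"):
    """Build an argument list for gpuinfo (hypothetical helper).

    The defaults mirror the documented ones: device 0, detail
    level 1, memory benchmarking disabled.
    """
    if device < 0:
        raise ValueError("device must be a non-negative integer")
    if detail not in (0, 1, 2, 3):
        raise ValueError("detail must be one of 0, 1, 2, 3")
    args = [binary, "--device", str(device), "--detail", str(detail)]
    if mem_bench:
        # --mem_bench is documented without a value, so it is
        # passed as a bare switch.
        args.append("--mem_bench")
    return args
```

The resulting list could then be handed to subprocess.run, for example subprocess.run(gpuinfo_args(detail=2, mem_bench=True), check=True), once the binary has been built.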
== Build ==
Building requires a working CUDA toolkit and a compatible
C++ compiler. Bazel 6.4.0 was used during development.
For accurate benchmark results, build with optimizations
enabled:
bazel build -c opt :gpuinfo