gpuinfo - NVIDIA device and system information reporting

== Overview ==

gpuinfo lists attributes of the NVIDIA devices on the
system and can benchmark some memory performance
characteristics.

Reports generated by gpuinfo look like:

    Attributes for device 0 (NVIDIA RTX A5000):
    Max threads per block:..................... 1024
    Max block dimension X:..................... 1024
    Max block dimension Y:..................... 1024
    Max block dimension Z:..................... 64
    Max grid dimension X:...................... 2147483647
    Max grid dimension Y:...................... 65535
    Max grid dimension Z:...................... 65535
    Max shared memory per block:............... 49152
    Constant memory size:...................... 65536
    Warp size:................................. 32
    Max registers per block:................... 65536
    Peak clock frequency (kHz):................ 1695000
    Device can overlap memory copy and compute: 1
    Number of multiprocessors:................. 64
    Device can map host memory:................ 1
    Device can execute kernels concurrently:... 1
    ECC support enabled:....................... 0
    Peak memory clock frequency (kHz):......... 8001000
    Global memory bus width (bits):............ 384
    L2 cache size:............................. 6291456
    Max resident threads per multiprocessor:... 1536
    Number of asynchronous engines:............ 2
    Device shares unified addressing with host: 1
    Major compute capability version number:... 8
    Minor compute capability version number:... 6
    Device supports stream priorities:......... 1
    Device supports caching globals in L1:..... 1
    Device supports caching locals in L1:...... 1
    Max shared memory per multiprocessor:...... 102400
    Max 32-bit registers per multiprocessor:... 65536
    Device can allocate managed memory:........ 1
    Max number of blocks per multiprocessor:... 16

    Memory benchmarking:
    Host sequential write:.............. 24.75Gb/s (stddev ±27us)
    Host parallel write (2 threads):.... 27.05Gb/s (stddev ±2041us)
    Host sequential read:............... 38.84Gb/s (stddev ±259us)
    Host parallel read (2 threads):..... 57.65Gb/s (stddev ±76us)
    Pinned host memory to device:....... 24.99Gb/s (stddev ±10us)
    Device to pinned host memory:....... 24.54Gb/s (stddev ±2us)
    Device read throughput:............. 642.26Gb/s (stddev ±13us)
    Device write throughput:............ 649.35Gb/s (stddev ±3us)
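
The host-side lines of the benchmark section can be
approximated with a short sketch. This is not gpuinfo's
actual implementation: the names BenchStats, Summarize,
FormatLine and TimeHostWrites are hypothetical, the
throughput is assumed to mean 1e9 bytes per second, and the
standard deviation is assumed to be taken over per-run
times in microseconds. Device transfers would additionally
need the CUDA runtime and are left out.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Summary of repeated timings of one memory operation.
struct BenchStats {
  double gb_per_s;   // assumed unit: 1e9 bytes per second
  double stddev_us;  // population stddev of the run times
};

// Hypothetical helper: reduces per-run durations (in microseconds)
// to a mean throughput and a standard deviation.
BenchStats Summarize(const std::vector<double>& run_us, std::size_t bytes) {
  assert(!run_us.empty());
  double sum = 0.0;
  for (double t : run_us) sum += t;
  const double mean_us = sum / run_us.size();
  double var = 0.0;
  for (double t : run_us) var += (t - mean_us) * (t - mean_us);
  var /= run_us.size();
  return {bytes / (mean_us * 1e-6) / 1e9, std::sqrt(var)};
}

// Pads "label:" with dots up to `width` columns, then appends the value.
std::string FormatLine(const std::string& label, const std::string& value,
                       std::size_t width) {
  std::string line = label + ":";
  if (line.size() < width) line.append(width - line.size(), '.');
  return line + " " + value;
}

// Times `runs` sequential writes of `bytes` bytes into a host buffer.
std::vector<double> TimeHostWrites(std::size_t bytes, int runs) {
  assert(bytes > 0);
  std::vector<unsigned char> buf(bytes);
  std::vector<double> run_us;
  for (int i = 0; i < runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    std::memset(buf.data(), i, bytes);
    // Read one byte back through a volatile so the write is observable.
    volatile unsigned char sink = buf[bytes - 1];
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();
    run_us.push_back(
        std::chrono::duration<double, std::micro>(stop - start).count());
  }
  return run_us;
}
```

Calling Summarize(TimeHostWrites(bytes, runs), bytes) gives
the two numbers for one report line. A width of 43 in
FormatLine matches the longest attribute label in the
sample above; the benchmark section appears to use a
narrower column.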


== Options ==

  --device N

      Index of the device to list properties for and benchmark.

      Defaults to 0.

  --detail {0,1,2,3}

      At detail level 0, no device attributes are listed.
      The most important attributes are listed at detail
      level 1, and each subsequent level adds more detailed
      attributes.

      Defaults to 1.

  --mem_bench

      Launch memory benchmarking after the attributes are
      reported.

      Disabled by default.
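
The options and defaults above can be modeled with a small
parser. This is a sketch only, not the flag handling
gpuinfo itself uses: the Options struct and the ParseInt
and ParseOptions functions are hypothetical, and only the
space-separated "--flag value" form shown above is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Defaults mirror the ones documented in the Options section.
struct Options {
  int device = 0;          // --device N
  int detail = 1;          // --detail {0,1,2,3}
  bool mem_bench = false;  // --mem_bench
};

// Parses a non-negative decimal integer; returns false on bad input.
bool ParseInt(const std::string& s, int* out) {
  assert(out != nullptr);
  if (s.empty()) return false;
  char* end = nullptr;
  const long v = std::strtol(s.c_str(), &end, 10);
  if (*end != '\0' || v < 0 || v > 1000000) return false;
  *out = static_cast<int>(v);
  return true;
}

// Parses the arguments after the program name. On failure, returns
// false and leaves *out untouched.
bool ParseOptions(const std::vector<std::string>& args, Options* out) {
  assert(out != nullptr);
  Options opts;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& arg = args[i];
    if (arg == "--mem_bench") {
      opts.mem_bench = true;
    } else if (arg == "--device" || arg == "--detail") {
      if (i + 1 >= args.size()) return false;  // value is missing
      int value = 0;
      if (!ParseInt(args[++i], &value)) return false;
      if (arg == "--device") {
        opts.device = value;
      } else {
        if (value > 3) return false;  // detail levels are 0 to 3
        opts.detail = value;
      }
    } else {
      return false;  // unknown option
    }
  }
  *out = opts;
  return true;
}
```

With this sketch, the arguments "--device 1 --detail 3
--mem_bench" select device 1, the most detailed attribute
listing, and the memory benchmarks, while an empty argument
list keeps every default.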


== Build ==

Your system must have a usable CUDA toolkit and a compatible
C++ compiler. Bazel 6.4.0 was used during development.

For accurate benchmarking, compile with optimizations
enabled:

    bazel build -c opt :gpuinfo