microsoft/superbenchmark

V0.7.0 Test Plan

yukirora opened this issue · 0 comments

Test Cases

single-node test

| Machine Type | #Node * #GPU * GPU Type | PyTorch Version | Accelerated Computing Toolkit | Status |
| --- | --- | --- | --- | --- |
| ND A100 v4 | 1 * 8 * A100 40GB SXM | PyTorch 1.8 | CUDA 11.1 | Done |
| NDm A100 v4 | 1 * 8 * A100 80GB SXM | PyTorch 1.8 | CUDA 11.1 | Done |
| Hopper | 1 * 8 * H100 | PyTorch 1.x | CUDA 11.8 | Done |

single-node Micro-benchmark Test

  1. tensorrt-inference
  • Fix Transformers version to avoid TensorRT inference failure (#441)
  2. cublas-function/cudnn-function
  • Support list of custom config strings in cudnn-functions and cublas-functions (#414)
  • Support correctness check in cublas-functions (#450, #452)
  3. mem-bw
  • Add wait time option to resolve mem-bw instability (#438)
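The cublas-functions correctness check (#450, #452) amounts to comparing the device GEMM result against a higher-precision reference with a relative-error tolerance. A minimal NumPy sketch of that idea (names and tolerance are illustrative, not SuperBench's actual implementation):

```python
import numpy as np

def check_gemm(a, b, result, rtol=1e-3):
    """Compare a GEMM result against a float64 reference, elementwise."""
    ref = a.astype(np.float64) @ b.astype(np.float64)
    rel_err = np.abs(result - ref) / np.maximum(np.abs(ref), np.finfo(np.float64).tiny)
    return bool((rel_err <= rtol).all())

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 32))
b = rng.standard_normal((32, 16))
exact = a @ b
print(check_gemm(a, b, exact))         # reference matches itself
print(check_gemm(a, b, exact * 1.01))  # a 1% error exceeds rtol=1e-3
```

In the real benchmark the `result` would come from the cuBLAS kernel under test, with the tolerance chosen to match the compute precision (FP16/TF32/FP64).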

SuperBench Improvement

  • Support non-zero return code (#410, #411, #425)
  • Support log flushing to the result file during runtime (#445)
  • Update sb version to include revision hash and date (#427)
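The non-zero return code support (#410, #411, #425) boils down to propagating per-benchmark failures to the process exit code so CI can detect them. A hypothetical sketch (function and benchmark names are illustrative, not SuperBench's actual API):

```python
def run_benchmarks(benchmarks):
    """Run each benchmark, collect return codes, and report overall success/failure."""
    return_codes = {}
    for name, fn in benchmarks.items():
        try:
            return_codes[name] = fn()
        except Exception:
            return_codes[name] = 1  # treat an unhandled crash as a failure
    # Process exits non-zero if any benchmark reported a failure.
    return 0 if all(code == 0 for code in return_codes.values()) else 1

benchmarks = {'ok-bench': lambda: 0, 'broken-bench': lambda: 1}
print(run_benchmarks(benchmarks))  # 1
```

The caller would pass this value to `sys.exit()` so that schedulers and CI pipelines see the failure.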

Hopper GPU and FP8 related benchmarks

  1. docker building
  • Add CUDA 11.8 Docker image for Nvidia arch90 GPUs (#449)
  2. micro-benchmark
  • Support GEMM-FLOPS for Nvidia arch90 GPUs (#456)
  • Support cuBLASLt FP16 and FP8 GEMM (#451, #455)
  • Debug some cuBLAS and cuDNN kernel crash issues
  3. model-benchmark
  • Support FP8 in BERT model training (#446)
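The FP8 work above (#446, #451, #455) typically targets the E4M3 format (1 sign, 4 exponent, 3 mantissa bits, bias 7). As a sanity check of its dynamic range, the finite values can be enumerated directly; this sketch assumes the common E4M3 "FN" variant, where only the all-ones exponent with all-ones mantissa encodes NaN:

```python
def e4m3_finite_values():
    """Enumerate the non-negative finite values of FP8 E4M3 (bias 7)."""
    values = []
    for e in range(16):
        for m in range(8):
            if e == 15 and m == 7:
                continue  # reserved for NaN in the E4M3 (FN) encoding
            if e == 0:
                values.append((m / 8) * 2.0 ** -6)        # subnormals (and zero)
            else:
                values.append((1 + m / 8) * 2.0 ** (e - 7))
    return values

vals = e4m3_finite_values()
print(max(vals))                       # 448.0, the largest finite E4M3 value
print(min(v for v in vals if v > 0))   # 2**-9, the smallest positive subnormal
```

The narrow range (max 448) is why FP8 training needs per-tensor scaling, which frameworks handle when enabling FP8 in models like BERT.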

New in bug bash

  • [x]
  • [x]

multiple-node test

Test Table

| Machine Type | #Node * #GPU * GPU Type | PyTorch Version | Accelerated Computing Toolkit | Status |
| --- | --- | --- | --- | --- |
| NDm A100 v4 | 32 * 8 * A100 80GB SXM | PyTorch 1.8 | CUDA 11.1 | Done |

distributed Micro-benchmark test

  1. ib-traffic
  • Support pair-wise pattern in IB validation benchmark (#453)
  • Support 'pattern' in 'mpi' mode to run tasks in parallel (#447)
  2. nccl-bw
  • Support topo-aware, all-pair, and K-batch patterns in 'mpi' mode (#437, #458)
  • Support topo-aware, pair-wise, and K-batch patterns in nccl-bw benchmark (#454)
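The pair-wise pattern above pairs every node with every other node while keeping each node busy with at most one peer per round. One standard way to generate such a schedule is the circle (round-robin tournament) method; this is a sketch of the idea, not necessarily SuperBench's implementation:

```python
def pairwise_rounds(nodes):
    """Schedule all node pairs into rounds; each node appears at most once per round."""
    nodes = list(nodes)
    if len(nodes) % 2:
        nodes.append(None)  # bye slot when the node count is odd
    n = len(nodes)
    rounds = []
    for _ in range(n - 1):
        pairs = [(nodes[i], nodes[n - 1 - i]) for i in range(n // 2)
                 if None not in (nodes[i], nodes[n - 1 - i])]
        rounds.append(pairs)
        # Keep the first node fixed and rotate the rest one step (circle method).
        nodes = [nodes[0]] + [nodes[-1]] + nodes[1:-1]
    return rounds

print(pairwise_rounds(range(4)))
```

For N nodes this yields N-1 rounds covering all N*(N-1)/2 pairs, so the benchmark can run each round's pairs in parallel without a node serving two transfers at once.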

New in bug bash

  • [x]
  • [x]