/nvidia-arm-hpc-devkit-users-guide

Get started with your NVIDIA Arm HPC Developers Kit!

Getting started with HPC on Arm64

This guide includes how-to guides, sample code, recommendations, and technical best practices to help new users get started with Arm-based systems like the NVIDIA Arm HPC Developer Kit. While it is intended for the users and administrators of NVIDIA's Arm-based platforms, this guide is also generally useful for anyone running HPC applications on Arm CPUs, with or without GPUs. The focus is mostly on the CPU, since a GPU behaves the same whether it is hosted by an Arm CPU or by any other CPU.

Join the NVIDIA Arm Community!

Join us on Slack

The easiest way to find help and talk to the experts is to join the NVIDIA Arm HPC Slack workspace.

Introduction to Arm64 and the NVIDIA HPC Developer Kit

The NVIDIA Arm HPC Developer Kit (simply "DevKit" in this guide) is an integrated hardware and software platform for creating, evaluating, and benchmarking HPC, AI, and scientific computing applications on a heterogeneous GPU- and CPU-accelerated computing system. The kit includes an Arm CPU, dual NVIDIA A100 Tensor Core GPUs, dual NVIDIA BlueField-2 DPUs, and the NVIDIA HPC SDK suite of tools. See the product page for more information.

This validated platform provides quick and easy bring-up and a stable environment for accelerated code execution and evaluation, performance analysis, system experimentation, and system characterization.

  • Delivers a validated system for quick and easy bring-up in familiar HPC environments
  • Offers a stable hardware and software platform for development and performance analysis of accelerated HPC, AI, and scientific computing applications
  • Enables experimentation and characterization of high-performance, NVIDIA-accelerated, Arm server-based system architectures

Hardware Specification

  • Model: GIGABYTE G242-P32, 2U server
  • CPU: 1x Ampere Altra Q80-30 (Arm processor)
  • GPU: 2x NVIDIA A100 GPU
  • Memory: 512 GB DDR4
  • Storage: 6 TB SAS/SATA 3.5″
  • Network: 2x NVIDIA BlueField-2 E-Series DPU, 200GbE/HDR single-port QSFP56

The DevKit CPU uses the Arm architecture. The Arm architecture powers over two hundred billion chips across practically all computing domains, so the term "Arm" is somewhat overloaded. Various communities refer to the architecture as "Arm", "ARM", "Arm64", "AArch64", "arm64", etc. You may also find the term "SBSA" used to refer to server-class Arm CPUs. For simplicity, this guide will use the term "Arm64" to refer to any CPU built on the Armv8 or Armv9 standards and implementing Arm's Server Base System Architecture (SBSA). This includes CPUs like the Ampere Altra used in the DevKit.

This guide will call out differences between Arm64 CPUs as needed. Note that this guide is not intended for mobile and embedded Arm CPUs such as NVIDIA Tegra. While many of the general principles and approaches presented here will hold true for mobile and embedded Arm platforms, this guide is focused on server-class platforms.
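
Because tools report the architecture under different names, scripts that branch on CPU type often need to normalize them. The following is a minimal Python sketch, not part of the DevKit software, that treats the strings "aarch64" and "arm64" as equivalent. The helper name `is_arm64` and the alias set `ARM64_ALIASES` are illustrative assumptions, and the alias set is not exhaustive.

```python
import platform

# Assumed, non-exhaustive set of names that tools use for 64-bit Arm.
# Linux typically reports "aarch64"; some other platforms report "arm64".
ARM64_ALIASES = {"aarch64", "arm64"}


def is_arm64(machine=None):
    """Return True if the machine string names a 64-bit Arm CPU.

    If no string is given, fall back to what platform.machine() reports
    for the host running this script.
    """
    name = machine if machine is not None else platform.machine()
    return name.strip().lower() in ARM64_ALIASES


if __name__ == "__main__":
    print(f"platform.machine() = {platform.machine()!r}, Arm64: {is_arm64()}")
```

This check only identifies the instruction set architecture. It does not tell you whether the system implements SBSA, so it cannot by itself distinguish a server-class CPU from a mobile or embedded one.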

License

CC BY-SA 4.0

Unless otherwise indicated, this work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Individual examples or attached source code may be under a different license. Check the related README or LICENSE files.

Acknowledgements

This guide was inspired by and borrows from the excellent AWS Graviton Getting Started Guide. The authors of this guide gratefully acknowledge the work of the AWS engineers and thank AWS for freely providing this valuable information in the public domain.

Feedback? jlinford@nvidia.com