AWS Neuron is the software development kit (SDK) for AWS Trainium and Inferentia, the AWS machine learning chips purpose-built for AI workloads. At the core of the Neuron SDK is the Neuron Compiler, which takes computation graphs from frameworks such as PyTorch and JAX and converts them into highly optimized machine code.
NKI is a Python-based programming environment for the Neuron Compiler that adopts commonly used NumPy- and Triton-like syntax along with tile-level semantics. NKI also interoperates with the Neuron Profiler, providing insights into performance bottlenecks and instruction latencies. It offers tensor printing support, standard error messages, and built-in kernel simulation for efficient debugging. NKI offers two programming interfaces, NKI Language (nki.language) and NKI Instruction Set Architecture (nki.isa), which enable bare-metal access to the chip for full control.
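To give a feel for these interfaces, here is a minimal sketch of an element-wise addition kernel in the style of the NKI getting-started material. It assumes the nki.jit and nki.language APIs described in the NKI documentation, inputs small enough to fit in a single tile (at most 128 partitions), and a kernel name (tensor_add_kernel) that is purely illustrative.

```python
# Minimal sketch of a NKI kernel, assuming the nki.jit / nki.language APIs
# from the NKI documentation; shapes must fit a single tile (at most 128
# partitions), and all names here are illustrative.
from neuronxcc import nki
import neuronxcc.nki.language as nl

@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Allocate the output tensor in HBM (shared between host and device).
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)

    # Load the inputs from HBM into on-chip memory as tiles.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)

    # Element-wise add on the tiles, then store the result back to HBM.
    nl.store(c_output, value=nl.add(a_tile, b_tile))
    return c_output
```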
The latest NKI documentation can be found on the AWS Documentation site. Documentation for the NKI kernels is available both inline (as docstrings) and on the documentation site's kernel API reference page.
This folder contains the source code of the neuronxcc.nki.kernels package. These are optimized kernels from the Neuron team that serve as samples.
All kernels located in this folder have numeric accuracy tests and performance benchmarks defined in the test directory. We also demonstrate using these kernels end-to-end in our integration tests.
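For orientation, the production copies of these kernels ship inside the Neuron compiler package, so once the Neuron SDK is installed they can be imported directly. The import below is illustrative only; the exact module layout can change between Neuron SDK releases, so consult the kernel API reference for the authoritative list.

```python
# Illustrative import of a released kernel from the neuronxcc.nki.kernels
# package (module path assumed from this README; check the kernel API
# reference for the current layout).
from neuronxcc.nki.kernels.attention import flash_fwd
```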
Note that these kernels are already deployed as part of the Neuron stack. Taking flash attention as an example, compiling Llama models with transformers-neuronx automatically invokes the flash_fwd kernel in attention.py, as sketched below. Therefore, replacing the framework operators with these NKI kernels is unlikely to yield additional performance benefit.
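For context, a typical transformers-neuronx compile flow looks roughly like the sketch below. The checkpoint path and configuration values are placeholders, and class names and arguments may differ between transformers-neuronx releases; the point is that no kernel-level changes are needed to pick up flash_fwd.

```python
# Rough sketch of compiling a Llama model with transformers-neuronx;
# the checkpoint path, tp_degree, and other settings are placeholders.
from transformers_neuronx import LlamaForSampling

model = LlamaForSampling.from_pretrained(
    "path/to/llama-checkpoint",  # placeholder path
    batch_size=1,
    tp_degree=8,
    amp="bf16",
)
model.to_neuron()  # compiles the model; attention layers pick up the NKI flash_fwd kernel automatically
```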
The tutorial kernels are for educational purposes and include the kernels used in the NKI guides. You can clone these sample kernels and run them directly while reading through the NKI documentation. These kernels are not necessarily high-performance, but they contain detailed inline comments and have accompanying documentation.
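For example, a tutorial-style kernel can be exercised without Neuron hardware through NKI's built-in simulator. The sketch below assumes the tensor_add_kernel shown earlier and plain NumPy inputs.

```python
# Sketch of running a kernel in NKI's built-in simulator; assumes the
# tensor_add_kernel sketch from earlier in this README.
import numpy as np
from neuronxcc import nki

a = np.random.rand(128, 512).astype(np.float32)
b = np.random.rand(128, 512).astype(np.float32)

# simulate_kernel executes the kernel functionally on the host, which is
# convenient for checking numerics and using tensor printing while reading the docs.
c = nki.simulate_kernel(tensor_add_kernel, a, b)
np.testing.assert_allclose(c, a + b)
```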
The unit tests directory contains unit tests and micro-benchmarks for standalone kernels. They run across multiple configurations, verify the numeric accuracy of each operation, and publish performance results to the micro-benchmark results.
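In the same spirit, a standalone kernel can be micro-benchmarked on a Neuron device with nki.benchmark. The sketch below is an assumption-heavy illustration: the keyword arguments and the benchmark_result attributes are written as we recall them from the NKI benchmarking documentation and should be checked against the API reference.

```python
# Sketch of a standalone micro-benchmark; nki.benchmark is applied in place
# of nki.jit when decorating the kernel. Keyword names and result attributes
# below are assumptions based on the NKI docs; verify against the API reference.
import numpy as np
import neuronxcc.nki.language as nl
from neuronxcc import nki

@nki.benchmark(warmup=5, iters=20)  # assumed keyword names
def add_kernel_bench(a_input, b_input):
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)
    nl.store(c_output, value=nl.add(nl.load(a_input), nl.load(b_input)))
    return c_output

a = np.random.rand(128, 512).astype(np.float32)
b = np.random.rand(128, 512).astype(np.float32)
add_kernel_bench(a, b)  # runs on a Neuron device and records latency

# Assumed result attributes for retrieving a latency percentile (microseconds).
p99 = add_kernel_bench.benchmark_result.nc_latency.get_latency_percentile(99)
print(f"p99 latency: {p99} us")
```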
The integration tests folder contains integration tests for selected kernels. They verify the numeric accuracy of the model's output and publish end-to-end performance results into the integration benchmarks folder.
NKI is currently released as beta while we gather feedback from our users and integrate it into the API. The NKI APIs follow the Neuron SDK Maintenance Policy.
Have a look at the GitHub issues for this repository, where you will find issues other customers have encountered along with workarounds and clarifications. If you cannot find a suitable issue for your use case, feel free to file an issue to ask for assistance or to suggest improvements. Please read CONTRIBUTING.md for detailed information on submitting issues.
We invite you to join the NKI community! If you'd like to share kernels you create with the community, we welcome your contributions to this repository via GitHub pull requests, as well as through filed issues discussing features, bug fixes, new use cases, and API improvements. Please see CONTRIBUTING.md for more information.
This repository is licensed under the terms of the MIT-0 License.