The Neural Engine — what do we know about it?
Most new iPhones and iPads have a Neural Engine, a special processor that makes machine learning models really fast, but not much is publicly known about how this processor actually works.
The Apple Neural Engine (or ANE) is a type of NPU, which stands for Neural Processing Unit. It's like a GPU, but instead of accelerating graphics an NPU accelerates neural network operations such as convolutions and matrix multiplies.
The ANE isn't the only NPU out there — many companies besides Apple are developing their own AI accelerator chips. Besides the Neural Engine, the most famous NPU is Google's TPU (or Tensor Processing Unit).
Why this document?
When I was still providing ML consulting services for iOS, I would often get email from people who are confused why their model doesn't appear to be running on the Neural Engine, or why it is so slow when the ANE is supposed to be way faster than the GPU...
It turns out that not every Core ML model can make full use of the ANE. The reason why can be complicated, hence this document tries to answer the most common questions.
The ANE is great for making ML models run really fast on iPhones and iPads. A model that is optimized for the ANE will seriously outperform the CPU and GPU. But the ANE also has limitations. Unfortunately Apple isn't giving third-party developers any guidance on how to optimize their models to take advantage of the ANE. It's mostly a process of trial-and-error to figure out what works and what doesn't.
Note: Everything here was obtained by experimentation. I do not work at Apple and never have, so I am not privy to any implementation details of this chip. Some of this information is probably wrong. It's definitely incomplete. If you know something that isn't explained here, or if you find information that is wrong or missing, please file an issue or make a pull request. Thanks!
I was originally planning to make this a blog post but decided to put it on GitHub to make it a community resource and so that other people could contribute to it too. Please do!
Table of contents
- Which devices have an ANE?
- Why should I care about the ANE?
- How do I make my model run on the ANE?
- How do I prevent my model from running on the ANE?
- Is my model using the ANE?
- Can I program the ANE directly?
- Isn't the ANE the same as the GPU?
- Is the ANE 16-bit?
- Which Core ML layers are not supported by the ANE?
- Use os_log to look at warning / error messages
- How to replace unsupported layers
- How does the ANE work internally?
- Other weird issues
- Reverse engineering the ANE