[BUG] Ollama not using GPU or NPU
Describe the bug
When installing Ollama from ollama.org it detects arm64, but no model, including llama3.2, utilizes the GPU or NPU. As a result, the laptop battery drains by 45% within an hour, and performance is low.
To Reproduce
Just install Ollama and run llama3.2, phi3, gemma2, or mistral from the command line.
Then check whether the NPU or GPU is being used.
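The steps above can be sketched as a short command sequence (a hypothetical reproduction script, not from the original report; it assumes Ollama for Windows on ARM is installed and guards for the case where it is not):

```shell
# Reproduction sketch: run a model, then check where it is loaded.
# Model names are taken from the report; the `ollama ps` output
# format may differ between Ollama versions.
if command -v ollama >/dev/null 2>&1; then
  status="ollama found"
  ollama run llama3.2 "Hello"   # also reproducible with phi3, gemma2, mistral
  # The PROCESSOR column reports "100% CPU" rather than GPU/NPU:
  ollama ps
else
  status="ollama not installed"
fi
echo "$status"
```

While a prompt is running, GPU and NPU utilization can also be watched in Task Manager's Performance tab; in the reported behavior both stay near idle.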
Expected behavior
Models running on NPU.
That was the main reason for buying this laptop.
Host configuration:
- OS and version: Windows 11
- Chip: Snapdragon X1E-78-100, 32 GB RAM
Info
I couldn't find another way to reach the Qualcomm team to report this. My colleagues share the same disappointment. We were under the impression, based on the news Qualcomm published, that Ollama would run on the NPU. Running it on the CPU is not what we're interested in.
Hi @BrickDesignerNL, as you alluded to, this isn't specific to AI Hub. I have reached out to the teams at Qualcomm working with Ollama and shared your concerns. However, Llama 3.2 support via Ollama has yet to be released; it was announced at Snapdragon Summit two weeks back, and CPU is currently the expected behavior if you take a look at the press release.
From the press release:
Finally, Ollama is capable of running on the CPU of devices powered by Snapdragon X Series. Through collaboration with Qualcomm Technologies and Microsoft, Ollama plans to enable DirectML to offload inference tasks to the Qualcomm® Adreno™ GPU and Qualcomm® Hexagon™ NPU.
@kory thank you.
https://www.qualcomm.com/developer/blog/2024/10/qualcomm-partners-with-meta-ollama-on-new-quantized-llama-3-2-models
Calling it an AI PC or Copilot+ PC because it contains an NPU designed to run LLM tasks,
then saying the model runs on the Snapdragon while explicitly touting its AI capability, yet not actually using that capability, is not what we as consumers read and understand from the marketing communication.
It may be there in the fine print, but it's far from in line with the perception created by the marketing and demos, and therefore far from what people expect.
Having only two hours of battery life when using an LLM, on a machine designed to last more than a working day and marketed for working with LLMs, is far below the promise, and far below what a consumer can fairly expect from the machine based on its promotion and the shared information.
What is the expected time to get it running smoothly for all customers?