Building issues with Openblas and Vulkan on Windows

Question

Opened this issue 12 days ago · 2 comments

Lured by the words "Optimize Encoder performance" in the overview of release v1.7.0, I decided to try my luck and, despite having 0.0 experience in C++ programming, build the v1.7.1 release from source with OpenBLAS and Vulkan support. Along the way I ran into the following hiccups.

Setup: w64devkit and Visual Studio 2022 Community Edition under Windows 11 Home on a Surface Pro 7+ (i5-1135G7, 8 GB RAM, Iris Xe iGPU).
All results shown are from running 'main -l nl -m models/ggml-large-v3-turbo.bin audio.wav -pc -t 8 -bs 1 -bo 5', where 'audio.wav' contains 312 seconds of audio.
Speed comparisons are indicative only. Because of varying thermal throttling, precise comparisons would require extensive measurements in an experimental setup.
Quality of transcription in the various configurations was nearly the same.
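
Regarding the thermal-throttling caveat above: a minimal sketch of how repeated runs could be timed and summarized, in case someone wants more than indicative numbers. The command line is the one quoted in this post; the number of runs, the cooldown pause, and the helper names `time_command` and `summarize` are assumptions made for illustration, not part of whisper.cpp.

```python
import statistics
import subprocess
import time

# Command line as quoted in the post. It is assumed that the 'main' binary,
# the model and the audio file can all be found from the current directory.
COMMAND = ["main", "-l", "nl", "-m", "models/ggml-large-v3-turbo.bin",
           "audio.wav", "-pc", "-t", "8", "-bs", "1", "-bo", "5"]


def time_command(command, runs=5, cooldown_seconds=60.0):
    """Run `command` several times and return the wall-clock durations in seconds.

    The pause between runs (an assumed value) gives the device time to cool
    down, so that later runs are less affected by thermal throttling.
    """
    durations = []
    for i in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
        if i < runs - 1:
            time.sleep(cooldown_seconds)
    return durations


def summarize(durations):
    """Return median, fastest and slowest run; the median damps throttled outliers."""
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }

# Example usage (each run transcribes the whole file, so this takes a while):
# print(summarize(time_command(COMMAND)))
```

Comparing medians across builds, rather than single runs, should make the differences reported here somewhat more trustworthy.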

My first attempts were disappointing:
a) I didn't succeed in building Whisper with OpenBLAS, and
b) on average, the 'plain vanilla' build of the 1.7.1 release of Whisper ran slightly slower than the plain vanilla build of release 1.5.4.

Later on it turned out that successfully building Whisper with recent versions of OpenBLAS required the make option GGML_OPENBLAS64=1 instead of GGML_OPENBLAS=1. However, this build, although it completed without errors, did not call OpenBLAS! (NB: the BLAS binary at https://github.com/ggerganov/whisper.cpp/actions/runs/11213279590 shows the same defect.)

To remedy this, I either had to download a new Makefile from https://github.com/ggerganov/whisper.cpp ('make: fix GGML_VULKAN=1 build (#2485)') or build Whisper with CMake/VS2022. The CMake/VS2022 build ran about 50% slower than the build made with the new Makefile, so I used the latter.
Even then, I was a bit disappointed: I had hoped for faster transcription than with the 'whisper-blas-clblast-bin-x64' binary of the v1.5.4 release, but the v1.5.4 binary ran about 15% faster. The main reason: in contrast to the v1.5.4 build, the v1.7.1 OpenBLAS build does not use the iGPU.

My next step was to build Whisper with Vulkan support. To circumvent an error in the original Makefile, I had to use the Makefile mentioned above. The result: a reduction of ca. 40% in running time compared with the 'whisper-blas-clblast-bin-x64' binary of the v1.5.4 release (i.e. a resulting running time of ca. 140-145 sec). So, after all, it was a worthwhile effort.
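
To put the figures above in perspective, here is a small arithmetic sketch. It only uses numbers from this post (312 seconds of audio, ca. 140-145 sec running time, ca. 40% reduction); the implied v1.5.4 running time is derived from those rounded figures, not measured, and the function names are made up for illustration.

```python
AUDIO_SECONDS = 312.0                  # length of audio.wav, from the post
VULKAN_RUNTIME_RANGE = (140.0, 145.0)  # reported Vulkan running time in seconds
REDUCTION = 0.40                       # reported reduction vs. the v1.5.4 binary


def implied_baseline(new_runtime, reduction):
    """Baseline running time implied by new_runtime = baseline * (1 - reduction)."""
    return new_runtime / (1.0 - reduction)


def realtime_factor(audio_seconds, runtime):
    """Seconds of audio transcribed per second of running time."""
    return audio_seconds / runtime


for runtime in VULKAN_RUNTIME_RANGE:
    baseline = implied_baseline(runtime, REDUCTION)
    factor = realtime_factor(AUDIO_SECONDS, runtime)
    print(f"{runtime:.0f} s with Vulkan -> ~{baseline:.0f} s implied for v1.5.4, "
          f"{factor:.2f}x real time")
```

In other words, the Vulkan build transcribes a bit more than twice as fast as real time on this machine, while the v1.5.4 binary would have needed roughly 235-240 sec for the same file.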

PS. I found the notes for building Llama with OpenBLAS and Vulkan (https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md) very useful. How about including similar notes in the Whisper project?

Answer 1 · 2024-11-19T16:36:02.000Z

@ttomaat interesting that you mention it. I always felt that "whisper-blas-clblast-bin-x64" was the strongest offering in the past for Windows users without an Nvidia GPU (likely the majority).

@tamo it would be really cool if we could build Windows x64 with OpenBLAS + Vulkan in GitHub Actions, so that it is available there when 1.7.2 final comes out. I think that might become by far the most common/popular Windows configuration, and it would somewhat correspond to the old "whisper-blas-clblast-bin-x64".

I would be happy to work on this myself and make a pull request, but I don't really have much knowledge of build stuff and GitHub Actions tbh.

Answer 2 · 2024-11-20T11:53:52.000Z

In my experiments I didn't find significant differences in speed between builds with 'GGML_VULKAN=1' and builds with 'GGML_VULKAN=1 GGML_OPENBLAS64=1'.