cmp-nct/ggllm.cpp

CUDA mul_mat using cuBLAS for 3d multiplication fails on lm_head only for Falcon 7B


Using the default batch size or an explicit -b > 1 currently fails in my tests with all 7B variants when cuBLAS is active (-ngl > 0).

This error must have been introduced within the last 48 hours.

40B works with -b 1 and -b > 1
7B works with -b 1 only (see the isolation sketch below)
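For anyone who wants to reproduce this outside the full model, here is a minimal isolation sketch of the failing multiplication, assuming Falcon-7B shapes (n_embd = 4544, n_vocab = 65024) and the plain ggml C API; this is not the repository's code, just the same lm_head-shaped mul_mat in isolation. When built with cuBLAS, a matmul of this size is expected to take the GPU path:

```cpp
// Isolation sketch (illustrative): run an lm_head-shaped mul_mat on its
// own, with a batch of n_tokens > 1 hidden states. Tensor data is left
// uninitialized; the question is whether the matmul runs, not its values.
#include "ggml.h"

int main(void) {
    const int n_embd   = 4544;   // Falcon-7B hidden size
    const int n_vocab  = 65024;  // Falcon vocabulary size
    const int n_tokens = 2;      // batch > 1, the failing case

    struct ggml_init_params params = {
        /*.mem_size   =*/ 2ull * 1024 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // lm_head weight [n_embd, n_vocab] and hidden states [n_embd, n_tokens]
    struct ggml_tensor * w = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_vocab);
    struct ggml_tensor * x = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_tokens);

    // logits [n_vocab, n_tokens]
    struct ggml_tensor * logits = ggml_mul_mat(ctx, w, x);

    struct ggml_cgraph gf = ggml_build_forward(logits);
    ggml_graph_compute(ctx, &gf);

    ggml_free(ctx);
    return 0;
}
```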

Will be fixed tomorrow (if anyone fixes it sooner, even better).

Issued a hotfix for now that disables lm_head offload in 7B.
The cuBLAS matmul fails on lm_head for falcon_7b only. It is quite possible this was always the case and we simply did not offload that layer from the start.
It still needs to be researched why the CUDA multiplication fails in this one singular case; it makes no sense at first glance.
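For reference, a minimal sketch of what such a hotfix amounts to at weight-load time, assuming the usual ggml backend enum; model.type, FALCON_7B, lm_head, n_gpu_layers, and n_layer are illustrative names, not necessarily the repository's exact identifiers:

```cpp
// Hotfix sketch (illustrative, not the actual commit): keep the output
// layer on the CPU backend for Falcon-7B, while still offloading it for
// the other model sizes.
enum ggml_backend backend_output = GGML_BACKEND_CPU;
if (n_gpu_layers > (int) n_layer && model.type != FALCON_7B) {
    backend_output = GGML_BACKEND_GPU; // 40B etc.: offload lm_head as before
}
// 7B: the cuBLAS mul_mat fails on this tensor, so it stays on the CPU
// until the root cause is understood.
model.lm_head->backend = backend_output;
```

The cost is one large matmul per token running on the CPU for 7B, which is an acceptable stopgap until the underlying cuBLAS failure is tracked down.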

Fixed in the latest commit.