CUDA mul_mat using cuBLAS for 3D multiplication fails on lm_head, only for Falcon 7B
Closed this issue · 2 comments
cmp-nct commented
Using the default batch size or -b > 1 currently fails in my tests with all 7B variants when cuBLAS is active (ngl > 0).
This error must have been introduced within the last 48 hours:
40B works with -b 1 and -b > 1
7B works with -b 1 only
This will be fixed tomorrow (if anyone fixes it sooner, even better).
cmp-nct commented
Issued a hotfix for now that disables lm_head offload for 7B.
The cuBLAS matmul fails on lm_head for falcon_7b only. It is quite possible this was always the case and we simply did not offload that layer from the start.
It still needs to be investigated why the CUDA multiplication fails in this one specific case; it makes no sense at first glance.
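For context, the "3D multiplication" in the title amounts to a batched matrix product over the token/batch dimensions (what cuBLAS exposes as a strided-batched GEMM). Below is a minimal NumPy sketch of what the lm_head matmul computes, using tiny placeholder dimensions; the real Falcon-7B lm_head maps the hidden size (4544, if memory serves) to the vocabulary (~65024), but those exact numbers are not taken from this thread:

```python
import numpy as np

# Tiny stand-in dimensions for the sketch; real Falcon-7B values are much larger.
batch, seq, hidden, vocab = 2, 4, 8, 16

x = np.random.rand(batch, seq, hidden).astype(np.float32)  # activations, 3D tensor
w = np.random.rand(vocab, hidden).astype(np.float32)       # lm_head weight matrix

# Batched GEMM: one (seq x hidden) @ (hidden x vocab) product per batch entry,
# analogous to a cuBLAS strided-batched sgemm over the leading dimension.
logits = np.einsum('bsh,vh->bsv', x, w)
assert logits.shape == (batch, seq, vocab)
```

With -b 1 the batch dimension collapses to a plain 2D GEMM, which matches the observation that 7B works with -b 1 only.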
cmp-nct commented
Fixed in latest commit