mybigday/llama.rn

Implementing optimizations from layla

Vali-98 opened this issue · 2 comments

Layla is a project that also integrates llama.cpp for mobile use:
https://github.com/l3utterfly/llama.cpp/tree/layla-build

After some quick testing, Layla's llama.cpp fork does seem to run models far faster on Android than llama.rn, almost twice as fast in some cases with 7B models.

It would be wonderful if these improvements were added to llama.rn as well.

Interesting, I just took a quick look. Does it have CLBlast enabled?

ggerganov/llama.cpp@master...l3utterfly:llama.cpp:layla-build

/cc @l3utterfly
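For context, CLBlast was an optional OpenCL-based GPU backend in llama.cpp around the time of this issue. A minimal sketch of building with it enabled, assuming the CLBlast and OpenCL development packages are installed (the `LLAMA_CLBLAST` flag name is from llama.cpp's CMake build of that era and has since been superseded by other backends):

```shell
# Sketch: build llama.cpp with the CLBlast backend enabled.
# Assumes CLBlast + OpenCL headers/libraries are already installed on the system.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Configure with the (historical) CLBlast option turned on.
cmake -B build -DLLAMA_CLBLAST=ON

# Compile in release mode.
cmake --build build --config Release
```

Whether this alone accounts for the speed difference would still need profiling; Layla's fork may also carry Android-specific patches beyond the backend choice.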