abetlen/llama-cpp-python

Release the GIL

Closed this issue · 3 comments

I might be missing something, but does llama-cpp-python release the Python GIL at the moment?

If it doesn't, would releasing the GIL allow multi-threaded execution of GGUF files?


It's llama.cpp that handles the multithreading, and if the model is running on the GPU, multithreading doesn't help anyway.

These are language bindings for llama.cpp; the actual GGUF processing is not done in Python. The GIL is released while the underlying llama.cpp execution is happening. Use n_threads when instantiating a model to set the parallelism of llama.cpp.
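As a minimal sketch of what that looks like (the model path, thread count, and prompt below are placeholder assumptions, and n_threads only affects the CPU side of inference):

```python
from llama_cpp import Llama

# n_threads controls how many CPU threads llama.cpp itself uses;
# the Python caller just waits on the call, with the GIL released meanwhile.
llm = Llama(
    model_path="./model.gguf",  # placeholder path for illustration
    n_threads=8,
)

out = llm("Q: What does GIL stand for? A:", max_tokens=32)
print(out["choices"][0]["text"])
```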

@simonw Yes, llama-cpp-python uses ctypes.CFUNCTYPE to bind to the llama.cpp functions, and as stated in the ctypes docs:

The returned function prototype creates functions that use the standard C calling convention. The function will release the GIL during the call.
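For illustration, this is roughly how a CFUNCTYPE prototype binds a C function. The example below uses libm's sqrt rather than llama.cpp and assumes a Unix-like platform; the real bindings apply the same pattern to the functions exported by the llama.cpp shared library:

```python
import ctypes
import ctypes.util

# Illustrative sketch: bind libm's sqrt via a CFUNCTYPE prototype.
# Per the ctypes docs quoted above, functions created from such a prototype
# use the standard C calling convention and release the GIL during the call.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

proto = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)  # (restype, argtype)
c_sqrt = proto(("sqrt", libm))

print(c_sqrt(2.0))  # 1.4142135623730951
```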