Release the GIL
Closed this issue · 3 comments
I might be missing something, but does llama-cpp-python
release the Python GIL at the moment?
If it doesn't, would releasing the GIL allow multi-threaded execution of GGUF files?
It's llama.cpp that handles the multithreading, and if it's running on the GPU, multithreading is useless anyway.
These are language bindings for llama.cpp; the actual GGUF processing is not done in Python. The GIL is released while the underlying llama.cpp execution is happening. Use n_threads when instantiating a model to set the parallelism of llama.cpp, as in the sketch below.
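A minimal sketch of what that looks like; the model path and thread count are placeholders, not values from this issue:

```python
from llama_cpp import Llama

# n_threads controls how many CPU threads llama.cpp uses for inference.
# The GIL is released while the C/C++ code runs, but parallelism within a
# single generation is handled by llama.cpp itself, not by Python threads.
llm = Llama(
    model_path="models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_threads=8,                                  # placeholder thread count
)

out = llm("Q: What does the GIL protect? A:", max_tokens=32)
print(out["choices"][0]["text"])
```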
@simonw yes, llama-cpp-python uses ctypes.CFUNCTYPE to bind to the llama.cpp functions, and as stated in the ctypes docs:
> The returned function prototype creates functions that use the standard C calling convention. The function will release the GIL during the call.
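To illustrate that behavior, here is a minimal sketch of binding a C function through a CFUNCTYPE prototype. It uses libm's cosf as a stand-in for the llama.cpp functions, so the library lookup and function name are assumptions for the example, not part of llama-cpp-python:

```python
import ctypes
import ctypes.util

# Load the C math library as a stand-in for libllama (assumption for the example).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Prototype for: float cosf(float)
# Functions created from a CFUNCTYPE prototype use the C calling convention
# and, per the ctypes docs, release the GIL for the duration of the call.
proto = ctypes.CFUNCTYPE(ctypes.c_float, ctypes.c_float)
cosf = proto(("cosf", libm))

print(cosf(0.0))  # 1.0 -- the GIL is released while the C call executes
```

So while the bound C function is executing, other Python threads can run; the heavy lifting inside llama.cpp is parallelized by its own thread pool rather than by the GIL.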