abetlen/llama-cpp-python

Release the GIL

Closed this issue · 3 comments

I might be missing something, but does llama-cpp-python release the Python GIL at the moment?

If it doesn't, would releasing the GIL allow multi-threaded execution of GGUF files?


It's llama.cpp that handles the multithreading, and if the model is running on the GPU, multithreading doesn't help anyway.

These are language bindings for llama.cpp; the actual GGUF processing is not done in Python. The GIL is released while the underlying llama.cpp execution is happening. Use n_threads when instantiating a model to set the parallelism of llama.cpp.
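As a minimal sketch of what that looks like (the model path, thread count, and prompt below are placeholder assumptions, and n_threads only affects the CPU side of inference):

```python
from llama_cpp import Llama

# n_threads controls how many CPU threads llama.cpp itself uses;
# the Python caller just waits on the call, with the GIL released meanwhile.
llm = Llama(
    model_path="./model.gguf",  # placeholder path for illustration
    n_threads=8,
)

out = llm("Q: What does GIL stand for? A:", max_tokens=32)
print(out["choices"][0]["text"])
```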

@simonw Yes, llama-cpp-python uses ctypes.CFUNCTYPE to bind to the llama.cpp functions, and as stated in the ctypes docs:

The returned function prototype creates functions that use the standard C calling convention. The function will release the GIL during the call.
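For illustration, this is roughly how a CFUNCTYPE prototype binds a C function. The example below uses libm's sqrt rather than llama.cpp and assumes a Unix-like platform; the real bindings apply the same pattern to the functions exported by the llama.cpp shared library:

```python
import ctypes
import ctypes.util

# Illustrative sketch: bind libm's sqrt via a CFUNCTYPE prototype.
# Per the ctypes docs quoted above, functions created from such a prototype
# use the standard C calling convention and release the GIL during the call.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

proto = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)  # (restype, argtype)
c_sqrt = proto(("sqrt", libm))

print(c_sqrt(2.0))  # 1.4142135623730951
```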