Is it possible to return a future without acquiring gil?
Opened this issue · 10 comments
❓ Question
Hi, I am trying to read a file in Rust and return an awaitable to Python. I am able to use the sync function and return the response back to Python, but when I implement it as a non-blocking function, it results in slower execution due to the locking and unlocking of the GIL.
Is it possible to be done without doing that?
Here is my code snippet
use pyo3::prelude::*;
use pyo3::types::{PyAny, PyString};
use tokio::fs;

#[pyfunction]
pub fn async_static_files(py: Python, file_name: String) -> PyResult<PyObject> {
    pyo3_asyncio::tokio::into_coroutine(py, async move {
        let contents = fs::read(file_name.clone()).await.unwrap();
        let foo = String::from_utf8_lossy(&contents);
        Ok(Python::with_gil(|py| {
            let x = PyString::new(py, &foo);
            let any: &PyAny = x.as_ref();
            let any = any.to_object(py);
            any.clone()
        }))
    })
}
It's not possible to create and resolve a future without first interacting with Python (i.e. acquiring the GIL), because Python is effectively single-threaded. This is a limitation of CPython at the moment.
Thank you @ShadowJonathan! Got it.
@ShadowJonathan, sorry, I closed the issue by mistake. I was wondering if it would be possible (through a clever bypass, maybe) to not acquire the GIL every time, and instead acquire it only once?
What do you mean? Can you elaborate on the behaviour you're thinking about?
e.g. here, we have to acquire the GIL every time we call the function below:
pyo3_asyncio::tokio::into_coroutine(py, async move {
    let contents = fs::read(file_name.clone()).await.unwrap();
    let foo = String::from_utf8_lossy(&contents);
    Ok(Python::with_gil(|py| {
        let x = PyString::new(py, &foo);
        let any: &PyAny = x.as_ref();
        let any = any.to_object(py);
        any.clone()
    }))
})
Will it be possible for us to acquire it only once globally and then share it across function calls?
No, the GIL should only be held when you want to execute Python code through PyO3. Holding it for the entire time means that other Python code can't run at all.
The GIL is designed to be locked and released really quickly. You might want to reexamine the original premise that your code is running slowly because of the GIL. It might be something else that's causing your application to slow down.
@awestlake87, I think I may have miscommunicated what I meant by "slowly". I meant it was performing slower than the synchronous counterpart, i.e. the one using the fs crate.
Here are the performance stats below:
➜ ~ oha -n 10000 http://localhost:5000/test_async_python
Summary:
Success rate: 1.0000
Total: 6.5560 secs
Slowest: 0.0677 secs
Fastest: 0.0088 secs
Average: 0.0327 secs
Requests/sec: 1525.3293
Total data: 2.32 MiB
Size/request: 243 B
Size/sec: 361.97 KiB
Response time histogram:
0.005 [64]   |
0.011 [205]  |■
0.016 [830]  |■■■■■■■
0.021 [2518] |■■■■■■■■■■■■■■■■■■■■■■
0.027 [3511] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.032 [1756] |■■■■■■■■■■■■■■■■
0.037 [631]  |■■■■■
0.043 [331]  |■■■
0.048 [92]   |
0.054 [57]   |
0.059 [5]    |
Latency distribution:
10% in 0.0246 secs
25% in 0.0283 secs
50% in 0.0323 secs
75% in 0.0364 secs
90% in 0.0416 secs
95% in 0.0460 secs
99% in 0.0547 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0115 secs, 0.0042 secs, 0.0150 secs
DNS-lookup: 0.0001 secs, 0.0000 secs, 0.0005 secs
Status code distribution:
[200] 10000 responses
➜ ~ oha -n 10000 http://localhost:5000/test
Summary:
Success rate: 1.0000
Total: 4.0756 secs
Slowest: 0.0624 secs
Fastest: 0.0031 secs
Average: 0.0203 secs
Requests/sec: 2453.6563
Total data: 2.32 MiB
Size/request: 243 B
Size/sec: 582.26 KiB
Response time histogram:
0.005 [129]  |
0.011 [977]  |■■■■■■■
0.016 [4325] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.021 [2264] |■■■■■■■■■■■■■■■■
0.026 [1185] |■■■■■■■■
0.032 [655]  |■■■■
0.037 [244]  |■
0.042 [126]  |
0.047 [0]    |
0.053 [1]    |
0.058 [94]   |
Latency distribution:
10% in 0.0135 secs
25% in 0.0157 secs
50% in 0.0184 secs
75% in 0.0235 secs
90% in 0.0304 secs
95% in 0.0341 secs
99% in 0.0443 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0034 secs, 0.0030 secs, 0.0040 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[200] 10000 responses
➜ ~ oha -n 10000 http://localhost:5000/test_sync
Summary:
Success rate: 1.0000
Total: 3.6844 secs
Slowest: 0.0591 secs
Fastest: 0.0019 secs
Average: 0.0184 secs
Requests/sec: 2714.1283
Total data: 2.32 MiB
Size/request: 243 B
Size/sec: 644.08 KiB
Response time histogram:
0.005 [57]   |
0.010 [2079] |■■■■■■■■■■■■■■■
0.015 [4429] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.019 [495]  |■■■
0.024 [365]  |■■
0.029 [1342] |■■■■■■■■■
0.034 [713]  |■■■■■
0.039 [254]  |■
0.044 [168]  |■
0.048 [57]   |
0.053 [41]   |
Latency distribution:
10% in 0.0108 secs
25% in 0.0119 secs
50% in 0.0139 secs
75% in 0.0264 secs
90% in 0.0321 secs
95% in 0.0362 secs
99% in 0.0454 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0022 secs, 0.0011 secs, 0.0029 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[200] 10000 responses
And here are the three corresponding Python snippets calling them:
@app.get("/test")
async def test():
    import os
    path = os.path.abspath(os.path.join(os.path.dirname(os.path.realpath(__file__)), "index.html"))
    return await async_static_files(path)

@app.get("/test_sync")
async def test_sync():
    import os
    path = os.path.abspath(os.path.join(os.path.dirname(os.path.realpath(__file__)), "index.html"))
    return static_file(path)

@app.get("/test_async_python")
async def test_async_python():
    import os
    path = os.path.abspath(os.path.join(os.path.dirname(os.path.realpath(__file__)), "index.html"))
    return await async_static_files_python(path)
Surprisingly, the test_sync route is the fastest here. The async file reading using tokio is faster than the async file reading in Python, but it was a bit surprising to me that the tokio implementation was slower than the sync one.
Oh, and this is the implementation of async_static_files_python:
async def async_static_files_python(filename):
    async with aiofiles.open(filename, mode='r') as f:
        contents = await f.read()
    return contents
Python async/await is not necessarily faster than sync code. Essentially, performance in Python usually boils down to how much Python it has to run and how thin the FFI layer is. Here's a more detailed explanation.
There's no magic wand for performance. Context matters a lot. Python is backed by a lot of native code already, so replacing parts of it with Rust may not make it any faster.
One thing I will say about your async_static_files function is that it's making at least 3 copies of the data: one copy is the original file contents, the second is the UTF-8 decoded String foo, and the third is the PyString x. The reason I can tell is that all of these variables are passed by reference when the next copy is created. If you can find a way to pass them by value, then the buffer could potentially be reused instead of cloned, which might make things faster.
then the buffer could potentially be reused instead of cloned which might make things faster.
Thank you for the explanation @awestlake87. Which buffer are you talking about?
The internal memory buffer of the object, i.e. the low-level representation of that memory, which can then be re-used efficiently in place.