Async functions from different modules treated as normal coroutines
csos95 opened this issue · 4 comments
Context
I've been slowly creating some modules to use in various projects (mostly wrapping rust libraries).
Some of them are async libraries and I want to use these modules with other lua binaries (such as love2d), so I created a simple scheduler module that just contains a tokio Runtime
and LocalSet
and functions to add coroutines to the LocalSet
and run them all.
In one of my current web projects I wanted to use lua to prototype new pages, so I made a /lua
endpoint and a websocket endpoint to connect to a fennel repl.
These endpoints pass requests from and to two lua coroutines in the same lua state.
They import modules defined in a separate .so
file and use async functions from them.
Issue
I noticed that whenever I call an async function defined in a binary/library other than the one being used to run call_async
, it uses 100% cpu.
After digging through the mlua docs and code, I believe I've tracked the issue down to the fact that the address of ASYNC_POLL_PENDING
is used for the Lua::poll_pending
value rather than a constant value.
This means every binary/library has a different value for poll_pending
.
So when the Future
impl of AsyncThread
polls the coroutine, it doesn't recognize the return value as poll_pending
.
Instead, it treats it like a normal coroutine that has yielded and immediately calls the waker to wake up the task before returning Poll::Pending
.
Question
Is there any specific reason (such as possible security concerns) that the address of ASYNC_POLL_PENDING
is used for poll_pending
rather than a constant value?
If there is not, would it be possible to change it to use a constant value so that async functions defined in other binaries/libraries will be recognized and run correctly when using call_async
?
I tried implementing this (change poll_pending to use a set value instead of an address) locally this morning, but that didn't fully solve the issue.
With the change, the poll_pending
value from an async function defined in a different library is recognized, but the function never completes.
I added many print statements throughout the relevant mlua functions and Delay (from the tokio tutorial) to figure out what's wrong and it seems that the waker does not get set correctly in this scenario.
So when the Future impls for async functions defined in a different library call wake, they're doing so on the default noop waker.
For async functions defined in the same library, the waker is set and used correctly.
The WakerGuard
does set the waker, but at some point in the process, the waker is lost.
It seems like it's somewhere between calling resume_inner
on the Thread
and calling get_poll
in the lua chunk created by create_async_callback
.
I'm not very familiar with mlua internals/the lua c api, but it seems like maybe coroutines have their own lua sub-state and ExtraData
.
So I'm thinking that the waker isn't getting copied into the sub-state, but I can't figure out why that'd be the case for async functions defined in another library, but not ones defined in the same library.
After digging through the mlua docs and code, I believe I've tracked the issue down to the fact that the address of ASYNC_POLL_PENDING is used for the Lua::poll_pending value rather than a constant value.
This means every binary/library has a different value for poll_pending.
Yeah, this is by design. It's exactly how it should be.
Modules and the main app does not share internals, everything they have on Rust side lives in their own state, isolated and unshared.
This also applies not only to futures but to registered userdata objects and so on.
Is there any specific reason (such as possible security concerns) that the address of ASYNC_POLL_PENDING is used for poll_pending rather than a constant value?
Rust does not provide stable ABI, the only API we can rely on is Lua C API which is stable and consitent.
Modules and the main app can be compiled by different compiler version, they can use different libraries (such as tokio). Even if libraries are exactly the same there is no guarantee that structs layouts are not reordered by llvm optimizer just because it decided to apply some optimization for this specific code.
For async functions defined in the same library, the waker is set and used correctly.
The WakerGuard does set the waker, but at some point in the process, the waker is lost.
It seems like it's somewhere between calling resume_inner on the Thread and calling get_poll in the lua chunk created by create_async_callback.
For the same reason as I explained above. WakerGuard
is an internal entity and cannot be shared between modules/app.
I'm not very familiar with mlua internals/the lua c api, but it seems like maybe coroutines have their own lua sub-state and ExtraData.
So I'm thinking that the waker isn't getting copied into the sub-state, but I can't figure out why that'd be the case for async functions defined in another library, but not ones defined in the same library.
Each module and the main app have their own independent state, the ExtraData
object. Waker
object for polling futures lives inside that non-shared state and this is why it's visible inside own library but not externally.
The best way to integrate async functionality between loadable modules and the main app is to integrate independent event loops together using stable primitives.
The idea is to poll coroutines normally through Lua API but recognize their ASYNC_POLL_PENDING
state and pass/receive oneshot channels to share futures state. When remote future is ready you should receive a message through channel and only then continue polling coroutine.
What use as a channel is up to you, it can be pipes or tokio channels that can operate in mixed sync/async module (sync on foreign side and async on native side).
The same principle is used to integrate async rust and async c++, async Lua apps (such as openresty OR neovim) and async rust modules.
Integrating futures through FFI is not easy, you can take a look to https://docs.rs/async-ffi/latest/async_ffi/ for some other solutions outside of Lua world (just to understand scope of the problem!)