Returning an error in a rust function causes a un-unwindable panic
cheesycod opened this issue · 5 comments
I’m trying to run mlua in a tokio spawn local through a threadpool for luau
However, errors in rust function seems to cause un-unwindable panics due to my usage of panic = abort. Removing panic = abort doesn’t solve the issue but rather kills the thread spawned instead of panicking
code can be found at https://github.com/Anti-Raid/splashtail/blob/5b3310ff582e42041218aaf4b3821f3d09b8b12b/core/rust.templating/src/lang_lua/utils.rs#L42
After debugging this issue further, I believe I have found an unsoundness issue in mlua.
The following struct if used with a Tokio LocalSet leads to undefined behaviour:
pub struct ArLuaNonSend {
/// The Lua VM
pub vm: Rc<Lua>,
/// The execution state of the Lua VM
#[allow(dead_code)] // it isnt actually dead code
pub state: Arc<ArLuaExecutionState>,
}
Notice the pub vm: Rc<Lua>,
. This is the heart of the unsafety. When using this in a tokio localset (spawn_local
), the end result will either be an un-unwindable panic or some sort of deadlock related to mlua. For example, running the following code is unsafe if you use ArLuaNonSend as written:
/// Executes a Lua script in the Lua VM
pub async fn exec(
vm: &Lua,
template: &str,
args: Box<dyn erased_serde::Serialize + Send>,
) -> Result<serde_json::Value, base_data::Error> {
let f: LuaFunction = vm
.load(template)
.eval_async()
.await?;
log::info!("exec (done creating function f)");
let args = vm
.to_value(&args)?;
let v: LuaValue = f
.call_async(args)
.await?;
let v = serde_json::to_value(v)?;
Ok(v)
}
It looks like bog-standard Rust/Lua code but this is actually unsafe and after repeated executions on my discord bot led to a deadlock.
I have found this issue to be related to the use of std::rc::Rc
. If the ArLuaNonSend struct is changed to the below code without this Rc, the function works as normal:
pub struct ArLuaNonSend {
/// The Lua VM
pub vm: Lua,
/// The execution state of the Lua VM
#[allow(dead_code)] // it isnt actually dead code
pub state: Arc<ArLuaExecutionState>,
}
While I have not tested this with other rust types such as Box
, I still believe that this is a unsoundness issue of some sort in mlua. Perhaps mlua::Lua should be wrapped in a Pin?
@cheesycod do you have a standalone example please to reproduce the issue? that I can run on my laptop
will either be an un-unwindable panic
is there any specific message from the panic? Luau relies on external unwinding for exception handling so setting panic=abort
is not supported atm.
@cheesycod do you have a standalone example please to reproduce the issue? that I can run on my laptop
will either be an un-unwindable panic
is there any specific message from the panic? Luau relies on external unwinding for exception handling so setting
panic=abort
is not supported atm.
- The issue occurs with both panic=abort and without it. Without panic=abort, I get strange undefined behaviour such as dangling threads
Not really any standalone example since it is hard to reproduce outside of my discord bot, but well if you can create a discord application, here: https://github.com/Anti-Raid/splashtail/tree/255801d2a9ff7b39824ebd5f7e858891ad8b7e7a
The telltale sign of when this issue occurs is if you see a deadlock when editting a discord message. Then, the following is seen when run through gdb:
[Thread 0x7ffff47ef640 (LWP 391675) exited]
[Thread 0x7ffff49f6640 (LWP 391674) exited]
[Thread 0x7ffff5002640 (LWP 391671) exited]
[Thread 0x7ffff4dfe640 (LWP 391672) exited]
[Thread 0x7ffff5203640 (LWP 391670) exited]
[Thread 0x7ffff540a640 (LWP 391669) exited]
[Thread 0x7ffff560e640 (LWP 391668) exited]
[Thread 0x7ffff4bfa640 (LWP 391673) exited]
[Thread 0x7ffff580f640 (LWP 391667) exited]
[Thread 0x7ffff5a13640 (LWP 391666) exited]
@cheesycod could you try the latest v0.10.0-beta.1
I just published? It Send + Sync
when enable send
feature flag and many tricks no longer needed.
I'll try to reproduce the issue in v0.9 version in the meantime.
@khvzak this issue isn’t so relevant in 0.10.0 thanks to your lovely change of making it sync (chefs kiss for that) so I no longer have to spawn 100 threads with a custom threadpool implementation and tokio local set
Will test it locally with threadpool and local set once I’m free though