godotengine/godot

[Godot 3.x] Nativescript object lifecycle might not be threadsafe

Opened this issue · 4 comments

Godot version

3.x fe2f24c

System information

OS: Arch Linux, CPU: Ryzen 5, GUP: RX 480

Issue description

I occasionally get crashes infinite loops in the set holding the binding data for Nativescript here:
https://github.com/godotengine/godot/blob/3.x/modules/gdnative/nativescript/nativescript.cpp#L1388
The infinite loop outputs this error here:
https://github.com/godotengine/godot/blob/3.x/core/set.h#L130
I do create and destroy GDNative objects in multiple threads.
I suspect that function and the allocation function as well should be protected by a mutex (as this comment at the call site suggests):
https://github.com/godotengine/godot/blob/3.x/core/object.cpp#L1890

Steps to reproduce

Unfortunately, I've been unable to reproduce the issue outside of my project, even when allocating/deallocating many GDNative objects in many threads.
I suspect the problem is there, but something in my project makes it happen more often.
I'm trying to make sure these objects are always created and destroyed in the main thread. I'll post an update if I find this fixes the issue.

Minimal reproduction project

No response

So, I made some attempts to work around the issue. Specifically, I tried to make sure all Nativescripts objects are created and destroyed by the main thread. However, this is not enough because the set containing binding data is lazily populated.
The workaround isn't practical because I need to call those objects from other threads for performance reasons.
I'll try to add a mutex around that set and see if this fixes the issue. If it does, I'll create a PR. I guess this is only relevant to Godot 3.x?

I believe I have a fix for the issue, but it needs to be reviewed and tested with multiple native script language bindings.
For performance reasons, I changed the signature of this function here. This will break modules that use the script_language class directly.
If we don't want that, I can either:

  • Store a reverse lookup map from binding data to objects for Nativescript.
  • Move that new parameter to be the last and default it to nullptr

If we agree on this, I will create a PR.

I tested the change with my tests. I still get some crashes 1 out 5 runs or so, but it might be unrelated.
Before the change, tests were crashing almost at each run.

Found another thread safety issue here. Apparently, StringName variables are not thread-safe (although the data they encapsulate is).
I could add a mutex, but that variable doesn't really make sense in an MT environment as it might be changed before the method ends by another thread, and the reported message could even be wrong.
I think there are three options from easy to hard:

  • Remove it.
  • Only write that variable if the call is on the main thread.
  • Use a set of variables, one for each thread that calls the method. I'm not sure what to write in case of a crash, though.