godotengine/godot

Reparenting node in Area.body_entered causes crash

rcorre opened this issue ยท 8 comments

Godot version:

3.1.2.stable.custom_build

OS/device including version:

Linux 5.4.2-arch1-1 x86_64 GNU/Linux

Issue description:

If you reparent a Node while handling the Area.body_entered signal, godot crashes.
My use case was an object that can "grab" another object like a magnet.

Steps to reproduce:

  1. Create an Area with this script:
extends Area

func _ready():
	connect("body_entered", self, "on_body_entered")

func on_body_entered(body: Node):
	body.get_parent().remove_child(body)
	add_child(body)
  1. Move a KinematicBody into the area
Running: /usr/bin/godot --path /tmp/example --remote-debug 127.0.0.1:6007 --allow_focus_steal_pid 71021 --debug-collisions --position 448,240 res://Spatial.tscn
Godot Engine v3.1.2.stable.custom_build - https://godotengine.org
OpenGL ES 3.0 Renderer: GeForce GTX 970/PCIe/SSE2
ERROR: build: Condition ' O == __null ' is true. Continuing..:
   At: core/math/quick_hull.cpp:402.
ERROR: build: Condition ' O == __null ' is true. Continuing..:
   At: core/math/quick_hull.cpp:402.
ERROR: build: Condition ' O == __null ' is true. Continuing..:
   At: core/math/quick_hull.cpp:402.
ERROR: build: Condition ' O == __null ' is true. Continuing..:
   At: core/math/quick_hull.cpp:402.
ERROR: get_tree: Condition ' !data.tree ' is true. returned: __null
   At: scene/main/node.h:263.
handle_crash: Program crashed with signal 11

I can try to repro with debug bits if desired.

Minimal reproduction project:
example.zip

remove_child is fine, the crash occurs at add_child. You can avoid the crash with call_deferred("add_child", body).

I have stack overflow when running project in 3.2 beta 3
Zrzut ekranu z 2019-12-08 20-10-21

I have stack overflow when running project in 3.2 beta 3

Me too. The newly added body is detected again and enters a loop.

This is weird though. Using add_child inside signals should be disallowed. I had an error about this multiple times, but it doesn't seem to be triggered here,

The cause of the crash seems to be the mentioned stack overflow (#34207 (comment) , #34207 (comment)).

As mentioned above,
the signal, when triggered, creates a node inside its trigger-area, leading to another trigger etc.

Using add_child inside signals should be disallowed.

I disagree and I don't think this is a bug, because there are easy solutions for the Game Developer to fix this. One for this particular example would be a short check, if the bodys parent is the area2d

func on_body_entered(body: Node):
	if(body.get_parent() != self):
		body.get_parent().remove_child(body)
		add_child(body)

Another reason why I think this is expected behaviour is that it might be even useful to trigger this event for new created bodies.

Example (Pseudocode):

func on_unit_entered(unit: Node):
	if(unit.type == "commander"):
		add_child(unit_tank)
		add_child(unit_soldier)
		add_child(unit_soldier)
		reinforcements_received = true
	units_in_area += 1
	if(units_in_area > 10):
		trigger_event(EVENT_SPAWN_ENEMY_TROUPS);
		disconnect("body_entered", self, "on_unit_entered")

Edit: The only thing I can think of is a better error message explaining why it is crashing. a warning when using add_child inside signals.

ReparentBug.zip

Adding another test project that I believe is related, but shows some interesting differences. This was tested with v3.2.stable.mono.official (example project is pure GDScript).

  1. The crash happens on add_child() as mentioned above, but it only crashes if the remove_child() was triggered by an area_entered signal. When you run the project at first, it will remove the item using a timeout signal from a Timer node, and it will later call add_child() without error. You can then edit the script to use the area_entered signal (commenting out the timeout connection. See this section:
	## CHOOSE ONLY ONE OF THESE PICKUP METHODS
	##   - picking up by area_entered signal will crash when re-added
	##   - picking up by timeout signal will *NOT* crash when re-added
#	get_node("Character/PickupArea").connect("area_entered", self, "pickup_item")
	get_node("PickupTimer").connect("timeout", self, "pickup_item", [ get_node("AreaItem") ])
  1. Technically the removed node is not being reparented. It is being temporarily removed and added back to the same parent.
  2. Using call_deferred() does not resolve the issue. It still crashes.
  3. The crash can be triggered even when the reparented node is not added in a position where it triggers an infinite add/remove loop.

I'm getting this on 3.2.3 stable. In my case, add_child is called later in the stack inside a different function. The stack starts with a body_entered signal and using call_deferred as suggested by @rcorre doesn't work. I disagree that adding children from a signal should be disallowed as suggested by @KoBeWi , even though there's a good explanation for the issue it can still be solved within the engine. A game developer or light script user shouldn't have to worry about engine threads, an engine developer should.
@ericrybick 's solution of checking get_parent() against the new parent worked for me.

Did some testing:

  • @rcorre's MRP still triggers a stack overflow as of 3.5-beta5, latest 3.x, and latest master.
  • @danboo's MRP from #34207 (comment) crashes in 3.3-stable, but it's fixed from 3.2.3-stable onwards (tested 3.2.3-stable, 3.3-stable, 3.5-beta5). So this was an unrelated issue.

@rcorre's MRP ported to master: body_entered_overflow.zip

Just run into this, confirmed still occurring in 3.5 rc7.

From my testing, deferring the add_child() call within body_entered triggers the call stack overflow all the same, I had to conditionally skip the entire codepath by checking whether the body is already re-parented as intended to work around the issue.