TokisanGames/Terrain3D

C# Garbage collection clears storage.region_offsets

TheOrioli opened this issue · 8 comments

Terrain3D version

v0.9.2-dev

System information

Godot v4.2.2.stable.mono - Windows 10.0.19045 - Vulkan (Forward+) - dedicated NVIDIA GeForce RTX 2070 SUPER (NVIDIA; 31.0.15.5212) - AMD Ryzen 9 3900X 12-Core Processor (24 Threads)

Is the issue reproducable in the demo?

Not applicable

Issue description

I use Terrain3D with C# primarily. The previous version in my project was 0.9.0-beta
I keep a reference to the Terrain3DStorage resource in my scripts, and destroying levels and clearing references worked nicely.

In version 0.9.2-dev built off f626dc1 even accessing the storage property through C# causes the garbage collector to completely clear all the terrain data down the line.

I am writing this bug as a reference for now, currently I don't have the time to make a MRP, but I plan to get on that in a few days.

Specifically, the visuals seems fine, but physics collisions no longer get generated when loading the same storage again.

MRP

See #393 (comment)

0.9.0 and 0.9.2 are a world apart. Please test 0.9.1, or preferably bisect between the two and you can narrow down the exact commit that caused the problem. I don't use C#, so you'll need to do the heavy lifting. I can support the back end.

Do you know how to use git bisect? It's very easy and more useful than an MRP.

I assume you're using the same version of Godot, and hopefully godot-cpp, since those are what C# interfaces with.

@TokisanGames thanks for your patience while I dug into this.

The commit causing the issue is unfortunately the big one 5835e41

The MRP (too large to upload here) contains a git branch called terrain_5835e41 with the plugin set to that commit
https://github.com/TheOrioli/Terrain3DGCBug/tree/terrain_5835e41
and a branch set to 0.9.1 which works as expected
https://github.com/TheOrioli/Terrain3DGCBug/tree/terrain_0.9.1
Admin edit: See #393 (comment)

To try the MRP run the project, and use the SPAWN, REMOVE and GC(Garbage Collection) buttons in that order. After pressing the GC button once press SPAWN again and the terrain will appear but none of the balls will be visible nor will the debug collision be visible. You can also notice errors pile up in the Debugger dock.

image
image

Here is an interesting part of the puzzle, if you press the GC button multiple times after removing the terrain from the scene, you will notice that more objects get deleted.

After pressing REMOVE

Terrain3D#2234:_notification: NOTIFICATION_PREDELETE
Terrain3D#2234:_clear: Clearing the terrain
Terrain3D#2234:_destroy_collision: Freeing debug static body
Terrain3D#2234:_destroy_collision: Freeing dsb child 0 CollisionShape3D
Terrain3D#2234:_destroy_collision: Freeing static body
Terrain3D#2234:_notification: NOTIFICATION_EXIT_TREE
Terrain3D#2234:_clear: Clearing the terrain
Terrain3D#2234:_destroy_mouse_picking: Freeing mouse_quad
Terrain3D#2234:_destroy_mouse_picking: Freeing mouse_cam
Terrain3D#2234:_destroy_mouse_picking: memdelete mouse_vp
Terrain3D#2234:_notification: NOTIFICATION_EXIT_WORLD

After pressing GC once

Terrain3DMaterial#4808:~Terrain3DMaterial: Destroying material
Terrain3DGenTex:clear: GeneratedTexture freeing RID(9878424780878)
Terrain3DGenTex:clear: GeneratedTexture unref image<Image#-9223371995817704131>

After pressing GC twice

Terrain3DStorage#4919:_clear: Clearing storage
Terrain3DGenTex:clear: GeneratedTexture freeing RID(9702331121738)
Terrain3DGenTex:clear: GeneratedTexture freeing RID(9710921056331)
Terrain3DGenTex:clear: GeneratedTexture freeing RID(9719510990924)

After pressing GC twice, pressing SPAWN again will work without issue.
The hardest problems are always naming variables and cache invalidation, and this makes me think some kind of cache is not aware it needs to be rebuilt after the first GC run. Also in the logs this hints at collision creation being skipped:

Terrain3D#6707:set_show_debug_collision: Setting show collision: true
Terrain3D#6707:_update_collision: Collision creation time: 63 ms
.
.
.
Terrain3D#8168:set_show_debug_collision: Setting show collision: true
Terrain3D#8168:_update_collision: Collision creation time: 0 ms <---- 🤔

Let me know what you think. I am unfortunately not super involved with the deeper way of how C# interacts with Godot internals, but worst comes to worst perhaps we can summon one of the C# maintainers here to shed some light.

First, storage is going away as a resource in PR #374, in lieu of a directory of resources.

Second, dynamic collision will fix a lot of issues in PR #278.

I ran your demos. First thing I noticed looking at the storage resource in the inspector: in 0.9.1 you have 4 region offsets, heightmaps, control maps, colormaps. In the other version you have 2 region offsets, 2 heightmaps, 5 control maps, 5 colormaps. This storage file is messed up. I replaced the broken storage.res with the fixed one.

Running the recent version with new storage, after the 2nd spawn it has nothing to do with collision. After GC Storage thinks that there are no regions, even though I can see the 4 region offsets in the inspector at runtime.

Attaching this to terrain3d:

extends Terrain3D

func _ready() -> void:
	print("GD Region count: ", storage.get_region_count())
	print("GD Region offsets: ", storage.region_offsets)

After spawn, reports:
GD Region count: 4
GD Region offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]

After gc/spawn, reports:
GD Region count: 0
GD Region offsets: []

I suspect there's a bug in Godot's garbage collection. I should not have to do anything in C++ to accommodate for C#'s garbage collection. The commit you found only has a few changes to terrain3d and storage, so finding the exact line that triggers the engine bug won't be difficult. Ultimately, we'll have to take that to the C# guys to learn what they are doing.

I spent a bit of time exploring this, first breaking up the commit you found, and then in the main branch. The change that caused the broken respawning is calling _update_regions() in terrain_3d_material.cpp:initialize(). You can comment that out and it will respawn without errors, but the terrain is flat.

_update_regions();

The function passes data to the shader so its not flat. But by calling it there, it causes your C# code to instantiate a terrain where storage.region_offsets has been cleared. _update_regions is the only function in the class that references storage.region_offsets and only reads from it.

The storage instance is the same. The region_offsets array pointer is the same, but its contents have been flushed between the end of GC and the new spawn. There isn't any code in any class that clears that array. Storage doesn't generate the contents. It is populated by add_region(), through editor or API insertion, which is saved in the resource. When storage is loaded it loads region_offsets from the file.

So I believe the GC problem always existed but it wasn't visible because of a different structure and the new structure exposes it.

In C# I retrieved region_offsets to a local variable, then printed the contents at the beginning and end of spawn/remove/gc. It's still full after the GC call (which may be deferred). It's empty on the beginning of the next spawn call, before the terrain has been instantiated again, which confirms the deferred theory. So GC is flushing that vector2i array, but not any of the other image arrays.

If I duplicate the region_offsets array, and reassign it to storage it works around the GC issue.

using Godot;
using System;
using System.Collections.Generic;

public partial class Main : Node3D
{
	[Export(PropertyHint.File)]
	private string terrainScenePath;
	private readonly List<Node3D> meshes = new();

	private Node terrain;
	// private Resource storage;
	private Godot.Collections.Array region_offsets = new();

	public void SpawnTerrain() {
		GD.Print("**- Spawn start: local region_offsets: ", region_offsets, " **-");
		
		terrain = ResourceLoader.Load<PackedScene>(terrainScenePath, cacheMode: ResourceLoader.CacheMode.Ignore).Instantiate();
		var storage = terrain.Get("storage").As<Resource>();
		GD.Print("*** storage.region_offsets: ", storage.Call("get_region_offsets").AsGodotArray(), " ***");

		if(region_offsets.Count > 0 && storage.Call("get_region_offsets").AsGodotArray().Count == 0) {
			GD.Print("--- storage.region_offsets is empty, assigning backup with count: ", region_offsets.Count);
			storage.Call("set_region_offsets", region_offsets);
			GD.Print("--- storage.region_offsets: ", storage.Call("get_region_offsets").AsGodotArray());
		}
		region_offsets = storage.Call("get_region_offsets").AsGodotArray().Duplicate();
		
		AddChild(terrain);
		var cam = GetViewport().GetCamera3D();

		for (int i = 0; i < 25; i++) {
			var pos = cam.GlobalPosition + new Vector3(GD.Randf() * 25f, 0f, GD.Randf() * 25f) + -cam.GlobalBasis.Z * (float)GD.RandRange(10f, 25f);
			pos.Y = (float)(storage.Call("get_height", pos).AsDouble() + 1.0);
			var mi = new MeshInstance3D {
				Mesh = new SphereMesh { Radius = 1, Height = 2 },
				Position = pos,
			};

			AddChild(mi);
			meshes.Add(mi);
		}
		GD.Print("--- Spawn end: local region_offsets: ", region_offsets);
	}

	public void RemoveTerrain() {
		GD.Print("--- Remove start: local region_offsets: ", region_offsets);
		foreach (var mi in meshes) {
			mi.QueueFree();
		}

		meshes.Clear();

		terrain.QueueFree();
		terrain = null;
		// storage = null;
		GD.Print("--- Remove end: local region_offsets: ", region_offsets);
	}

	public void ForceGC() {
		GD.Print("--- GC start: local region_offsets: ", region_offsets);
		GC.Collect();
		GD.Print("--- GC end: local region_offsets: ", region_offsets);
	}
}

With Duplicate()

**- Spawn start: local region_offsets: [] **-
*** storage.region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)] ***
--- Spawn end: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
--- Remove start: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
--- Remove end: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
--- GC start: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
--- GC end: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
**- Spawn start: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)] **-
*** storage.region_offsets: [] ***
--- storage.region_offsets is empty, assigning backup with count: 4
--- storage.region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
--- Spawn end: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]

Without Duplicate()

**- Spawn start: local region_offsets: [] **-
*** storage.region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)] ***
--- Spawn end: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
--- Remove start: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
--- Remove end: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
--- GC start: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
--- GC end: local region_offsets: [(-1, 0), (-1, -1), (0, -1), (0, 0)]
**- Spawn start: local region_offsets: [] **-
*** storage.region_offsets: [] ***
--- Spawn end: local region_offsets: []

I can confirm that pressing GC twice also addresses the issue.

So now the questions are why does GC clear the region_offsets array, or if it is instantiating a new storage (which I don't see that in the logs), why does it prevent storage from loading the array from disk? And why does collecting garbage twice properly reset it? I'm sure there's a GC bug here.

Edit: As you noted, I also saw the first GC click clears our material resource:

Terrain3DMaterial#6521:~Terrain3DMaterial: Destroying material
Terrain3DGenTex:clear: GeneratedTexture freeing RID(9637906612302)
Terrain3DGenTex:clear: GeneratedTexture unref image<Image#-9223372002008496859>

And the second click clears the storage resource:

Terrain3DStorage#6635:_clear: Clearing storage
Terrain3DGenTex:clear: GeneratedTexture freeing RID(9470402887754)
Terrain3DGenTex:clear: GeneratedTexture freeing RID(9478992822347)
Terrain3DGenTex:clear: GeneratedTexture freeing RID(9487582756940)
Terrain3DStorage#6635:set_multimeshes: Loading multimeshes: {  }

So it appears the basic issue is that the GC is not doing a complete job.

Updated MRP Instructions. Tested with 4.2.2-stable and 4.3-beta2.

The MRP provides three buttons which do the following:

  1. Spawn: Load and instantiate a packed scene w/ Terrain3D and the storage resource with the array. This loads just fine.
  2. Remove: terrain.QueueFree().
  3. GC: GC.Collect().

If we click all three, then return to step 1, the array is empty, which causes other problems.
However, if step 3 GC is done twice, then step 1 loads up properly. Looking through our logs, the first GC frees our material resource only. The second GC frees our storage resource only, apparently allowing it to be reloaded properly.

From dotnet chat. You can join us there.

raulsntos 1:26 PM
I'm trying to understand the issue but I'm not an user of Terrain3D, so I'm not sure what's going on or what's the 
expected behavior.

GC.Collect doesn't guarantee that everything will be collected. Also, I would not recommend using this in production 
code. I'm guessing you are using GC.Collect only for demonstrative purposes here.

If you need to ensure something is freed in C#, you'd normally use the Dispose method or an using statement.

Also you mention in the issue that the code that causes the issue is _update_regions. As I'm not familiar with 
Terrain3D, can you explain what this method does? And, since this is a C++ method, it's possible that there's a bug 
in godot-cpp.

Is this reproducible in a project without Terrain3D or any other GDExtension?

Please review and test the suggestions of using Dispose or using instead of GC. I'm not familiar with C#.

My response.

Thanks for the reponse. You can ignore the comments about _update_regions. That doesn't cause the problem. It only 
passes read-only data to a shader. We recently restructured a bunch of code, which revealed the GC problem for 
C# users.

The problem is that GC doesn't free a resource the first time, so when it is reinstantiated, instead of Godot 
ResourceLoader loading it from disk again, it has corrupted data (a blank array. It takes 2 GC calls to free the 
resource properly so it can be used again.

I don't use C# so have passed the suggestion to not use GC and instead use Dispose/using to the user to test.
So am I to understand that GC is not meant to be manually used?

Another note from dotnet

paulloz 7:41 PM
Cases where you'd actually want to call on the GC manually are very far and few between, yeah.

So, I'm inclined to believe that this isn't an engine bug, it's by design. It's not our bug either. And the solution is to use the other functions suggested by the devs above.

@TheOrioli Can you confirm their suggestions help your scenario? Whatever method is used, you want to ensure storage is flushed properly before it is reinstantiated. Or region_offsets is retained and replaced if not.

I'm going to close this since 2 Godot contributor chat devs said it's by design and gave a different solution that is more proper for Godot C#. This isn't a Terrain3D issue, it was only revealed by our restructuring. It can be followed up on by Op's Godot engine report, though they'll likely say the same thing. We can reopen this if that issue reveals a change we need to make in the future.
godotengine/godot#93459