Unexpected Memory Usage in sf::VertexBuffer

Question

EichenseerMarco opened this issue 5 months ago · 3 comments

Prerequisite Checklist

I searched for existing issues to prevent duplicates
I searched for existing discussions on the forum to prevent duplicates
I am here to report an issue and not to just ask a question or look for help (use the forum or Discord instead)

Describe your issue here

Hi everyone,

I've been experimenting with sf::VertexBuffer and noticed some unexpected memory allocations.
I'm not sure whether these are actually bugs. If not, I would appreciate some insight into why the code below behaves like this.

I haven't found another issue regarding memory management in sf::VertexBuffer.
I haven't found a relevant forum discussion regarding the potential problems I've observed.
I have found some general info about OpenGL VBOs which says that an additional RAM copy is always required (on Windows at least) but I'm not sure if this is (still) applicable: https://community.khronos.org/t/vbo-memory-usage/16223

Best regards
Marco

Your Environment

OS / distro / window manager: Windows 10 22H2
SFML version: 2.6.1 (SFML-2.6.1-windows-vc17-64-bit)
Compiler / toolchain: MSVC 17.8.3 Compiler Version 19.38.33133
Special compiler / CMake flags: n/a

Steps to reproduce

When stepping through the code below with a debugger, observe the memory usage.
The points a), b), c) and d) could indicate problems with the memory management.

#include <SFML/Graphics.hpp>
#include <vector>

// RAM and VRAM usage numbers are rough estimates according to Task Manager

int main()
{
	{ // scope to control lifetime of sf::VertexBuffer

		// current RAM usage: 1 MB
		sf::VertexBuffer vb(sf::VertexBuffer::Usage::Static);
		// expected RAM usage: 1 MB
		// actual RAM usage: 24 MB

		/*
		 * a) is it expected that an empty VertexBuffer allocates a significant amount of RAM?
		 */

		{ // scope to control lifetime of std::vector

			const std::size_t testSize = 100000000; // 100e6 vertices: large enough to make observing allocations easy
			std::vector<sf::Vertex> data;
			data.resize(testSize);
			// expected RAM allocation: testSize * sizeof(sf::Vertex) == 100e6 * 20 Byte == 2 GB
			// actual RAM usage: 2GB

			vb.create(data.size());
			// expected VRAM allocation: 2 GB (reason: documentation says "[...] allocates enough graphics memory to hold vertexCount vertices")
			// actual VRAM usage: 0 GB

			/*
			 * b) is it expected to see neither RAM nor VRAM allocations when calling sf::VertexBuffer::create()?
			 */

			vb.update(data.data());
			// expected RAM & VRAM allocations: 0 GB (reason: copy to previously allocated buffer should not increase memory footprint)
			// actual RAM usage: 5 GB
			// actual VRAM usage: 2 GB

			/*
			 * c) is it expected to see a +3 GB RAM and a +2 GB VRAM allocation when calling sf::VertexBuffer::update()?
			 *    Seeing 2 GB VRAM eventually is expected (2 GB of vertex data in VRAM) but I would have expected this after create()
			 *    Seeing 3 GB RAM is unexpected for two reasons: Seemingly no copy to RAM is performed, data size is only 2 GB
			 */

		} // vector goes out of scope, RAM usage drops to 3 GB which matches additional RAM usage from point c)

	} // vertex buffer goes out of scope, RAM usage drops to 1GB

	/*
	 * d) is it expected that 1 GB of RAM remains after cleaning up sf::VertexBuffer?
	 */

	return 0;
}
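
The 2 GB estimate in the listing above follows from the vertex layout. A minimal sketch of that arithmetic, using stand-in structs that are assumed to mirror sf::Vertex (two floats for the position, four bytes of color, two floats for the texture coordinates) rather than the SFML types themselves; `bytesForVertices` is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in types assumed to mirror the layout of sf::Vertex.
// They are NOT the SFML types and exist only for this calculation.
struct Vec2f  { float x, y; };
struct Color  { std::uint8_t r, g, b, a; };
struct Vertex { Vec2f position; Color color; Vec2f texCoords; };

// Holds on platforms with 4-byte floats and no extra padding.
static_assert(sizeof(Vertex) == 20, "expected 8 + 4 + 8 bytes per vertex");

// Number of bytes needed to store `count` vertices (hypothetical helper).
constexpr std::uint64_t bytesForVertices(std::uint64_t count)
{
    return count * sizeof(Vertex);
}
```

For 100e6 vertices this gives 2,000,000,000 bytes, which matches the figure used in the comments.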

Note that the example above uses a std::vector<sf::Vertex> because the API does not seem to support updating a sf::VertexBuffer from a sf::VertexArray.
Is updating a sf::VertexBuffer using a sf::VertexArray indeed not supported?

void usage()
{
	sf::VertexArray va;
	// fill...

	sf::VertexBuffer vb;
	vb.create(va.getVertexCount());

	// how to fill vb using va? There seems to be no .data() or equivalent to access underlying std::vector
	// vb.update(va.?);

	vb.update(&va[0]);
	// this should work because &vector[0] == vector.data() but it looks like the API does not support this use case
}

Expected behavior

I commented in the example where exactly the unexpected memory allocation can be observed.
Summary of expected behavior:

a) Constructing empty sf::VertexBuffer does not allocate RAM.
b) sf::VertexBuffer::create() allocates requested amount of VRAM (only VRAM)
c) sf::VertexBuffer::update() copies data to VRAM, does not allocate any additional RAM nor VRAM.
d) Destroying sf::VertexBuffer and sf::Vertex data used to fill it reduces RAM and VRAM footprint to initial size.

Actual behavior

I commented in the example where exactly the unexpected memory allocation can be observed.
Summary of actual behavior:

a) Constructing empty sf::VertexBuffer allocates a significant amount of RAM.
b) sf::VertexBuffer::create() does not seem to immediately allocate VRAM.
c) sf::VertexBuffer::update() allocates expected amount of VRAM but also RAM and the RAM allocation is significantly larger than data being copied.
d) Destructing sf::VertexBuffer does not reduce RAM footprint to initial size.

I'm aware that freeing / deleting / destroying objects may not (immediately) return the memory to the OS, which makes the values from Task Manager unreliable (references below). However, both containers going out of scope in the example above show an immediate reduction in RAM/VRAM usage, so the remaining RAM allocation, which is significantly larger than the data, seems strange.
Ref: https://stackoverflow.com/a/52417370
Ref: https://stackoverflow.com/q/17008180

Answer 1 · 2023-12-29T15:47:41.000Z

Is updating a sf::VertexBuffer using a sf::VertexArray indeed not supported?
I would hazard a guess that this is unsupported because sf::VertexArray exists as its own type of drawable and is an alternative to sf::VertexBuffer. It can be used with window.draw() calls, as can sf::VertexBuffer. I don't think it's expected to be able to update one from the other in this case. There have been discussions in the past about removing sf::VertexArray entirely in favour of a std::vector<sf::Vertex>.

I would guess that the GPU memory usage doesn't increase until the update() call because, in create(), SFML passes NULL as the data argument when calling glBufferData. As noted in the docs:

If data is not NULL, the data store is initialized with data from this pointer.
If data is NULL, a data store of the specified size is still created, but its contents remain uninitialized and thus undefined.

But upon the call to update() a valid set of data is uploaded to the GPU and thus the allocation finally appears. Maybe the driver or OS treats allocated but uninitialised memory as unused until an actual use is encountered.
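
The guess above can be pictured with a toy model. This is only an illustration of the hypothesis, not a description of any real driver or of SFML: `LazyBuffer` and its members are hypothetical, and committing the store on the first upload is exactly the assumption being illustrated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a buffer whose backing store is only committed on first upload.
// Purely illustrative: it does not describe how any specific driver works.
class LazyBuffer
{
public:
    // Analogous to create(): remember the requested size, commit nothing yet.
    void create(std::size_t bytes)
    {
        m_reserved = bytes;
        m_storage.clear();
    }

    // Analogous to update(): the upload is what makes the storage appear.
    void update(const unsigned char* data, std::size_t bytes)
    {
        assert(bytes <= m_reserved);
        m_storage.assign(data, data + bytes);
    }

    std::size_t reservedBytes() const { return m_reserved; }
    std::size_t committedBytes() const { return m_storage.size(); }

private:
    std::size_t m_reserved = 0;
    std::vector<unsigned char> m_storage;
};
```

In this model a monitoring tool that only counts committed bytes would report nothing after create() and the full size after update(), which is the pattern seen in Task Manager.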

Regarding the memory consumption, it could be at the OS level, or it could be graphics driver related (leaky drivers have been a regular concern of SFML users who spot increased consumption or notice reports from the CRT that a memory leak occurred). I took a look using the heap profiling tools available within VS2022 and noticed nothing too out of the ordinary. Below is a screenshot of the heap allocations graphed:

I took various snapshots through the code:

1. At the start, prior to creating anything
2. After creating the sf::VertexBuffer
3. After creating the std::vector
4. At the end, after both local scopes have been exited

The heap profiler reports the following contents remaining at stage 4:

When analysing the char[] contents we see:

These are similar to what the CRT memory leak detection reports, which looks like a leaky driver to me.

The only SFML-related thing I can definitely notice is that the results for std::basic_string appear to be tied back to https://github.com/SFML/SFML/blob/2.6.x/src/SFML/Window/GlContext.cpp#L265

The container proxy results also tie back to the same places as std::basic_string. Likely because of the use of an STL container within that code. ~~I'm not confirming that this extensionString case is a leak, but it may be the aspect worth investigating for any interested parties.~~

Edit: Having looked into the extensionString logic, I found that the string is pushed back into a static vector, which won't be destroyed until exit. This explains the std::basic_string and container_proxy results, as they will only be released at the point of exit. I think what you've reported is likely expected behaviour.
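
The effect described in the edit can be reproduced in isolation. A minimal sketch, assuming a function-local static container (the names `registeredNames` and `registerName` are hypothetical and unrelated to SFML's actual code): whatever is pushed into it stays allocated after the calling scope ends, so a heap snapshot taken before program exit still lists it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Function-local static: constructed on first use, destroyed at program exit.
// Hypothetical helper, not taken from SFML.
std::vector<std::string>& registeredNames()
{
    static std::vector<std::string> names;
    return names;
}

// Stores a copy of `name` in the static container.
void registerName(const std::string& name)
{
    assert(!name.empty());
    registeredNames().push_back(name);
}
```

Strings stored this way are not leaked in the strict sense, since they are released during static destruction, but they are still alive at any snapshot taken while main() is running.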

Answer 2 · 2023-12-29T17:22:42.000Z

You need to understand that SFML handles a lot of hidden state when dealing with objects in the graphics module. The first such object instance will create a shared context and a bunch of other things, so it's expected to see some more RAM usage when creating the first graphics object.
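
One way to picture this hidden state is a reference-counted block shared by every graphics object. The sketch below is a toy model, not SFML's implementation: `Resource`, `SharedState` and the 1 KiB buffer are hypothetical, and releasing the state together with the last object is an assumption of the model.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model of state shared by all graphics resources. The 1 KiB scratch
// buffer merely stands in for a context and its bookkeeping.
struct SharedState
{
    std::vector<unsigned char> scratch = std::vector<unsigned char>(1024);
};

// Hypothetical resource type: the first live instance creates the shared
// state, the last one to be destroyed releases it again.
class Resource
{
public:
    Resource()
    {
        if (s_count++ == 0)
            s_shared.reset(new SharedState);
    }

    ~Resource()
    {
        assert(s_count > 0);
        if (--s_count == 0)
            s_shared.reset();
    }

    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;

    static bool sharedStateAlive() { return s_shared != nullptr; }
    static std::size_t liveCount() { return s_count; }

private:
    static std::size_t s_count;
    static std::unique_ptr<SharedState> s_shared;
};

std::size_t Resource::s_count = 0;
std::unique_ptr<SharedState> Resource::s_shared;
```

In this model only the first construction pays for the shared state, which is why an otherwise empty object can appear to cost several megabytes.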

Additionally, OpenGL is a queued/"async" protocol: just because SFML has made a specific call doesn't guarantee that it's executed at that instant; it can also be executed sometime later. It's hard to say whether this effect plays any role in your measurements.

b) sf::VertexBuffer::create() does not seem to immediately allocate VRAM.

As Bambo already noticed, I guess the GPU is still informed of the upcoming usage, but the data isn't actually initialized yet. Maybe that's more of a performance optimization, and maybe the documentation could be aligned with that slightly.

How did you measure VRAM and RAM usage?

Is updating a sf::VertexBuffer using a sf::VertexArray indeed not supported?

No, it's not supported, as they serve different purposes. If you just want an "array" of vertices, you should use a std::vector<sf::Vertex>.
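
A rough sketch of that suggestion for the case where the vertices already live in a sf::VertexArray: copy them into a contiguous std::vector through the public getVertexCount() and operator[] accessors, then hand the vector's data() pointer to update(). `copyToVector`, `DemoVertex` and `DemoVertexArray` are hypothetical names; the stand-in types only exist so the snippet compiles without SFML.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies every element of an array-like object into a contiguous std::vector.
// Works with any type offering getVertexCount() and a const operator[].
template <typename VertexT, typename ArrayLike>
std::vector<VertexT> copyToVector(const ArrayLike& source)
{
    const std::size_t count = source.getVertexCount();
    std::vector<VertexT> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(source[i]);
    return out;
}

// Hypothetical stand-ins so the sketch compiles without SFML.
struct DemoVertex { float x; float y; };

class DemoVertexArray
{
public:
    void append(const DemoVertex& vertex) { m_vertices.push_back(vertex); }
    std::size_t getVertexCount() const { return m_vertices.size(); }

    const DemoVertex& operator[](std::size_t index) const
    {
        assert(index < m_vertices.size());
        return m_vertices[index];
    }

private:
    std::vector<DemoVertex> m_vertices;
};
```

With SFML the call would presumably look like `std::vector<sf::Vertex> tmp = copyToVector<sf::Vertex>(va); vb.create(tmp.size()); vb.update(tmp.data());`, at the cost of one temporary copy in RAM. Keeping the vertices in a std::vector<sf::Vertex> from the start avoids that copy.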

c) sf::VertexBuffer::update() allocates expected amount of VRAM but also RAM and the RAM allocation is significantly larger than data being copied.

We're just passing the pointer to OpenGL; whether the driver creates a copy or not isn't really in our power.

Answer 3 · 2023-12-30T09:53:08.000Z

Hi,
thanks a lot for looking into this.

@Bambo-Borris
I can't seem to get such detailed info from Visual Studio's memory profiler, but it looks like the heap size actually does shrink to expected levels at "stage 4". The Heap Size column shows 17 MB but the graph still shows 1 GB. Those 17 MB are very roughly what sf::VertexBuffer allocated initially, which, as eXpl0it3r explained, is part of the hidden state.

@eXpl0it3r
Thanks for clarifying the point about SFML managing hidden state when encountering something from the graphics module for the first time.

How did you measure VRAM and RAM usage?

Mainly via Task Manager; the large test size made it easy to spot if/when memory is allocated / data is moved to the GPU.
I also tried Visual Studio's Memory and GPU profiler but I'm not very familiar with the Memory Profiler and the GPU profiler doesn't seem to support this use case.

We're just passing the pointer to OpenGL, whether the driver creates a copy or not, isn't really in our power.

That matches what I've found: you may not be able to force OpenGL not to keep a RAM copy as well.
If the posted code doesn't show any suspicious memory allocations SFML can control, I'd say this can be closed as "not an issue".

PS.:
Also, thank you guys for clarifying whether updating a sf::VertexBuffer with a sf::VertexArray is supposed to work.
I guess my idea that it should work comes from my limited experience with Qt, where it's generally a good idea to use their containers (QString, QVector, ...) if possible.

Best regards & happy new year
Marco