Threading issue with embree 2.16
Closed this issue · 19 comments
When running with the embree 2.16 a KeyNotFoundException occurs at the following line:
https://github.com/TomCrypto/Embree.NET/blob/master/Library/Embree.cs#L653
If the sample is run with a serial foreach loop the exception does not occur but the rendered image is black:
https://github.com/TomCrypto/Embree.NET/blob/master/Sample/Program.cs#L203
Using 4 RayPacket1 calls instead works as expected
Latest working version appears to be 2.3.0
https://github.com/embree/embree/releases/tag/v2.3.0
Hi there, thanks for the bug report. I can reproduce with the released Embree 2.16.4 DLL. I took some time to investigate and didn't find anything very relevant, even the serial version seems to be crashing at random for me with an AccessViolationException.
Curiously enough, the RayPacket8 version actually does work for me (even with the threaded foreach loop). Can you confirm? If so it seems to me like it could possibly be a bug in Embree: although I am quite open to the possibility of the .NET wrapper being buggy (some of this C# unsafe stuff is pretty cavalier) the 4-packet and 8-packet code is virtually identical.
I did see some weird stuff when initializing Embree in verbose mode and setting RTC.CheckLastError after each intersect call; for some reason, some rtcIntersect8 calls appear to be made even though nothing asked for the 8-ray packet traversal flags, but I am not sure if this is actually an issue or just some implementation detail.
To be clear, the threading exception is quite possibly a subtle bug in the .NET wrapper, that needs to be tracked down. However the black render with non-threaded code is much more worrying and is what I think we should focus on first.
@TomCrypto thanks so much for looking into this! I tried changing to RayPacket8 and I can confirm it worked. To be explicit, I just duplicated the 4 rays already in the var ray
array to make it length 8, added TraversalFlags.Packet8 to the scene constructor, and used scene.Intersect8. Also I changed the for loop to loop through hits.Length. Everything seems to be as expected.
I also tried with TraversalFlags.Packet16 but received an "Inconsistent traversal flags" exception. I updated the call to new Geometry() in the Model constructor to use the exact same arguments as the scene. Not sure why I didn't have to add TraversalFlags.Packet8 in my first test. This got around the traversal flags issue but now something appears to crash in native code and gives a "Sample.exe has stopped working" popup.
I'm mostly interested in just doing single packet raycasts so this isn't the biggest issue for me. However, I am thinking about putting together a pull request to bundle this project as a nuget package and publish it on nuget.org to make it easy for other projects to include. I can start by just using the older embree dll when I build the package. That said, I'm sure others might benefit from some of the performance and multi-packet functions if we can get the newer dll working.
For reference, I'm testing with Windows 10 Pro 64-bit, Intel Core i7-4770K CPU @3.5GHz 3.50
To be honest I've never tested the 16-ray packet codepath since you need those fancy Xeon processors with the AVX-512 instruction set... if I have time this weekend I'll try and get to the bottom of the (non-threaded) black render issue. Will keep you updated of my findings.
I've created a branch where I updated DLLImport method calls to avoid calling deprecated methods. I also updated to version 2.15 because that release is recent and has pre-built dlls for both 32 and 64 bit. This appears to have fixed the 4 packet intersect method but I still get occasional crashes when doing 8 packet in multi-threaded mode.
https://github.com/amenzies/Embree.NET/tree/feature/updateAPI
I've also done some work an another branch build the library as a nuget package. As part of this, I updated the projects to build as any cpu and select at runtime between 32 and 64 bit. By default the sample projects will run as 32-bit but just uncheck "prefer 32 bit" in the project preferences to run them as 64 bit processes.
https://github.com/amenzies/Embree.NET/tree/feature/any_cpu_nuget
I'm creating a pull request for the any_cpu_nuget branch since it has both sets of changes in it. Would be great to figure out why we still crash on 8 packet intersects. Perhaps its a threading issue? It might happen with 1 or 4 packet casts but just so infrequently I'm not seeing it.
To make the cases easier to run, I updated sample with some arguments to toggle different types of casts. You can now run with any or all of these optional arguments
sample.exe verbose 1 4 8.
Cheers,
-Alex
Almost forgot, I also updated the .net version to 4.6.2.
Hey Alex
Apologies for the slow reply... the pull request looks good! Regarding the *32 and *64 versions of some functions, this seems to be because the Embree API wants size_t
s, I think passing UIntPtrs instead of having two variants should work, at least I recall having done this elsewhere.
There's a small mistake in the 8-ray packet rendering code, four rays are duplicated across the eight rays, just need to calculate eight points around the pixel and use those for anti-aliasing.
As for the crash, I think I know what it is, oh god... it's the activity ("valid") mask passed to the intersection functions. I over-eagerly align it to 64 bytes because it's so small, which is fine, but... I'm only allocating 4 bytes for it instead of (4 bytes) * (packet size) :(
https://github.com/TomCrypto/Embree.NET/blob/master/Library/Native.cs#L490
And the reason we were never seeing the crash with less than 8-ray packets is because AllocHGlobal is probably always giving at least a 16-byte alignment, so the next 64-byte boundary is never farther than 48 bytes away, and we asked for 4 + 64 - 1 = 67 bytes, so we were guaranteed at least 19 bytes which is enough for 1-ray and 4-ray packets, but not for 8-ray packets. And the reason we were not seeing this crash in non-threaded mode is because we are statistically far less likely to observe it because only one aligned ray structure is being allocated instead of 1 per thread (and maybe the main thread heap just has different properties than the TPL threads' so we wouldn't see it anyway, who knows). Ah, the joys of heap corruption...
Sorry about that. I'll fix this bug immediately!
Thanks
Can you integrate 6b6d430 into your branch and check if you're getting any more crashes? I've been repeatedly rendering and it seems good on my end, so I think that's the end of that.
Oh wow, great catch! I have updated my branch and everything seems stable! Awesome! I will try using UIntPtr instead of having two variants of the method calls. Much cleaner.
As for the rays, I totally cheated and copied the rays so I would have 8 to test with. I can pick a better pattern so we aren't just duplicating them. Any preference on pattern?
I'll let you know once the changes are in in-case you want to pull the changeset into master.
Any pattern will be fine, for four rays I just did four points centered on the pixel in a rectangle, but actually any combination will work as long as it's reasonably uniformly distributed and not too far from the pixel. From https://en.wikipedia.org/wiki/Supersampling#Supersampling_patterns the 8 innermost dots in the rotated grid pattern might be interesting to try out.
Please do let me know, I'd love to merge your changes into master.
UIntPtr update works:
amenzies@907f43a
Bit on the fence about this one. I like not having multiple variants of the method but it does prohibit the use of default parameter values and requires the caller to wrap arguments in new UIntPtr()
calls. Not a big deal considering we only call these methods in 2 or 3 places. Another option would be to make a C# method of the same name that takes ints and has a default parameter and then does the casting to UIntPtr and calls the native method.
For example:
public static uint NewTriangleMesh(IntPtr scene, MeshFlags flags, int numTriangles, int numVertices, int numTimeSteps = 1)
{
return NewTriangleMesh(scene, flags, new UIntPtr((uint)numTriangles), new UIntPtr((uint)numVertices), new UIntPtr((uint)numTimeSteps);
}
Thanks for the link on Supersampling strategies. Very interesting! Just committed rotated 8 grid sampling
amenzies@b34b399
Okay I think this pull request is ready to merge into master if you approve:
#6
I also created a branch for the alternate way of calling UIntPtr methods (no strong feelings either way on this one):
amenzies/Embree.NET@feature/any_cpu_nuget...amenzies:feature/alternateUIntPtrOption
👍
To be honest, size_t
as a concept (or even just unsigned actually) is essentially absent from .NET, all arrays are indexed with signed ints, and so on, so I would prefer the method you posted above. In my experience methods that want you to pass in UIntPtrs because they wish to mimic size_t-like semantics are highly unidiomatic and pretty much force the caller to cast (as you said) because it's infeasible for them to use UIntPtrs everywhere else in their codebase. Even simple uints
sometimes need to be typecast constantly because the CLR and virtually every library out there uses exclusively ints for interop reasons.
Great. In that case, I just merged the alternateUIntPtrOption
branch into any_cpu_nuget
. I think that the pull request is now ready to go. If you agree please merge it into master:
#6
Once this is in master I would love to get the nuget package posted to nuget.org. I'm happy to build it and upload it if you like, but suspect you might like to do it since you are the project owner / author. The build steps are documented in the readme but really its just running build.bat and uploading it to nuget.org. Please let me know if you would like me to build and upload the package.
I've gone ahead and uploaded the package to nuget. Happy to swap ownership of it if you decide to upload it later but at least wanted to get a version up there:
https://www.nuget.org/packages/Embree.NET/
Merged #6
Thanks for the nuget upload, I don't really mind who owns the nuget package btw
Closing this issue.