CUDA/GL Rasterizer

Question

aKn1ghtOut opened this issue a year ago · 7 comments

I'm looking into working on a GL renderer for the parquets.
I see that the recently released Inria repository implements the rasterizer entirely in CUDA.
Do you have any plans to work on different renderers/rasterizers?
Also, about making the inference completely Taichi field-based, how much work do you believe is left there? Any starting points to work on this?
@wanmeihuali

Answer 1 · 2023-07-13T16:14:38.000Z

Hi @aKn1ghtOut. I did notice the official implementation. One interesting thing is that their code runs faster than mine with many more points (although the PSNR is similar).
For my implementation, there are multiple tasks going on:

1. Figuring out why my implementation is slower than the official one.
2. Developing a completely Taichi field-based inference.
3. Supporting camera pose refinement (see PR 83).
4. Supporting rigid-body dynamic scenes (the current implementation already considers this case, but may need camera pose refinement).

This week I'm still focusing on the first task...

For the second task (completely Taichi field-based inference), most of the code can be reused, with some modification. The steps are:

1. Copy all the forward kernels into a separate file.
2. Replace all the PyTorch tensors with pre-allocated Taichi fields. Some minimal syntax changes are needed here.
3. Replace PyTorch cumsum and sort with Taichi implementations. The good news is that, according to my nsys profiling results, neither function is a bottleneck for inference (they take less than 1 ms), so the Taichi implementations do not need to be very good. Also, I've got a Taichi cumsum implementation from Taichi-nerf's author Linyou. The radix sort can also be implemented by combining cumsum with some other kernel.
4. Export the kernels via AOT, then profile and tune per platform.
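To illustrate the remark that a radix sort can be built from cumsum plus one more kernel, here is a minimal sketch in plain Python (not Taichi, and not the project's code). It sorts one bit per pass: a cumsum over the "bit is 0" flags gives every element its destination, and a scatter step moves it there. The names `exclusive_cumsum` and `radix_sort_by_key` and the `key_bits` parameter are made up for this example; in a Taichi port, the flag computation and the scatter loop would presumably each become a parallel kernel.

```python
from itertools import accumulate


def exclusive_cumsum(values):
    """Exclusive prefix sum: out[i] = sum(values[:i])."""
    return [0] + list(accumulate(values))[:-1] if values else []


def radix_sort_by_key(keys, payload, key_bits=32):
    """Stable LSD radix sort of non-negative integer keys, one bit per pass.

    payload is reordered together with keys (e.g. point indices).
    """
    keys, payload = list(keys), list(payload)
    n = len(keys)
    for bit in range(key_bits):
        # 1 where the current bit is 0, else 0 (an elementwise step).
        zero_flag = [1 - ((k >> bit) & 1) for k in keys]
        # Destination of every "zero" element: the cumsum step.
        zero_pos = exclusive_cumsum(zero_flag)
        num_zeros = sum(zero_flag)

        out_keys, out_payload = [0] * n, [None] * n
        # Scatter step: the "other kernel" next to cumsum.
        for i in range(n):
            if zero_flag[i]:
                dst = zero_pos[i]
            else:
                # Ones go after all zeros; (i - zero_pos[i]) is the
                # number of ones that appear before element i.
                dst = num_zeros + (i - zero_pos[i])
            out_keys[dst], out_payload[dst] = keys[i], payload[i]
        keys, payload = out_keys, out_payload
    return keys, payload


sorted_keys, sorted_payload = radix_sort_by_key(
    [5, 1, 4, 1, 0], ["a", "b", "c", "d", "e"], key_bits=3
)
```

Because every pass is stable, elements with equal keys keep their original relative order. A one-bit-per-pass split is the simplest variant; processing several bits per pass with a small histogram reduces the number of passes at the cost of a slightly more involved kernel.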

Considering I'm still stuck on task one right now, it would be great if you could look into task two. I can provide you with the Taichi cumsum implementation if you need it. Also, please check out taichi-ngp-renderer, Taichi's official NGP field-based inference, if you want to work on it.

Thanks!

Answer 2 · 2023-07-13T16:19:51.000Z

Sure. Great project as well. I will look into task 2

Answer 3 · 2023-07-13T16:21:34.000Z

The cumsum implementation would be very helpful. Looking deeper into Taichi for this!

Answer 4 · 2023-07-13T16:30:10.000Z

@aKn1ghtOut Please check out this scratch for the cumsum implementation.
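The linked scratch is not reproduced here. As a rough illustration only, a GPU-friendly cumsum is commonly structured in three phases: scan each block independently, scan the per-block totals, then add each block's offset back. The plain-Python sketch below shows that structure; the function name `blocked_inclusive_cumsum` and the `block_size` parameter are assumptions for this example, and it may differ from the implementation referenced above.

```python
def blocked_inclusive_cumsum(values, block_size=4):
    """Inclusive prefix sum computed block by block (three-phase scan)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n = len(values)
    out = list(values)
    num_blocks = (n + block_size - 1) // block_size

    # Phase 1: scan inside each block. Blocks do not depend on each
    # other, so this loop over b could run in parallel.
    block_totals = [0] * num_blocks
    for b in range(num_blocks):
        start, end = b * block_size, min((b + 1) * block_size, n)
        running = 0
        for i in range(start, end):
            running += out[i]
            out[i] = running
        block_totals[b] = running

    # Phase 2: exclusive scan over the (much shorter) block totals.
    block_offsets = [0] * num_blocks
    running = 0
    for b in range(num_blocks):
        block_offsets[b] = running
        running += block_totals[b]

    # Phase 3: add each block's offset to its elements (parallel again).
    for b in range(num_blocks):
        start, end = b * block_size, min((b + 1) * block_size, n)
        for i in range(start, end):
            out[i] += block_offsets[b]
    return out


result = blocked_inclusive_cumsum([1, 2, 3, 4, 5, 6, 7], block_size=3)
```

Only phase 2 is sequential, and it runs over one value per block, which is why this layout tends to map well onto parallel kernels.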

Answer 5 · 2023-07-16T13:15:09.000Z

So, just wanted to provide an update:
I have been reading up on Taichi.
My initial goal was to implement the whole rasterizer as a single Taichi kernel and use it via AOT.
Functionally, any supported platform would then just pass in the parameters, the same way the _custom_fwd function is called, and get the RGB image back.
I have found the issues below with this approach (please correct me if I am wrong):

1. Working with dynamic data types is a pain in Taichi. For example, just feeding in the pointcloud and pointcloud_features wouldn't work without also providing the shape of the ndarray. Thus, any platform using the module would need a preprocessor in addition to the checkpoints/parquets.
2. Currently, Vulkan appears to be the only AOT backend that Taichi marks as stable. I'm reading up more on this right now.
3. I believe the inference code has also changed a lot since I cloned the project and started working on it. Honestly, great work on keeping up the velocity!

I feel like my skill set might not be the best suited to tackle the Taichi aspects right now, although looking into it was pretty fun.
Do let me know if there is other stuff I could help with.

Answer 6 · 2023-07-16T19:35:05.000Z

1. Dynamic data types/shapes are not the key issue. Taichi provides a C++ API, so on any platform we can use ti::Ndarray to represent the data. A preprocessor is needed, but it is shared among all platforms.
2. Taichi is still in fast-paced development, and the Vulkan backend is kind of "universal", as it can run on Android, iOS, macOS, and AMD GPUs. I believe the Metal and CUDA backends should also work. I guess Taichi is just a startup, so they don't have enough manpower to fully test them.
3. As I said, I'm still figuring out why my implementation is slower than the official one, so the inference code is still changing... Right now the running time of the rasterization kernel should be reduced from 6-7 ms to 1-2 ms.
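To make the "shared preprocessor" idea in the first point more concrete, here is a small sketch of one possible approach: store each array's shape next to its data, so every platform can allocate a correctly sized ndarray before loading. Everything in it is hypothetical (the `PCL0` tag, the header layout, and the `pack_array`/`unpack_array` names) and it is not the project's actual format.

```python
import struct

MAGIC = b"PCL0"          # hypothetical 4-byte tag for this sketch
HEADER_FMT = "<4sII"     # magic, num_rows, num_cols (little-endian)


def pack_array(rows):
    """Pack a rectangular list of float rows as header + float32 data."""
    num_rows = len(rows)
    num_cols = len(rows[0]) if rows else 0
    if any(len(row) != num_cols for row in rows):
        raise ValueError("all rows must have the same length")
    flat = [x for row in rows for x in row]
    header = struct.pack(HEADER_FMT, MAGIC, num_rows, num_cols)
    return header + struct.pack(f"<{len(flat)}f", *flat)


def unpack_array(blob):
    """Read the shape from the header, then the row-major float32 data."""
    magic, num_rows, num_cols = struct.unpack_from(HEADER_FMT, blob, 0)
    if magic != MAGIC:
        raise ValueError("unexpected file tag")
    offset = struct.calcsize(HEADER_FMT)
    flat = struct.unpack_from(f"<{num_rows * num_cols}f", blob, offset)
    return [list(flat[r * num_cols:(r + 1) * num_cols]) for r in range(num_rows)]


# Two points with three coordinates each (values exact in float32).
points = [[0.0, 1.0, 2.0], [3.5, 4.25, -1.0]]
blob = pack_array(points)
restored = unpack_array(blob)
```

A runtime on any platform would read the two shape integers first, allocate its buffers accordingly, and then copy the float data in, which keeps the platform-specific code small.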

If you are still interested in the project and want to contribute, I've listed the four tasks that are going on right now (#97 (comment)). Feel free to look into any of them. Also, there are some open issues for this project, e.g. bug reports for the tool scripts.

Anyway, thanks for your time!

Answer 7 · 2024-08-20T07:44:36.000Z

@wanmeihuali Hi, I am also trying to port 3DGS to the Android platform using Taichi AOT, but I have run into many problems, probably because I am not familiar with the Taichi library.

Could you share the details of your Taichi cumsum implementation? Thanks for sharing!