change language feature encoder to dim=4 - CUDA error: an illegal memory access was encountered
wrencanfly opened this issue · 7 comments
I am trying to explore language feature dimensions beyond the one used in the paper (in practice, the paper uses dim = 3).
I am now trying dim = 4.
My steps:
- Modify LangSplat/submodules/diff-gaussian-rasterization/cuda_rasterizer/config.h, changing NUM_CHANNELS_language_feature to 4, then rebuild and reinstall.
- Retrain the autoencoder, changing the last layer's dim from 3 to 4.
- Generate the language_features_dim4 folder.
- Train LangSplat:
python train.py -s $dataset_path -m output/${casename} --start_checkpoint $dataset_path/$casename/chkpnt30000.pth --feature_level ${level}
Here is the error I got:
Traceback (most recent call last):
File "train.py", line 240, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "train.py", line 99, in training
gt_language_feature, language_feature_mask = viewpoint_cam.get_language_feature(language_feature_dir=dataset.lf_path, feature_level=dataset.feature_level)
File "/datadrive/yingwei/LangSplat/scene/cameras.py", line 94, in get_language_feature
return point_feature.cuda(), mask.cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
After I set CUDA_LAUNCH_BLOCKING=1, the traceback changed to:
Traceback (most recent call last):
File "train.py", line 240, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "train.py", line 93, in training
render_pkg = render(viewpoint_cam, gaussians, pipe, background, opt)
File "/datadrive/yingwei/LangSplat/gaussian_renderer/__init__.py", line 113, in render
"visibility_filter" : radii > 0,
RuntimeError: CUDA error: an illegal memory access was encountered
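The two tracebacks point at different lines because CUDA kernels are launched asynchronously, so without CUDA_LAUNCH_BLOCKING=1 the error can be reported at a later, unrelated call. The sketch below shows one way to launch the same training command with the variable set from Python. The helper names `blocking_env` and `train_command` and the example argument values are hypothetical and only serve as an illustration; they are not part of LangSplat.

```python
import os
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Must be set before the training process initializes CUDA.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def train_command(dataset_path, casename, level):
    """Build the train.py invocation from the post as an argument list."""
    return [
        sys.executable, "train.py",
        "-s", dataset_path,
        "-m", f"output/{casename}",
        "--start_checkpoint", f"{dataset_path}/{casename}/chkpnt30000.pth",
        "--feature_level", str(level),
    ]


# Usage (not executed here; the paths are placeholders):
#   import subprocess
#   subprocess.run(train_command("data/scene", "scene", 1),
#                  env=blocking_env(), check=True)
```

Setting the variable in the child process environment avoids the pitfall of setting it too late, after CUDA has already been initialized in the current process.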
I tried the suggestion in graphdeco-inria/gaussian-splatting#41 (comment), but it didn't work.
I can basically conclude that the issue is not in my CUDA or PyTorch installation, since everything runs smoothly when I set the language feature dim to 3.
Are there any other changes I need to make besides NUM_CHANNELS_language_feature?
The issue was solved by creating a new environment on WSL2 with CUDA 11.8. I am going to explore further to try to reproduce the original problem.
I have now hit a new issue when changing the language feature dimension to 4:
RuntimeError: Function _RasterizeGaussiansBackward returned an invalid gradient at index 4 - got [2228907, 4] but expected shape compatible with [2228907, 3]
Any help would be appreciated!
Could you please provide more details about rebuilding and reinstalling? I'm also trying a higher language feature dimension.
@Zhirui86
Hi, I finally figured out where the issue is.
If you hit an error like: RuntimeError: Function _RasterizeGaussiansBackward returned an invalid gradient at index 4 - got [2228907, 4] but expected shape compatible with [2228907, 3]
you need to edit the following files:
- LangSplat/submodules/langsplat-rasterization/cuda_rasterizer/config.h
  Change NUM_CHANNELS_language_feature to the desired dim. Delete the locally installed package and rebuild it. Remember to remove the LangSplat/submodules/langsplat-rasterization/build folder if you have one; otherwise the build will reuse the cache.
- LangSplat/scene/gaussian_model.py
  https://github.com/minghanqin/LangSplat/blob/main/scene/gaussian_model.py#L206C1-L206C87
  Change the line:
  language_feature = torch.zeros((self._xyz.shape[0], 3), device="cuda")
  The language_feature tensor must be initialized with the desired dim here. In my case it should be:
  language_feature = torch.zeros((self._xyz.shape[0], 4), device="cuda")
Cheers!
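The config.h step above can be scripted so the define and the cached build folder are never out of sync. This is only a minimal sketch: it assumes config.h contains a single line of the form `#define NUM_CHANNELS_language_feature 3`, and the helper names (`read_num_channels`, `set_num_channels`, `prepare_rebuild`) are hypothetical, not part of the LangSplat repository.

```python
import re
import shutil
from pathlib import Path

# Assumption: config.h defines the channel count on one line such as
#   #define NUM_CHANNELS_language_feature 3
DEFINE_RE = re.compile(
    r"^([ \t]*#define[ \t]+NUM_CHANNELS_language_feature[ \t]+)(\d+)",
    re.MULTILINE,
)


def read_num_channels(config_text: str) -> int:
    """Return the channel count currently defined in the config.h text."""
    match = DEFINE_RE.search(config_text)
    if match is None:
        raise ValueError("NUM_CHANNELS_language_feature not found")
    return int(match.group(2))


def set_num_channels(config_text: str, dim: int) -> str:
    """Return the config.h text with the channel count replaced by `dim`."""
    new_text, count = DEFINE_RE.subn(
        lambda m: f"{m.group(1)}{dim}", config_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one define, found {count}")
    return new_text


def prepare_rebuild(rasterizer_dir: Path, dim: int) -> None:
    """Patch config.h and delete the cached build folder (hypothetical helper)."""
    config_path = rasterizer_dir / "cuda_rasterizer" / "config.h"
    config_path.write_text(set_num_channels(config_path.read_text(), dim))
    # Remove stale build artifacts so the next install recompiles the kernels.
    shutil.rmtree(rasterizer_dir / "build", ignore_errors=True)


# Usage (not executed here):
#   prepare_rebuild(Path("LangSplat/submodules/langsplat-rasterization"), 4)
```

After running it, reinstall the rasterizer package with whatever command you used for the original install, and make sure the dim in scene/gaussian_model.py matches the value you wrote to config.h.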
Hello, did you solve the error below just by creating a new environment?
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
@wrencanfly could you provide your environment details?