minghanqin/LangSplat

change language feature encoder to dim=4 - CUDA error: an illegal memory access was encountered

wrencanfly opened this issue · 7 comments

I wanted to explore higher feature dimensions than the one listed in the paper (in practice, the paper uses a language feature dimension of 3).
I am now trying dim = 4.

my steps:

  1. Modify LangSplat/submodules/diff-gaussian-rasterization/cuda_rasterizer/config.h, changing NUM_CHANNELS_language_feature to 4, then rebuild and reinstall the extension.
  2. Re-train the autoencoder, changing the last layer's dim from 3 to 4 (see the sketch after this list).
  3. Generate the language_features_dim4 folder.
  4. Train LangSplat: python train.py -s $dataset_path -m output/${casename} --start_checkpoint $dataset_path/$casename/chkpnt30000.pth --feature_level ${level}
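
For step 2, here is roughly the kind of change involved (a minimal sketch, not the repo's exact autoencoder code; the hidden width of 256 is illustrative): the encoder's last layer and the decoder's first layer both have to use the new dimension.

    import torch.nn as nn

    FEATURE_DIM = 4  # was 3

    # Sketch of the bottleneck only; the real autoencoder has more layers, but the
    # encoder's output layer and the decoder's input layer must both use FEATURE_DIM.
    encoder_tail = nn.Linear(256, FEATURE_DIM)  # compress the CLIP features down to the low-dim language feature
    decoder_head = nn.Linear(FEATURE_DIM, 256)  # expand back up for reconstruction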

and here is the error I got:

Traceback (most recent call last):
  File "train.py", line 240, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 99, in training
    gt_language_feature, language_feature_mask = viewpoint_cam.get_language_feature(language_feature_dir=dataset.lf_path, feature_level=dataset.feature_level)
  File "/datadrive/yingwei/LangSplat/scene/cameras.py", line 94, in get_language_feature
    return point_feature.cuda(), mask.cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
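
For reference, the variable has to be set before CUDA is initialized, either on the command line or at the very top of train.py, e.g.:

    import os

    # Must be set before the first CUDA call so kernels launch synchronously and
    # the Python traceback points at the call that actually faulted.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch  # import torch only after the variable is set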

After I added CUDA_LAUNCH_BLOCKING=1:

Traceback (most recent call last):
  File "train.py", line 240, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 93, in training
    render_pkg = render(viewpoint_cam, gaussians, pipe, background, opt)
  File "/datadrive/yingwei/LangSplat/gaussian_renderer/__init__.py", line 113, in render
    "visibility_filter" : radii > 0,
RuntimeError: CUDA error: an illegal memory access was encountered

I tried the fix suggested in graphdeco-inria/gaussian-splatting#41 (comment), but it didn't work.

I can basically conclude that the issue is not with my CUDA or PyTorch setup, since everything runs smoothly when I set the language feature dim to 3.
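
As a quick sanity check (a sketch that assumes the preprocessed features are saved as .npy files under the language_features_dim4 folder; the exact file pattern may differ), the dimension of the saved features should match NUM_CHANNELS_language_feature:

    import glob
    import numpy as np

    # Inspect a few of the generated feature files; the feature dimension printed
    # here must agree with NUM_CHANNELS_language_feature in config.h.
    for path in sorted(glob.glob("language_features_dim4/*_f.npy"))[:5]:
        feats = np.load(path)
        print(path, feats.shape)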

Are there any other changes I need to make besides NUM_CHANNELS_language_feature?

The issue was solved by creating a new environment on WSL2 with CUDA 11.8. I am going to explore further to reproduce this issue.

Currently I'm hitting a new issue when changing the language feature dimension to 4:

RuntimeError: Function _RasterizeGaussiansBackward returned an invalid gradient at index 4 - got [2228907, 4] but expected shape compatible with [2228907, 3]


Any help would be appreciated!

Could you please provide more details on rebuilding and reinstalling? I'm also trying to move to a higher language feature dimension.


@Zhirui86
Hi, finally I figured out where the issue is:

If you hit an error like: RuntimeError: Function _RasterizeGaussiansBackward returned an invalid gradient at index 4 - got [2228907, 4] but expected shape compatible with [2228907, 3]

then we need to edit the following files:

  1. LangSplat/submodules/langsplat-rasterization/cuda_rasterizer/config.h
    Change NUM_CHANNELS_language_feature to the desired dim, then delete the local package and rebuild it. Remember to remove the LangSplat/submodules/langsplat-rasterization/build folder if you have one; otherwise the cached build will be used.

  2. LangSplat/scene/gaussian_model.py
    https://github.com/minghanqin/LangSplat/blob/main/scene/gaussian_model.py#L206C1-L206C87
    change the line:

                language_feature = torch.zeros((self._xyz.shape[0], 3), device="cuda")

We need to initialize language_feature with the desired dim here; for me it becomes:

                language_feature = torch.zeros((self._xyz.shape[0], 4), device="cuda")
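
A small sketch of why this matters (the constant name below is mine, not the repo's): the Python-side dimension here has to stay in sync with NUM_CHANNELS_language_feature in config.h, otherwise the rasterizer's backward pass returns an [N, 4] gradient for a tensor that was created as [N, 3], which is exactly the error above.

    import torch

    # Hypothetical constant; it must match NUM_CHANNELS_language_feature in config.h.
    LANGUAGE_FEATURE_DIM = 4

    num_points = 2_228_907  # point count taken from the error message, just for illustration
    language_feature = torch.zeros((num_points, LANGUAGE_FEATURE_DIM), device="cuda")
    print(language_feature.shape)  # torch.Size([2228907, 4])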

Cheers!

Have you ever tried a 512-dimensional feature? I can build successfully at lower dimensions, but I get the following error when setting the dimension to 512:

[screenshot of the error]


Hi! I have a similar issue. Have you resolved this problem?

Hello, did you solve the error in parentheses just by creating a new environment? (RuntimeError: CUDA error: an illegal memory access was encountered. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.)

@wrencanfly could you provide your environment details?