nianticlabs/acezero

Errors running ace_zero.py with WSL2

Closed this issue · 12 comments

Hi, I haven't been able to get this working in WSL2 with Ubuntu 22.04.

I get the following errors when I run this command with any image folder:

python ace_zero.py "/path/to/some/images/*.jpg" result_folder

I'm following the installation instructions, and everything seems to have installed correctly, so I'm not sure if this is a WSL2 issue or something else. Thanks!

Error Log:

INFO:__main__:Starting reconstruction of files matching data/impcanart/*.jpg.
INFO:__main__:Downloading ZoeDepth model from the main process.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
img_size [384, 512]
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Params passed to Resize transform:
        width:  512
        height:  384
        resize_target:  True
        keep_aspect_ratio:  True
        ensure_multiple_of:  32
        resize_method:  minimal
Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_NK.pt
Loaded successfully
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:__main__:Depth estimation model ready to use.
INFO:__main__:Trying seeds: [0.36983939 0.28398129 0.75519018 0.17367966 0.46093806]
INFO:__main__:Processing 5 seeds in parallel.
[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
INFO:ace_trainer:Using device for training: cuda
INFO:ace_trainer:ACE feature buffer device: cuda
INFO:ace_trainer:Setting random seed to 2089
INFO:ace_trainer:Disabling multi-threaded data loading because we cannot run multiple depth inference passes simultaneously.
INFO:dataset:Overwriting focal length with heuristic derived from image dimensions.
INFO:dataset:Loading RGB files from: data/impcanart/*.jpg
INFO:dataset:Overwriting dataset with single image: 132 - data/impcanart/impcanart1-right_0017.jpg
INFO:dataset:Using ZoeDepth for depth initialization.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:ace_trainer:Loaded training scan from: data/impcanart/*.jpg -- 1 images, mean: 0.00 0.00 0.00
INFO:ace_trainer:Loaded pretrained encoder from: ace_encoder_pretrained.pt
INFO:ace_network:Creating Regressor using pretrained encoder with 512 feature size.
INFO:ace_trainer:Starting creation of the training buffer.
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
    r = call_item()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in __call__
    return [func(*args, **kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/grade/src/acezero/ace_zero_util.py", line 236, in map_seed
    run_cmd(mapping_cmd, verbose=verbose)
  File "/home/grade/src/acezero/ace_zero_util.py", line 49, in run_cmd
    raise RuntimeError("Error running ACE0: \nCommand:\n" + " ".join(cmd_str))
RuntimeError: Error running ACE0:
Command:
./train_ace.py data/impcanart/*.jpg result_folder/iteration0_seed0.pt --repro_loss_type tanh --render_target_path result_folder/renderings --render_marker_size 0.03 --refinement_ortho gram-schmidt --ace_pose_file_conf_threshold 500 --render_flipped_portrait False --pose_refinement_wait 0 --image_resolution 480 --pose_refinement_lr 0.001 --num_head_blocks 1 --repro_loss_hard_clamp 1000 --repro_loss_soft_clamp 50 --iterations_output 500 --max_dataset_passes 10 --learning_rate_schedule 1cyclepoly --learning_rate_max 0.003 --learning_rate_cooldown_iterations 5000 --learning_rate_cooldown_trigger_percent_threshold 0.7 --aug_rotation 15 --training_buffer_cpu False --num_data_workers 4 --render_visualization False --use_pose_seed 0.36983939211266215 --iterations 10000 --use_heuristic_focal_length True
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "ace_zero.py", line 187, in <module>
    seed_reg_rates = Parallel(n_jobs=opt.seed_parallel_workers, verbose=1)(
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 1061, in __call__
    self.retrieve()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 938, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
RuntimeError: Error running ACE0:
Command:
./train_ace.py data/impcanart/*.jpg result_folder/iteration0_seed0.pt --repro_loss_type tanh --render_target_path result_folder/renderings --render_marker_size 0.03 --refinement_ortho gram-schmidt --ace_pose_file_conf_threshold 500 --render_flipped_portrait False --pose_refinement_wait 0 --image_resolution 480 --pose_refinement_lr 0.001 --num_head_blocks 1 --repro_loss_hard_clamp 1000 --repro_loss_soft_clamp 50 --iterations_output 500 --max_dataset_passes 10 --learning_rate_schedule 1cyclepoly --learning_rate_max 0.003 --learning_rate_cooldown_iterations 5000 --learning_rate_cooldown_trigger_percent_threshold 0.7 --aug_rotation 15 --training_buffer_cpu False --num_data_workers 4 --render_visualization False --use_pose_seed 0.36983939211266215 --iterations 10000 --use_heuristic_focal_length True


Hi!

Can you please try to run ACE0 with --seed_parallel_workers 1 and report back here? The actual error message might have been swallowed by the parallel processing when mapping the seed images.

Best,
Eric

Hey Eric,

The error messages aren't much different with --seed_parallel_workers 1 unfortunately.

Error Log:

python ace_zero.py "data/impcanart/*.jpg" result_folder --seed_parallel_workers 1

INFO:__main__:Starting reconstruction of files matching data/impcanart/*.jpg.
INFO:__main__:Downloading ZoeDepth model from the main process.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
img_size [384, 512]
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Params passed to Resize transform:
        width:  512
        height:  384
        resize_target:  True
        keep_aspect_ratio:  True
        ensure_multiple_of:  32
        resize_method:  minimal
Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_NK.pt
Loaded successfully
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:__main__:Depth estimation model ready to use.
INFO:__main__:Trying seeds: [0.36983939 0.28398129 0.75519018 0.17367966 0.46093806]
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
INFO:ace_zero_util:Processing seed 0: 0.36983939211266215
INFO:ace_trainer:Using device for training: cuda
INFO:ace_trainer:ACE feature buffer device: cuda
INFO:ace_trainer:Setting random seed to 2089
INFO:ace_trainer:Disabling multi-threaded data loading because we cannot run multiple depth inference passes simultaneously.
INFO:dataset:Overwriting focal length with heuristic derived from image dimensions.
INFO:dataset:Loading RGB files from: data/impcanart/*.jpg
INFO:dataset:Overwriting dataset with single image: 132 - data/impcanart/impcanart1-right_0017.jpg
INFO:dataset:Using ZoeDepth for depth initialization.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:ace_trainer:Loaded training scan from: data/impcanart/*.jpg -- 1 images, mean: 0.00 0.00 0.00
INFO:ace_trainer:Loaded pretrained encoder from: ace_encoder_pretrained.pt
INFO:ace_network:Creating Regressor using pretrained encoder with 512 feature size.
INFO:ace_trainer:Starting creation of the training buffer.
Traceback (most recent call last):
  File "ace_zero.py", line 187, in <module>
    seed_reg_rates = Parallel(n_jobs=opt.seed_parallel_workers, verbose=1)(
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 864, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 782, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in __call__
    return [func(*args, **kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/grade/src/acezero/ace_zero_util.py", line 236, in map_seed
    run_cmd(mapping_cmd, verbose=verbose)
  File "/home/grade/src/acezero/ace_zero_util.py", line 49, in run_cmd
    raise RuntimeError("Error running ACE0: \nCommand:\n" + " ".join(cmd_str))
RuntimeError: Error running ACE0:
Command:
./train_ace.py data/impcanart/*.jpg result_folder/iteration0_seed0.pt --repro_loss_type tanh --render_target_path result_folder/renderings --render_marker_size 0.03 --refinement_ortho gram-schmidt --ace_pose_file_conf_threshold 500 --render_flipped_portrait False --pose_refinement_wait 0 --image_resolution 480 --pose_refinement_lr 0.001 --num_head_blocks 1 --repro_loss_hard_clamp 1000 --repro_loss_soft_clamp 50 --iterations_output 500 --max_dataset_passes 10 --learning_rate_schedule 1cyclepoly --learning_rate_max 0.003 --learning_rate_cooldown_iterations 5000 --learning_rate_cooldown_trigger_percent_threshold 0.7 --aug_rotation 15 --training_buffer_cpu False --num_data_workers 12 --render_visualization False --use_pose_seed 0.36983939211266215 --iterations 10000 --use_heuristic_focal_length True


Too bad. Let's check whether you can call the ACE mapper directly. Please run:

python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0

(This command will train a network test.pt from the first image of your dataset. The ACE0 meta-script will call something very similar to start the reconstruction.)

That gives this error:

python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0

Traceback (most recent call last):
  File "train_ace.py", line 237, in <module>
    raise ValueError("Either use_heuristic_focal_length or use_external_focal_length "
ValueError: Either use_heuristic_focal_length or use_external_focal_length or use_ace_pose_file has to be set.

Oh, sorry. This is on me. Please try again:

python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0 --use_heuristic_focal_length True

(we have to tell ACE what to do wrt intrinsics)
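
(For completeness: the error above also lists --use_external_focal_length and --use_ace_pose_file as alternatives. If the focal length of your camera is known, passing it explicitly should satisfy the same check; assuming the flag takes the focal length in pixels, something along these lines should work, where 1200 is only a placeholder value:

python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0 --use_external_focal_length 1200
)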

Also a pretty vague error with that unfortunately!

Error Log:

python train_ace.py "data/impcanart/.jpg" test.pt --use_pose_seed 0 --use_heuristic_focal_length True
INFO:ace_trainer:Using device for training: cuda
INFO:ace_trainer:ACE feature buffer device: cuda
INFO:ace_trainer:Setting random seed to 2089
INFO:ace_trainer:Disabling multi-threaded data loading because we cannot run multiple depth inference passes simultaneously.
INFO:dataset:Overwriting focal length with heuristic derived from image dimensions.
INFO:dataset:Loading RGB files from: data/impcanart/*.jpg
INFO:dataset:Overwriting dataset with single image: 0 - data/impcanart/impcanart1-back_0001.jpg
INFO:dataset:Using ZoeDepth for depth initialization.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
img_size [384, 512]
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Params passed to Resize transform:
        width:  512
        height:  384
        resize_target:  True
        keep_aspect_ratio:  True
        ensure_multiple_of:  32
        resize_method:  minimal
Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_NK.pt
Loaded successfully
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:ace_trainer:Loaded training scan from: data/impcanart/*.jpg -- 1 images, mean: 0.00 0.00 0.00
INFO:ace_trainer:Loaded pretrained encoder from: ace_encoder_pretrained.pt
INFO:ace_network:Creating Regressor using pretrained encoder with 512 feature size.
INFO:ace_trainer:Starting creation of the training buffer.
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/PIL/TiffImagePlugin.py:858: UserWarning: Corrupt EXIF data.  Expecting to read 2 bytes but only got 0.
  warnings.warn(str(msg))
You are running using the stub version of nvrtc
.
You are running using the stub version of nvrtc
Segmentation fault

Sorry, turned out to be an issue with my system CUDA setup. Just reinstalled some stuff and managed to get it working. Thanks for the help, and look forward to trying it out!
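
(In case someone else hits the same "stub version of nvrtc" message followed by a segmentation fault: one generic way to check which libnvrtc the loader can actually find, assuming CUDA 11.8 lives under /usr/local/cuda-11.8, is

ldconfig -p | grep nvrtc
echo $LD_LIBRARY_PATH

If only a stubs directory or nothing shows up, the runtime libraries are probably missing, which matches the package fix further down.)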

I'm hitting the same error as you. Could you share the details of how you fixed it, please?

@kk6398 Hi, I couldn't tell you exactly what fixed it, but I had previously only installed cuda-toolkit-11-8 and build-essential in WSL2 (for nerfstudio, gsplat, etc.) rather than a full CUDA install.

I installed a bunch of other CUDA packages, which got it mostly working.

sudo apt-get -y install cuda-cudart-11-8 \
                      cuda-compiler-11-8 \
                      libcublas-11-8 \
                      libcufft-11-8 \
                      libcurand-11-8 \
                      libcusolver-11-8 \
                      libcusparse-11-8
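
After installing these, a quick sanity check that PyTorch in the ace0 environment actually sees the GPU (not ACE0-specific, just a generic check) is:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

If the last value prints False, the problem is still at the CUDA/driver level rather than in ACE0 itself.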

Thank you. I changed my CUDA version from 12.1 to 11.8 and ran "conda env create -f environment.yml" again. Then I tried "python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0 --use_heuristic_focal_length True" and got the error "RuntimeError: File test.pt cannot be opened." Do you know where test.pt is supposed to come from?

Hi @kk6398!

That error looks weird. test.pt is the output file of train_ace.py rather than something that should already exist. Can you share the full stack trace of the error you get, please?

Best,
Eric

Could it be that your user does not have permissions to write files in the execution directory?
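
For example, from the directory where you run train_ace.py, something like

touch test.pt && ls -l test.pt && rm test.pt

should show quickly whether the file can be created there at all. If that fails, it's a filesystem or permission issue rather than an ACE0 one.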