georgeliu233/OPGP

Try to make some corrections

Closed this issue · 2 comments

Hi, @georgeliu233 !
Thank you very much for your nice work!
But when I launch the training process, I get an error at `gt_obs = targets['gt_obs'][..., 0, :] * ego_mask.unsqueeze(-1)` (line 134 of net_utils.py): `targets['gt_obs'][..., 0, :]` has shape [8, 10, 128, 128, 3] while `ego_mask.unsqueeze(-1)` has shape [8, 5, 128, 128, 1], so they cannot be multiplied elementwise with `*`. After changing `num_waypoints: {self.future_len//5}` (line 102 of preprocess.py) to `num_waypoints: {self.future_len//10}`, it works.
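For context, the error above follows directly from broadcasting rules: elementwise `*` requires each trailing dimension to either match or be 1. A minimal NumPy sketch with the shapes from the report (the arrays here are hypothetical zeros/ones, only the shapes matter; the fix simply makes the waypoint dimensions agree):

```python
import numpy as np

# Shapes reported in the issue (hypothetical data, zeros/ones only):
gt_obs = np.zeros((8, 10, 128, 128, 3))    # targets['gt_obs'][..., 0, :]
bad_mask = np.zeros((8, 5, 128, 128, 1))   # ego_mask.unsqueeze(-1) before the fix
good_mask = np.ones((8, 10, 128, 128, 1))  # waypoint count made to match

# Elementwise '*' broadcasts: each trailing dim must match or be 1.
# 10 vs 5 on axis 1 satisfies neither rule, so multiplication fails.
broadcast_failed = False
try:
    gt_obs * bad_mask
except ValueError:
    broadcast_failed = True
print("mismatched shapes fail:", broadcast_failed)   # True

# With matching waypoint counts, the size-1 channel axis broadcasts fine.
masked = gt_obs * good_mask
print("masked shape:", masked.shape)                 # (8, 10, 128, 128, 3)
```

The same broadcasting semantics apply to PyTorch tensors, which is why aligning the waypoint count in preprocess.py resolves the error.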
What's more, I added `parser.add_argument("--use_planning", type=bool, default=False)` in training.py, since it is used in `imitation_loss()`.
Are these solutions right?

Hi, @georgeliu233
When I launch testing.py, it fails with the following error:
```
/home/neousys/miniconda3/envs/DIPP/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
flow_encoder 74080
swin_encoder 5401696
cuda:0
model loaded!:epoch 30
Length test: 15948
Testing....
Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /usr/local/cuda-11.3/targets/x86_64-linux/lib/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
/home/neousys/miniconda3/envs/DIPP/lib/python3.8/site-packages/torch/nn/modules/transformer.py:562: UserWarning: Converting mask without torch.bool dtype to bool; this will negatively affect performance. Prefer to use a boolean mask directly. (Triggered internally at ../aten/src/ATen/native/transformers/attention.cpp:150.)
  return torch._transformer_encoder_layer_fwd(
/home/neousys/Desktop/cz/DIPP-main/theseus/theseus/core/objective.py:800: UserWarning: Attempted to update a tensor with name speed_limit, which is not associated to any variable in the objective.
  warnings.warn(
Traceback (most recent call last):
  File "testing.py", line 155, in <module>
    model_testing(test_data)
  File "testing.py", line 82, in model_testing
    xy_plan = planner.plan(planning_inputs, selected_ref, inputs['ego_state'])
  File "/home/neousys/Desktop/cz/OPGP-main/planner.py", line 207, in plan
    final_values, info = self.layer.forward(planning_inputs, optimizer_kwargs={'track_best_solution': True})
  File "/home/neousys/Desktop/cz/DIPP-main/theseus/theseus/theseus_layer.py", line 93, in forward
    vars, info = _forward(
  File "/home/neousys/Desktop/cz/DIPP-main/theseus/theseus/theseus_layer.py", line 171, in _forward
    info = optimizer.optimize(**optimizer_kwargs)
  File "/home/neousys/Desktop/cz/DIPP-main/theseus/theseus/optimizer/optimizer.py", line 51, in optimize
    return self._optimize_impl(**kwargs)
  File "/home/neousys/Desktop/cz/DIPP-main/theseus/theseus/optimizer/nonlinear/nonlinear_least_squares.py", line 251, in _optimize_impl
    self._optimize_loop(
  File "/home/neousys/Desktop/cz/DIPP-main/theseus/theseus/optimizer/nonlinear/nonlinear_least_squares.py", line 137, in _optimize_loop
    delta = self.compute_delta(**kwargs)
  File "/home/neousys/Desktop/cz/DIPP-main/theseus/theseus/optimizer/nonlinear/gauss_newton.py", line 47, in compute_delta
    return self.linear_solver.solve()
  File "/home/neousys/Desktop/cz/DIPP-main/theseus/theseus/optimizer/linear/cholmod_sparse_solver.py", line 61, in solve
    return CholmodSolveFunction.apply(
  File "/home/neousys/miniconda3/envs/DIPP/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/neousys/Desktop/cz/DIPP-main/theseus/theseus/optimizer/autograd/cholmod_sparse_autograd.py", line 42, in forward
    cholesky_decomposition = symbolic_decomposition.cholesky_AAt(At_i, damping)
  File "sksparse/cholmod.pyx", line 641, in sksparse.cholmod.Factor.cholesky_AAt
  File "sksparse/cholmod.pyx", line 589, in sksparse.cholmod.Factor.cholesky_AAt_inplace
  File "sksparse/cholmod.pyx", line 604, in sksparse.cholmod.Factor._cholesky_inplace
  File "sksparse/cholmod.pyx", line 599, in sksparse.cholmod.Factor._cholesky_inplace
  File "sksparse/cholmod.pyx", line 387, in sksparse.cholmod._error_handler
sksparse.cholmod.CholmodNotPositiveDefiniteError: ../Supernodal/t_cholmod_super_numeric.c:911: matrix not positive definite (code 1)
```

Could you kindly help me find the reason?

Hi @WeiXiCZ ,
Thanks a lot for your effort on the corrections! The server storing the source code is currently down; I'll make these corrections once it is restored.

Concerning the solver failure, it is partly due to the instability of Theseus when solving hinge-like cost functions (i.e., the safety cost and the traffic-light cost). You may try other types of Theseus solvers instead of `th.CholmodSparseSolver`. That said, the sparse solver should be more stable than the dense one, and it worked for me during testing. Another option is to adjust the parameters of the safety and traffic-light costs to see if that helps.
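A toy NumPy sketch of why hinge-like costs can trigger a `CholmodNotPositiveDefiniteError` (illustration only; the residuals, limit, and damping value below are hypothetical, not OPGP's actual costs): wherever a hinge residual is inactive, its value and its Jacobian row are exactly zero, so the Gauss-Newton normal matrix JᵀJ loses rank and the Cholesky factorization fails.

```python
import numpy as np

# Toy 2-variable least-squares problem (illustration only):
# residuals r0 = x0 (quadratic cost) and r1 = max(0, x1 - limit) (hinge cost).
def jacobian(x, limit=1.0):
    J = np.zeros((2, 2))
    J[0, 0] = 1.0       # d r0 / d x0
    if x[1] > limit:    # hinge contributes only past the limit
        J[1, 1] = 1.0   # d r1 / d x1
    return J

# Hinge inactive: the Jacobian row for x1 is all zeros,
# so the Gauss-Newton normal matrix J^T J is singular.
J = jacobian(np.array([0.5, 0.0]))
H = J.T @ J
not_pd = False
try:
    np.linalg.cholesky(H)  # raises: matrix not positive definite
except np.linalg.LinAlgError:
    not_pd = True
print("Cholesky failed:", not_pd)  # True

# A small damping term (Levenberg-Marquardt style) restores definiteness:
L = np.linalg.cholesky(H + 1e-6 * np.eye(2))
print("damped factorization succeeded:", L.shape)
```

This is also why switching solvers or smoothing/retuning the hinge costs (so they stay active, or contribute a small quadratic term everywhere) can make the factorization succeed.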

Thanks again for your issue, and don't hesitate to reach out if you have any other questions.

Best,