ethz-asl/aslam_optimizer

Aslam Exception

Closed this issue · 8 comments

Hi everyone,

I am trying to run the batch estimator with this command:

rosrun hand_eye_calibration_batch_estimation batch_estimator --v 1 --pose1_csv csv_file/tf_poses_timestamped.csv --pose2_csv csv_file/camera_poses_timestamped.csv --init_guess_file csv_file/calibration.json --output_file csv_file/calibration_optimized.json

But I get this error:

terminate called after throwing an instance of 'aslam::Exception'
  what():  [Exception] /home/bwaki/catkin_ws/src/aslam_optimizer/aslam_backend/include/aslam/backend/MatrixStack.hpp:215: allocate() debug assert((uintptr_t)(&(_data[_headers.back().dataIndex])) % 16 == 0) failed: Memory is not properly aligned
*** Aborted at 1527240675 (unix time) try "date -d @1527240675" if you are using GNU date ***
PC: @ 0xb7edcce5 ([vdso]+0xce4)
*** SIGABRT (@0x505c) received by PID 20572 (TID 0xb3657700) from PID 20572; stack trace: ***
    @ 0xb7edccfc ([vdso]+0xcfb)
    @ 0xb7edcce5 ([vdso]+0xce4)
    @ 0xb5ad2ea9 gsignal
    @ 0xb5ad4407 abort
    @ 0xb5d42d35 __gnu_cxx::__verbose_terminate_handler()
    @ 0xb5d40833 (unknown)
    @ 0xb5d408ad std::terminate()
    @ 0xb5d40b70 __cxa_throw
    @ 0xb7824dfd sm::detail::sm_throw_exception<>()
    @ 0xb7823916 sm::detail::sm_throw_exception<>()
    @ 0xb782309b aslam::backend::MatrixStack::allocate()
    @ 0xb78de88d aslam::backend::ErrorTerm::evaluateWeightedJacobian<>()
    @ 0xb78dd293 aslam::backend::ErrorTermFs<>::getWeightedJacobians()
    @ 0xb5051bef aslam::backend::CompressedColumnJacobianTransposeBuilder<>::evaluateJacobians()
    @ 0xb5051d2d aslam::backend::CompressedColumnJacobianTransposeBuilder<>::setupThreadedJob<>()
    @ 0xb504d4c1 aslam::backend::CompressedColumnJacobianTransposeBuilder<>::buildSystem()
    @ 0xb504a8f9 aslam::backend::SparseCholeskyLinearSystemSolver::buildSystem()
    @ 0xb50cdf3d aslam::backend::LevenbergMarquardtTrustRegionPolicy::solveSystemImplementation()
    @ 0xb50b210b aslam::backend::TrustRegionPolicy::solveSystem()
    @ 0xb50fe617 aslam::backend::Optimizer2::optimizeImplementation()
    @ 0xb50fc4ca aslam::backend::OptimizerBase::optimize()
    @ 0xb50fde45 aslam::backend::Optimizer2::optimize()
    @ 0xb781b9c1 _ZZN5aslam11calibration15BatchCalibrator9calibrateEvENKUlvE_clEv
    @ 0xb781d546 _ZNSt17_Function_handlerIFvvEZN5aslam11calibration15BatchCalibrator9calibrateEvEUlvE_E9_M_invokeERKSt9_Any_data
    @ 0xb780cb8f std::function<>::operator()()
    @ 0xb78069fe aslam::calibration::AbstractCalibrator::estimate()
    @ 0xb781bf83 aslam::calibration::BatchCalibrator::calibrate()
    @  0x8068a1d main
    @ 0xb5abf637 __libc_start_main
    @  0x8066971 (unknown)
Aborted (core dumped)

I don't know what exactly an aslam::Exception is, or where it comes from.
I assumed one of my CSV files was wrong or badly generated, so I recreated them, but nothing changed.
If someone knows how to solve this, could you tell me?

Thank you in advance,

Bwaki

@Bwaki thanks for reporting. Apparently we didn't read the language specs on overaligned allocation properly :(. To be sure about this diagnosis, could you please provide the output of

#include <iostream>
#include <cstddef>
int main()
{
    std::cout << alignof(std::max_align_t) << '\n';
}

after compiling it with the same compiler on the same system?
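
For context, a minimal sketch of what that number is meant to reveal. `alignof(std::max_align_t)` is the largest fundamental alignment on the platform, and ordinary allocation is only required to satisfy alignments up to that value, whereas the failing assert in MatrixStack.hpp demands 16. The helper names below (`isAligned`, `defaultAllocationCovers`) are made up for illustration and are not part of aslam_backend.

```cpp
#include <cstddef>
#include <cstdint>

// Same kind of test as the failing assert: is the address a multiple of
// `alignment`? (Hypothetical helper, not an aslam_backend function.)
inline bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// True if memory from ordinary allocation is already required to meet
// `alignment`, i.e. it does not exceed the largest fundamental alignment.
// (Hypothetical helper for illustration.)
constexpr bool defaultAllocationCovers(std::size_t alignment) {
    return alignment <= alignof(std::max_align_t);
}
```

If the program above prints 8, then `defaultAllocationCovers(16)` is false: a default-allocated buffer may or may not happen to start on a 16-byte boundary, so a check like `isAligned(p, 16)` can pass in one run and fail in another.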

I will do it on Monday and give you the output.

Thank you for the answer, @HannesSommer.

Thanks!
PR #211 should provide a solution in case my diagnosis was right. Maybe you could also check whether it helps?

@HannesSommer

Hello,

I added these three lines at the beginning of main in batch_estimator.cc:

  std::cout << "============================================================";
  std::cout << alignof(std::max_align_t) << '\n';
  std::cout << "============================================================";

and these two includes:

#include <iostream>
#include <cstddef>

and I got no error; calibration_optimized.json was even generated.

Here is the output with the modification:

_data[_headers.back().dataIndex]: 2.24516
......
......
......
_data[_headers.back().dataIndex]: 1.56326
I0528 09:28:39.435415 15399 AbstractCalibrator.cpp:249] Optimizer: Linear system solved:
The Jacobian matrix is: 59682 x 18799
I0528 09:28:39.435822 15399 DelayCv.cpp:20] Delay pose2_d has been updated to 2.04944 s.
I0528 09:28:39.563371 15399 AbstractCalibrator.cpp:269] Optimizer: Variables updated:
*              pose2_R r:-0.147497 (-0.0011923)
*                      p: 1.35942 (+0.000836514)
*                      y:0.385056 (+0.000872097)
*              pose2_T x:-0.0188205 (+0.00652795)
*                      y:-0.153504 (+0.00545125)
*                      z:0.928487 (-0.00582982)
*              pose2_d  : 2.04944 (+0.00139644)
W0528 09:28:43.343591 15399 AbstractCalibrator.cpp:256] Last update was a regression: 1871.2 -> 1871.29
I0528 09:28:43.729933 15399 AbstractCalibrator.cpp:249] Optimizer: Linear system solved:
The Jacobian matrix is: 59682 x 18799
I0528 09:28:43.730134 15399 DelayCv.cpp:20] Delay pose2_d has been updated to 2.04813 s.
I0528 09:28:43.857241 15399 AbstractCalibrator.cpp:269] Optimizer: Variables updated:
*              pose2_R r:-0.146958 (-7.61229e-05)
*                      p: 1.35875 (+5.29975e-05)
*                      y:0.383631 (+5.74295e-05)
*              pose2_T x:-0.0249292 (+0.000419276)
*                      y:-0.158606 (+0.000349155)
*                      z:0.933945 (-0.000372045)
*              pose2_d  : 2.04813 (+8.675e-05)
I0528 09:28:47.635143 15399 AbstractCalibrator.cpp:263] Optimizer: cost and residuals updated. Current cost: 1870.26 (decreased by 0.937975):
Error term statistics:
pose1Pose(698.816 / 9816 = 0.0711915)
pose2Pose(1171.45 / 131 = 8.94235)
Overall:1870.26
Cv-wise error term statistics:
*              pose2_R : #errors:131 pose2Pose(1171.45 / 131 = 8.94235)
*              pose2_T : #errors:131 pose2Pose(1171.45 / 131 = 8.94235)
*              pose2_d : #errors:131 pose2Pose(1171.45 / 131 = 8.94235)
I0528 09:28:47.657482 15399 BatchCalibrator.cpp:182] Final OptimizerStatus: 
	convergence: DX
	iterations: 2
	gradient norm: nan
	objective: 1870.26
	dobjective: 0.937975
	max dx: 0.000631296
	evals objective: 0
	evals derivative: 0
I0528 09:28:47.659991 15399 BatchCalibrator.cpp:185] After calibration:
*              pose2_R r:-0.146958 (-7.61229e-05)
*                      p: 1.35875 (+5.29975e-05)
*                      y:0.383631 (+5.74295e-05)
*              pose2_T x:-0.0249292 (+0.000419276)
*                      y:-0.158606 (+0.000349155)
*                      z:0.933945 (-0.000372045)
*              pose2_d  : 2.04813 (+8.675e-05)

I0528 09:28:59.444383 15399 batch_estimator.cc:210] Writing output to csv_file/calibration_optimized.json.

I will close the issue. If you want to add a comment or some advice, feel free to do so.

Thanks for the help!

Thanks @Bwaki for the feedback. However, these lines cannot solve the issue; they were purely diagnostic. So your problem must be stochastic (it does not happen on every run).

Unfortunately the output of these lines (just a single number) is missing (probably cut off at the beginning). My suggestion was to create a new program with just these lines. It would give me crucial information about your system.

If you are still willing to help, could you try that? Create a new file, e.g. print_max_align.cpp, with the content

#include <iostream>
#include <cstddef>
int main()
{
    std::cout << alignof(std::max_align_t) << '\n';
}

and compile it, e.g. with "g++ print_max_align.cpp --std=c++11 -o print_max_align", and then run "./print_max_align"? And then report the single number it outputs?

Thanks @HannesSommer. I thought this output was strange too, because of course your code just prints a number. Moreover, if I delete the code that I added to batch_estimator.cc, the error is still there.

So here is the output:

bwaki@bwaki-Precision-WorkStation-T3500:~$ g++ print_max_align.cpp --std=c++11 -o print_max_align
bwaki@bwaki-Precision-WorkStation-T3500:~$ ./print_max_align
8

Is 8 wrong? Should max_align normally be set to 16?

Thanks again.

You cannot set it. It is a property of your system. And 8 confirms my theory of the problem :).
So, I assume that PR #211 (already merged) actually fixes your problem. So please use the newest master and report back if the problem occurs again.

This issue can be closed after a while without negative feedback.
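
For anyone who hits the same assert elsewhere: one common way to get 16-byte aligned storage from an allocator that only promises 8 is to over-allocate and round the start address up. The sketch below does this with `std::align`. It only illustrates the general technique and is not necessarily what PR #211 does; `makePaddedStorage` and `alignedStart` are hypothetical helpers, not aslam_backend functions.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: allocate `bytes` plus enough padding that an address
// aligned to `alignment` (a power of two) always fits inside the buffer.
inline std::vector<unsigned char> makePaddedStorage(std::size_t bytes,
                                                    std::size_t alignment) {
    return std::vector<unsigned char>(bytes + alignment - 1);
}

// Hypothetical helper: return the first address inside `storage` that is a
// multiple of `alignment` and still leaves room for `bytes` bytes, or
// nullptr if the buffer is too small. `alignment` must be a power of two.
inline void* alignedStart(std::vector<unsigned char>& storage,
                          std::size_t bytes, std::size_t alignment) {
    void* p = storage.data();
    std::size_t space = storage.size();
    return std::align(alignment, bytes, p, space);
}
```

Calling `makePaddedStorage(64, 16)` and then `alignedStart(storage, 64, 16)` yields a pointer that satisfies a `% 16 == 0` check regardless of what `alignof(std::max_align_t)` is on the machine, at the cost of up to 15 wasted bytes. The returned pointer is only valid while the vector is neither resized nor destroyed.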

@HannesSommer Thank you very much! This is working perfectly. Have a nice day.