mp3guy/ElasticFusion

segmentation fault

wangmiaowei opened this issue · 10 comments

Everything is OK, but when I run ./ElasticFusion -l dyson_lab.klg -g, within several seconds a "segmentation fault (core dumped)" occurs.
I would really appreciate any suggestions or help.

Hi @wangmiaowei
Were you able to resolve the issue? I am facing the same error.

Edit: I ran the code with the GDB debugger and it suggests that I get a SIGSEGV error from Thread 1, indicating that the issue is memory related. Below is the output from the debugger.

Screenshot from 2022-07-04 13-12-18

After running backtrace on the above output, this is the result.

Screenshot from 2022-07-04 13-27-29
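
For anyone who wants to capture the same backtrace as text instead of screenshots, here is a minimal sketch that builds a non-interactive GDB command line. The helper name `gdb_backtrace_command` is made up for illustration; `-batch`, `-ex` and `--args` are standard GDB options, and the program arguments are only the ones quoted in the original report, so adjust them to your own invocation.

```python
import shlex


def gdb_backtrace_command(program, program_args):
    """Build a GDB command that runs the program and prints a full backtrace.

    Hypothetical helper: it only assembles the argument list, it does not
    launch GDB itself.
    """
    return [
        "gdb", "-batch",         # exit after the listed commands finish
        "-ex", "run",            # start the program under the debugger
        "-ex", "backtrace full", # after a crash, print frames with locals
        "--args", program, *program_args,
    ]


# Arguments assumed from the report above; replace with your own.
cmd = gdb_backtrace_command("./ElasticFusion", ["-l", "dyson_lab.klg"])
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be handed to `subprocess.run` and its output redirected to a file that is easier to attach to an issue than an image.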

This suggests that the line that fails is this one: https://github.com/mp3guy/ElasticFusion/blob/master/Core/GlobalModel.cpp#L134

I was also able to print the full backtrace with variables. Here is a snapshot of part of the output, which suggests that the variable LocUpdate which is set at line L134 does seem to have been assigned values.

Screenshot from 2022-07-04 15-04-05

I am unable to dig deeper into this problem to find a solution though.

My computer is running:
Ubuntu 22.04 with the latest CUDA toolkit (CUDA 11.7 and NVIDIA driver 515.48.07). This does not exactly match the CUDA version and driver listed in the README.md, but I hope it should be fine. My laptop has an NVIDIA GeForce GTX 1050 with 4 GB of memory. It is probably not fast enough to execute the code in real time, but I was hoping it would be enough to just run the code.

Hi @tfy14esa, I am hitting the same issue here. Have you resolved it?

I can repro, investigating.

I presume you're running on a laptop with NVIDIA Prime? You need to make sure glxinfo returns NVIDIA and not SGI (Mesa). I did: sudo prime-select nvidia

Hi Thomas, it's running on a workstation, NVIDIA Quadro RTX 4000. This is the output of glxinfo

server glx vendor string: NVIDIA Corporation
client glx vendor string: Mesa Project and SGI
    Vendor: Mesa/X.org (0xffffffff)
OpenGL vendor string: Mesa/X.org

Also, below is the backtrace

Thread 1 "ElasticFusion" received signal SIGSEGV, Segmentation fault.
_dl_close (_map=0x0) at ./elf/dl-close.c:795
795	./elf/dl-close.c: No such file or directory.
(gdb) backtrace -full
#0  _dl_close (_map=0x0) at ./elf/dl-close.c:795
        map = 0x0
#1  0x00007ffff75bac28 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd3d0, operate=<optimized out>, args=<optimized out>)
    at ./elf/dl-error-skeleton.c:208
        errcode = 32767
        c = {exception = 0x7fffffffd3d0, errcode = 0x7fffffffd2dc, env = {{__jmpbuf = {140737488344103, 185497483243670832, -9416, 1, 1, 140737488345360, 
                185497483159784752, 185515866502057264}, __mask_was_saved = 0, __saved_mask = {__val = {140737488343976, 140734616638106, 140737340916992, 
                  140734616638211, 140737342518483, 140737340917184, 140737488343976, 140734616638106, 9373121150286852608, 140737488344040, 140737343368435, 0, 
                  0, 0, 9373121150286852608, 18446744073709542128}}}}}
        old = 0x0
#2  0x00007ffff75bacf3 in __GI__dl_catch_error (objname=0x7fffffffd428, errstring=0x7fffffffd430, mallocedp=0x7fffffffd427, operate=<optimized out>, 
    args=<optimized out>) at ./elf/dl-error-skeleton.c:227
        exception = {objname = 0x7ffff7364500 "", errstring = 0x7fffc80cbfbd "H\215\025<\234\001", 
          message_buffer = 0x1007fffffffd480 <error: Cannot access memory at address 0x1007fffffffd480>}
        errorcode = <optimized out>
#3  0x00007ffff74d61ae in _dlerror_run (operate=<optimized out>, args=<optimized out>) at ./dlfcn/dlerror.c:138
        result = <optimized out>
        objname = 0x7ffff74d5ed8 <__dlclose+40> "\367\330\031\300H\203\304\b\303\017\037\200"
        errstring = 0x7fff54d51303 "eglGetProcAddress"
        malloced = false
        errcode = <optimized out>
#4  0x00007ffff74d5ed8 in __dlclose (handle=<optimized out>) at ./dlfcn/dlclose.c:31
No locals.
#5  0x00007fff5490ef81 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
No symbol table info available.
#6  0x00007fff5490eff3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
No symbol table info available.
#7  0x00007fff549f9e02 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
No symbol table info available.
#8  0x00007ffff7eeef98 in __cudart1359 () from /root/ElasticFusion/build/Core/libefusion.so
No symbol table info available.
#9  0x00007ffff7f48b71 in cudaGraphicsGLRegisterImage () from /root/ElasticFusion/build/Core/libefusion.so
No symbol table info available.
#10 0x00005555555883d8 in GUI::GUI(bool, bool) ()
No symbol table info available.
#11 0x00005555555722fd in MainController::MainController(int, char**) ()
No symbol table info available.
#12 0x000055555556f761 in main ()
No symbol table info available.

Well, something is wrong with your drivers or configuration. You should be running X11, and glxinfo should not list Mesa or SGI anywhere in the client or server strings. This is not a problem with ElasticFusion, it's a problem with your setup.
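
The check described above can be sketched in Python. This is only an illustration under stated assumptions: the function names `glx_vendors` and `non_nvidia_vendors` are hypothetical, the three vendor labels are the ones that appear in the glxinfo output pasted in this thread, and the rule "every vendor string should mention NVIDIA" is taken from the comment above rather than from any official specification.

```python
import re

# Vendor lines as they appear in the glxinfo output quoted in this thread.
VENDOR_LINE = re.compile(
    r"^\s*(server glx vendor string|client glx vendor string|OpenGL vendor string):\s*(.*)$"
)


def glx_vendors(glxinfo_output):
    """Collect the vendor strings from glxinfo text, keyed by their label."""
    vendors = {}
    for line in glxinfo_output.splitlines():
        match = VENDOR_LINE.match(line)
        if match:
            vendors[match.group(1)] = match.group(2).strip()
    return vendors


def non_nvidia_vendors(glxinfo_output):
    """Return the vendor entries that do not mention NVIDIA."""
    return {
        label: vendor
        for label, vendor in glx_vendors(glxinfo_output).items()
        if "NVIDIA" not in vendor
    }


# Sample copied from the workstation report earlier in this thread.
SAMPLE = """server glx vendor string: NVIDIA Corporation
client glx vendor string: Mesa Project and SGI
    Vendor: Mesa/X.org (0xffffffff)
OpenGL vendor string: Mesa/X.org
"""

for label, vendor in non_nvidia_vendors(SAMPLE).items():
    print(f"{label}: {vendor}")
```

An empty result would mean all three vendor strings report NVIDIA; any entry that is printed points at a client or renderer that is still going through Mesa.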

Hi, I have the same error as in #222 (comment).
The core dump happens at the same line.
I ran "glxinfo", and the output was:
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions:
.......
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
client glx extensions:
.......
#222 (comment) says it should not list Mesa or SGI, but I don't know how to fix that.
By the way, nvidia-smi reports the following on Ubuntu 22.04:
NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8

I am not sure how to solve this problem. Could you give me some suggestions?
#222 (comment)

Are you running on a laptop?

Either set the default GPU via the Prime settings in nvidia-settings, or run sudo prime-select nvidia and reboot.
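
As a rough sketch of that second option, the helper below turns the output of `prime-select query` into the list of commands still left to run. The function name `prime_fix_steps` is hypothetical, and it assumes `prime-select query` prints the active profile name (for example `nvidia` or `on-demand`) on a single line.

```python
def prime_fix_steps(current_profile):
    """Return the commands to run, given the output of `prime-select query`.

    Hypothetical helper: it only decides what to run, it executes nothing.
    """
    if current_profile.strip() == "nvidia":
        return []  # the NVIDIA profile is already selected
    # Switching profiles only takes effect after a reboot.
    return ["sudo prime-select nvidia", "sudo reboot"]


for step in prime_fix_steps("on-demand\n"):
    print(step)
```

The reboot is listed explicitly because the profile switch alone does not change which GPU the running session uses.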

#222 (comment)
I tried it and it worked.
Perhaps I had forgotten to reboot my laptop.
Thank you!!!-v-