segmentation fault
matthijsvk opened this issue · 18 comments
Hi,
I get a segmentation fault when running python test.py.
Installed according to the instructions, with CUDA 10.1.
I'm running Ubuntu 19.10.
Any suggestions how to debug this?
Hi!
Can you try the following?
gdb -batch -ex "run" -ex "bt" --args python test.py
I have an idea of what the cause may be (some dlopen calls failing somewhere), but I want confirmation :)
Hi, here are the results:
gdb -batch -ex "run" -ex "bt" --args python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 6591]
[New Thread 0x7ffff07ba700 (LWP 6593)]
[New Thread 0x7fffeffb9700 (LWP 6594)]
[New Thread 0x7fffef7b8700 (LWP 6595)]
[New Thread 0x7fff8e4e3700 (LWP 6606)]
[New Thread 0x7fff8dce2700 (LWP 6607)]
[New Thread 0x7fff8d4e1700 (LWP 6608)]
[New Thread 0x7fff8cbe5700 (LWP 6618)]
[New Thread 0x7fff85fff700 (LWP 6619)]
[New Thread 0x7fff857fe700 (LWP 6620)]
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff67bece0 in triton::ir::user::replace_uses_of_with(triton::ir::value*, triton::ir::value*) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#0 0x00007ffff67bece0 in triton::ir::user::replace_uses_of_with(triton::ir::value*, triton::ir::value*) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#1 0x00007ffff67bed77 in triton::ir::user::replace_all_uses_with(triton::ir::value*) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#2 0x00007ffff67b66e5 in triton::ir::module::try_remove_trivial_phis(triton::ir::phi_node*&) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#3 0x00007ffff67b7602 in triton::ir::module::seal_block(triton::ir::basic_block*) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#4 0x00007ffff67cce99 in Generator::VisitForStmt(ForStmt*) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#5 0x00007ffff67d0d4d in Generator::VisitCompoundStmt(CompoundStmt*) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#6 0x00007ffff67d1a58 in Generator::VisitFuncDef(FuncDef*) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#7 0x00007ffff67d0e6d in Generator::Gen(triton::ir::module*) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#8 0x00007ffff6809880 in make_module(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::ir::module*, triton::runtime::function::options_space_t const&) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#9 0x00007ffff680f27a in get_fn_signature(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#10 0x00007ffff6826cf8 in pybind11::cpp_function::initialize<std::vector<triton::runtime::arg_type, std::allocator<triton::runtime::arg_type> > (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&), std::vector<triton::runtime::arg_type, std::allocator<triton::runtime::arg_type> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&, pybind11::name, pybind11::scope, pybind11::sibling>(std::vector<triton::runtime::arg_type, std::allocator<triton::runtime::arg_type> > (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&), std::vector<triton::runtime::arg_type, std::allocator<triton::runtime::arg_type> > (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#11 0x00007ffff682847a in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/matthijsv/Code/DL/others/pytorch-blocksparse/src/triton/python/triton/_C/libtriton.so
#12 0x00005555556b9d24 in _PyMethodDef_RawFastCallKeywords (method=0x555555c1ec00, self=0x7ffff6921c90, args=0x5555b6e7f430, nargs=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:694
#13 0x00005555556b9e41 in _PyCFunction_FastCallKeywords (func=0x7ffff68bd870, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:734
#14 0x0000555555726415 in call_function (kwnames=0x0, oparg=2, pp_stack=<synthetic pointer>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:4568
#15 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3093
#16 0x0000555555669749 in _PyEval_EvalCodeWithName (_co=0x7ffff74e75d0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x7fff9571f388, kwargs=0x7fff9571f390, kwcount=<optimized out>, kwstep=2, defs=0x7ffff68ad248, defcount=2, kwdefs=0x0, closure=0x0, name=0x7ffff76244f0, qualname=0x7ffff74e9c70) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3930
#17 0x000055555566aab0 in _PyFunction_FastCallDict (func=<optimized out>, args=0x7fffffffc490, nargs=2, kwargs=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:376
#18 0x0000555555688b63 in _PyObject_Call_Prepend (callable=0x7fffe88ed710, obj=<optimized out>, args=0x7fff95726550, kwargs=0x7fff9571f730) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:908
#19 0x00005555556c0efa in slot_tp_init (self=0x7fff9619a990, args=0x7fff95726550, kwds=0x7fff9571f730) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/typeobject.c:6636
#20 0x00005555556c1b08 in type_call (kwds=0x7fff9571f730, args=0x7fff95726550, type=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/typeobject.c:971
#21 _PyObject_FastCallKeywords (callable=0x555555f01360, stack=0x5555b46ebf70, nargs=<optimized out>, kwnames=0x7ffff74e0450) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:199
#22 0x0000555555726c07 in call_function (kwnames=0x7ffff74e0450, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:4619
#23 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3139
#24 0x0000555555669f08 in _PyEval_EvalCodeWithName (_co=0x7ffff74dd150, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x5555ac31bdd8, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff74e0770, qualname=0x7ffff74e12b0) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3930
#25 0x00005555556b9527 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=0x5555ac31bd78, nargs=12, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:433
#26 0x0000555555721806 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:4616
#27 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3124
#28 0x000055555566a7bb in function_code_fastcall (globals=<optimized out>, nargs=24, args=<optimized out>, co=0x7ffff74dd5d0) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:283
#29 _PyFunction_FastCallDict (func=<optimized out>, args=0x7fff95786668, nargs=24, kwargs=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:322
#30 0x00007fffe7d0ea3f in THPFunction_apply(_object*, _object*) () from /home/matthijsv/bin/anaconda/envs/pytorch-blocksparse/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#31 0x00005555556b9ca0 in _PyMethodDef_RawFastCallKeywords (method=0x7fffe860a3c0 <THPFunction_methods>, self=0x555557bdfbe0, args=0x5555b80ce890, nargs=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:698
#32 0x00005555556b9e41 in _PyCFunction_FastCallKeywords (func=0x7fff9571fe60, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:734
#33 0x0000555555726415 in call_function (kwnames=0x0, oparg=23, pp_stack=<synthetic pointer>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:4568
#34 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3093
#35 0x000055555566a7bb in function_code_fastcall (globals=<optimized out>, nargs=3, args=<optimized out>, co=0x7ffff74dd810) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:283
#36 _PyFunction_FastCallDict (func=<optimized out>, args=0x7fffffffcdf0, nargs=3, kwargs=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:322
#37 0x0000555555688b63 in _PyObject_Call_Prepend (callable=0x7fff9571d950, obj=<optimized out>, args=0x7fff9571f410, kwargs=0x0) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:908
#38 0x00005555556c0fba in slot_tp_call (self=0x7fff95726310, args=0x7fff9571f410, kwds=0x0) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/typeobject.c:6402
#39 0x00005555556c1e7b in _PyObject_FastCallKeywords (callable=0x7fff95726310, stack=0x555557bd0f30, nargs=2, kwnames=0x0) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:199
#40 0x0000555555725fd6 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:4619
#41 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3124
#42 0x00005555556b929b in function_code_fastcall (globals=<optimized out>, nargs=8, args=<optimized out>, co=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:283
#43 _PyFunction_FastCallKeywords (func=<optimized out>, stack=0x555557bb6308, nargs=8, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:408
#44 0x0000555555721806 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:4616
#45 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3124
#46 0x00005555556b929b in function_code_fastcall (globals=<optimized out>, nargs=8, args=<optimized out>, co=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:283
#47 _PyFunction_FastCallKeywords (func=<optimized out>, stack=0x7ffff75735e0, nargs=8, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Objects/call.c:408
#48 0x0000555555721806 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:4616
#49 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3124
#50 0x0000555555669749 in _PyEval_EvalCodeWithName (_co=0x7ffff7562ae0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3930
#51 0x000055555566a674 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:3959
#52 0x000055555566a69c in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/ceval.c:524
#53 0x0000555555780bc4 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff75dbcd0, locals=0x7ffff75dbcd0, flags=<optimized out>, arena=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/pythonrun.c:1035
#54 0x000055555578aff1 in PyRun_FileExFlags (fp=0x5555558c49f0, filename_str=<optimized out>, start=<optimized out>, globals=0x7ffff75dbcd0, locals=0x7ffff75dbcd0, closeit=1, flags=0x7fffffffd540) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/pythonrun.c:988
#55 0x000055555578b1e3 in PyRun_SimpleFileExFlags (fp=0x5555558c49f0, filename=<optimized out>, closeit=1, flags=0x7fffffffd540) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Python/pythonrun.c:429
#56 0x000055555578c2d5 in pymain_run_file (p_cf=0x7fffffffd540, filename=0x5555558c3960 L"test.py", fp=0x5555558c49f0) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Modules/main.c:428
#57 pymain_run_filename (cf=0x7fffffffd540, pymain=0x7fffffffd650) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Modules/main.c:1607
#58 pymain_run_python (pymain=0x7fffffffd650) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Modules/main.c:2868
#59 pymain_main (pymain=0x7fffffffd650) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Modules/main.c:3029
#60 0x000055555578c3fc in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1583419343668/work/Modules/main.c:3064
#61 0x00007ffff7db91e3 in __libc_start_main (main=0x55555564a2f0 <main>, argc=2, argv=0x7fffffffd7a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd798) at ../csu/libc-start.c:308
#62 0x0000555555731140 in _start () at ../sysdeps/x86_64/elf/start.S:103
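Most of the backtrace above is CPython interpreter frames; the frames relevant to the crash are the ones located in libtriton.so. As a minimal sketch, assuming gdb's usual `#N ADDR in FUNC () from LIB` layout for frames without debug info, a small filter like the following can pull those out of a saved backtrace. The names `FRAME_RE` and `frames_from` are hypothetical helpers written for illustration, not part of gdb or Triton.

```python
import re

# Matches frames without debug info, e.g.
#   "#4 0x00007ffff67cce99 in Generator::VisitForStmt(ForStmt*) () from /path/libtriton.so"
# Frames that end in "at file.c:123" (debug info available) are not matched.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.*) \(\) from (\S+)$")


def frames_from(backtrace, library):
    """Return (frame_number, function) pairs for frames located in `library`."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(3).endswith(library):
            frames.append((int(match.group(1)), match.group(2)))
    return frames
```

Calling `frames_from(text, "libtriton.so")` on the saved gdb output returns the frame numbers and function signatures in call order, innermost first.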
That's interesting. I can't reproduce this error, but I will try hard to over the weekend. What versions of Python and PyTorch do you have? Are you in a virtual environment with any other packages installed? I may create a virtual machine with Ubuntu 19.10 and exactly your Python/PyTorch versions.
Sorry for the delay, I was working on block-sparse multi-head self-attention and block-sparse convolutions. Unfortunately, I cannot reproduce the issue. However, I really want to get to the bottom of it. Can you try to clone Triton and run the C++ test?
cd /tmp;
git clone https://github.com/ptillet/triton.git
cd triton;
mkdir build;
cd build;
cmake ../;
make -j4;
./tests/bench/bench_dot
I'm closing this due to inactivity. I will re-open it if I hear about anyone encountering it.
Hi, I have experienced the same error. I'm also running Ubuntu 19.10 with CUDA 10.1, Python 3.7.5, and PyTorch 1.4.0. I also ran the Triton C++ test as suggested, and that crashed with a segmentation fault as well.
Is there anything I can do to help debug this?
Hi! Do you also have the same backtrace? What compiler are you using? (gcc --version)
Yep, the backtrace looks mostly the same to me. It seems to be the same when running gdb on bench_dot as well. I'm using gcc version 9.2.1.
gdb -batch -ex "run" -ex "bt" --args python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fff7f641700 (LWP 30337)]
[New Thread 0x7fff7ee40700 (LWP 30338)]
[New Thread 0x7fff7e63f700 (LWP 30342)]
[New Thread 0x7fff7de3d780 (LWP 30392)]
[New Thread 0x7fff7d63b800 (LWP 30393)]
[New Thread 0x7fff7ce39880 (LWP 30394)]
[New Thread 0x7fff462f2900 (LWP 30395)]
[New Thread 0x7fff45af0980 (LWP 30396)]
[New Thread 0x7fff452eea00 (LWP 30397)]
[New Thread 0x7fff3f516a80 (LWP 30398)]
[New Thread 0x7fff3ed15700 (LWP 30426)]
[New Thread 0x7ffee9ffe700 (LWP 30427)]
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fff9ab8a8b4 in triton::ir::user::replace_uses_of_with(triton::ir::value*, triton::ir::value*) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#0 0x00007fff9ab8a8b4 in triton::ir::user::replace_uses_of_with(triton::ir::value*, triton::ir::value*) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#1 0x00007fff9ab8a947 in triton::ir::user::replace_all_uses_with(triton::ir::value*) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#2 0x00007fff9ab82599 in triton::ir::module::try_remove_trivial_phis(triton::ir::phi_node*&) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#3 0x00007fff9ab83522 in triton::ir::module::seal_block(triton::ir::basic_block*) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#4 0x00007fff9ab979f6 in Generator::VisitForStmt(ForStmt*) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#5 0x00007fff9ab9c2dd in Generator::VisitCompoundStmt(CompoundStmt*) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#6 0x00007fff9ab9d1db in Generator::VisitFuncDef(FuncDef*) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#7 0x00007fff9ab9c3fd in Generator::Gen(triton::ir::module*) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#8 0x00007fff9abd8272 in make_module(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::ir::module*, triton::runtime::function::options_space_t const&) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#9 0x00007fff9abdbf87 in get_fn_signature(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#10 0x00007fff9abf3d82 in pybind11::cpp_function::initialize<std::vector<triton::runtime::arg_type, std::allocator<triton::runtime::arg_type> > (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&), std::vector<triton::runtime::arg_type, std::allocator<triton::runtime::arg_type> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&, pybind11::name, pybind11::scope, pybind11::sibling>(std::vector<triton::runtime::arg_type, std::allocator<triton::runtime::arg_type> > (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&), std::vector<triton::runtime::arg_type, std::allocator<triton::runtime::arg_type> > (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, triton::runtime::function::options_space_t const&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#11 0x00007fff9abf5517 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/local/lib/python3.7/dist-packages/triton-0.1-py3.7-linux-x86_64.egg/triton/_C/libtriton.so
#12 0x00000000005c85bb in _PyMethodDef_RawFastCallKeywords ()
#13 0x0000000000535990 in ?? ()
#14 0x000000000053c5a1 in _PyEval_EvalFrameDefault ()
#15 0x00000000005365e7 in _PyEval_EvalCodeWithName ()
#16 0x00000000005ca63e in _PyFunction_FastCallDict ()
#17 0x000000000056e9ab in ?? ()
#18 0x00000000005c9ba6 in _PyObject_FastCallKeywords ()
#19 0x0000000000535a11 in ?? ()
#20 0x00000000005394e1 in _PyEval_EvalFrameDefault ()
#21 0x00000000005365e7 in _PyEval_EvalCodeWithName ()
#22 0x00000000005c9468 in _PyFunction_FastCallKeywords ()
#23 0x0000000000535880 in ?? ()
#24 0x00000000005394e1 in _PyEval_EvalFrameDefault ()
#25 0x0000000000536f27 in _PyEval_EvalCodeWithName ()
#26 0x00000000005c9468 in _PyFunction_FastCallKeywords ()
#27 0x0000000000535880 in ?? ()
#28 0x000000000053c5a1 in _PyEval_EvalFrameDefault ()
#29 0x00000000005ca47a in _PyFunction_FastCallDict ()
#30 0x00007fffe86173ef in THPFunction_apply(_object*, _object*) () from /home/sam/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#31 0x00000000005c8663 in _PyMethodDef_RawFastCallKeywords ()
#32 0x0000000000535990 in ?? ()
#33 0x000000000053c5a1 in _PyEval_EvalFrameDefault ()
#34 0x00000000005ca47a in _PyFunction_FastCallDict ()
#35 0x00000000005cb10d in _PyObject_Call_Prepend ()
#36 0x000000000056e6c7 in ?? ()
#37 0x00000000005c9f63 in _PyObject_FastCallKeywords ()
#38 0x0000000000535a11 in ?? ()
#39 0x00000000005385e2 in _PyEval_EvalFrameDefault ()
#40 0x00000000005365e7 in _PyEval_EvalCodeWithName ()
#41 0x00000000005c9468 in _PyFunction_FastCallKeywords ()
#42 0x0000000000535880 in ?? ()
#43 0x00000000005394e1 in _PyEval_EvalFrameDefault ()
#44 0x00000000005c916b in _PyFunction_FastCallKeywords ()
#45 0x0000000000535880 in ?? ()
#46 0x00000000005385e2 in _PyEval_EvalFrameDefault ()
#47 0x00000000005365e7 in _PyEval_EvalCodeWithName ()
#48 0x000000000064cbb3 in PyEval_EvalCode ()
#49 0x00000000006402a3 in ?? ()
#50 0x0000000000640357 in PyRun_FileExFlags ()
#51 0x000000000064110a in PyRun_SimpleFileExFlags ()
#52 0x0000000000678eff in ?? ()
#53 0x00000000006791ee in _Py_UnixMain ()
#54 0x00007ffff7de71e3 in __libc_start_main (main=0x4cfb10 <main>, argc=2, argv=0x7fffffffe4b8, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fffffffe4a8) at ../csu/libc-start.c:308
#55 0x00000000005cf93e in _start ()
I am finally able to reproduce it with gcc-9.2. I'm on it.
The segfault I had was actually from another place in codegen. It could cause undefined behavior though. Do you still get the segfault on the latest version of Triton if you repeat the steps above?
No luck I'm afraid. Tried gcc-8.4 as well and it's the same problem.
So I guess I wasn't able to reproduce it after all. Can you repeat the above steps using the Debug mode instead:
cmake -DCMAKE_BUILD_TYPE=Debug ../
and send me the updated backtrace (it should include line numbers)?
Also, thanks a lot for doing all this :) I really want to get to the bottom of this.
No worries, thank you for taking your time to debug this! Really great to see these sparse operations for PyTorch.
gdb -batch -ex "run" -ex "bt" --args ./tests/bench/bench_dot
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffec33f700 (LWP 29853)]
[New Thread 0x7fffebb3e700 (LWP 29854)]
[New Thread 0x7fffeb16f700 (LWP 29855)]
Thread 1 "bench_dot" received signal SIGSEGV, Segmentation fault.
0x00007ffff7d449be in std::vector<triton::ir::value*, std::allocator<triton::ir::value*> >::size (this=0x7000000020060) at /usr/include/c++/9/bits/stl_vector.h:916
916 { return size_type(this->_M_impl._M_finish - this->_M_impl._M_start); }
#0 0x00007ffff7d449be in std::vector<triton::ir::value*, std::allocator<triton::ir::value*> >::size (this=0x7000000020060) at /usr/include/c++/9/bits/stl_vector.h:916
#1 0x00007ffff7e214e0 in triton::ir::user::replace_uses_of_with (this=0x7000000020000, before=0x555555bb1f30, after=0x555555ba7210) at /tmp/triton/lib/ir/value.cc:69
#2 0x00007ffff7e21486 in triton::ir::user::replace_all_uses_with (this=0x555555bb1f30, target=0x555555ba7210) at /tmp/triton/lib/ir/value.cc:64
#3 0x00007ffff7e0c58f in triton::ir::module::try_remove_trivial_phis (this=0x5555559043c0, phi=@0x555555bb1f20: 0x555555bb1f30) at /tmp/triton/lib/ir/module.cc:75
#4 0x00007ffff7e0cdd5 in triton::ir::module::seal_block (this=0x5555559043c0, block=0x555555baf8d0) at /tmp/triton/lib/ir/module.cc:147
#5 0x00007ffff7e513e3 in Generator::VisitForStmt (this=0x7fffffffd150, forStmt=0x555555ba4010) at /tmp/triton/lib/lang/code_gen.cc:434
#6 0x00007ffff7e21d38 in ForStmt::Accept (this=0x555555ba4010, v=0x7fffffffd150) at /tmp/triton/lib/lang/ast.cc:54
#7 0x00007ffff7e54fb0 in Generator::Visit (this=0x7fffffffd150, node=0x555555ba4010) at /tmp/triton/include/triton/lang/code_gen.h:55
#8 0x00007ffff7e51839 in Generator::VisitCompoundStmt (this=0x7fffffffd150, compoundStmt=0x555555b9b8d0) at /tmp/triton/lib/lang/code_gen.cc:462
#9 0x00007ffff7e21ddc in CompoundStmt::Accept (this=0x555555b9b8d0, v=0x7fffffffd150) at /tmp/triton/lib/lang/ast.cc:69
#10 0x00007ffff7e5501c in Generator::VisitStmt (this=0x7fffffffd150, stmt=0x555555b9b8d0) at /tmp/triton/include/triton/lang/code_gen.h:57
#11 0x00007ffff7e51ddf in Generator::VisitFuncDef (this=0x7fffffffd150, funcDef=0x555555b97f90) at /tmp/triton/lib/lang/code_gen.cc:494
#12 0x00007ffff7e22030 in FuncDef::Accept (this=0x555555b97f90, v=0x7fffffffd150) at /tmp/triton/lib/lang/ast.cc:123
#13 0x00007ffff7e54fb0 in Generator::Visit (this=0x7fffffffd150, node=0x555555b97f90) at /tmp/triton/include/triton/lang/code_gen.h:55
#14 0x00007ffff7e51ff2 in Generator::VisitTranslationUnit (this=0x7fffffffd150, unit=0x555555b62200) at /tmp/triton/lib/lang/code_gen.cc:502
#15 0x00007ffff7e520c8 in Generator::Gen (this=0x7fffffffd150, mod=0x5555559043c0) at /tmp/triton/lib/lang/code_gen.cc:511
#16 0x00007ffff7ea7778 in triton::runtime::function::make_ir (this=0x7fffffffdf60, parser=...) at /tmp/triton/lib/runtime/function.cc:194
#17 0x00007ffff7ea82b8 in triton::runtime::function::make (this=0x7fffffffdf60, stream=0x555555b4ad00, opt=...) at /tmp/triton/lib/runtime/function.cc:278
#18 0x00007ffff7ea8868 in triton::runtime::function::<lambda(std::vector<long unsigned int, std::allocator<long unsigned int> >)>::operator()(std::vector<unsigned long, std::allocator<unsigned long> >) const (__closure=0x555555b57bc0, params=std::vector of length 17, capacity 17 = {...}) at /tmp/triton/lib/runtime/function.cc:316
#19 0x00007ffff7ea9f2a in std::_Function_handler<void(const std::vector<long unsigned int, std::allocator<long unsigned int> >&), triton::runtime::function::precompile(triton::driver::stream*, const triton::runtime::function::options_space_t&)::<lambda(std::vector<long unsigned int, std::allocator<long unsigned int> >)> >::_M_invoke(const std::_Any_data &, const std::vector<unsigned long, std::allocator<unsigned long> > &) (__functor=..., __args#0=std::vector of length 17, capacity 17 = {...}) at /usr/include/c++/9/bits/std_function.h:300
#20 0x00007ffff7eac707 in std::function<void (std::vector<unsigned long, std::allocator<unsigned long> > const&)>::operator()(std::vector<unsigned long, std::allocator<unsigned long> > const&) const (this=0x7fffffffd8f0, __args#0=std::vector of length 17, capacity 17 = {...}) at /usr/include/c++/9/bits/std_function.h:690
#21 0x00007ffff7ea6712 in triton::runtime::_loop_nest(std::vector<unsigned long, std::allocator<unsigned long> > const&, std::function<void (std::vector<unsigned long, std::allocator<unsigned long> > const&)> const&) (ranges=std::vector of length 17, capacity 32 = {...}, f=...) at /tmp/triton/lib/runtime/function.cc:54
#22 0x00007ffff7ea8c19 in triton::runtime::function::precompile (this=0x7fffffffdf60, stream=0x555555b4ad00, space=...) at /tmp/triton/lib/runtime/function.cc:328
#23 0x00007ffff7ea99bc in triton::runtime::function::operator()(std::vector<triton::runtime::arg, std::allocator<triton::runtime::arg> > const&, std::function<std::vector<unsigned long, std::allocator<unsigned long> > (triton::runtime::function::options_t const&)> const&, triton::driver::stream*) (this=0x7fffffffdf60, args=std::vector of length 10, capacity 10 = {...}, grid_fn=..., stream=0x555555b4ad00) at /tmp/triton/lib/runtime/function.cc:459
#24 0x0000555555589fe2 in triton_dot<float>(triton::driver::stream*, bool, bool, int, int, int, int, int, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, run_mode_t, std::vector<double, std::allocator<double> >&, bool&)::{lambda()#2}::operator()() const (this=0x555555b58d70) at /tmp/triton/tests/common/dot.h:126
#25 0x0000555555596416 in std::_Function_handler<void (), triton_dot<float>(triton::driver::stream*, bool, bool, int, int, int, int, int, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, run_mode_t, std::vector<double, std::allocator<double> >&, bool&)::{lambda()#2}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
#26 0x0000555555585d38 in std::function<void ()>::operator()() const (this=0x7fffffffdec0) at /usr/include/c++/9/bits/std_function.h:690
#27 0x0000555555584cbe in triton::tools::bench(std::function<void ()> const&, triton::driver::stream*, bool) (op=..., stream=0x555555b4ad00, normalize=false) at /tmp/triton/include/triton/tools/bench.hpp:39
#28 0x000055555558bdf7 in triton_dot<float> (stream=0x555555b4ad00, AT=false, BT=false, M=1024, N=1024, K=1024, TM=0, TN=0, TK=0, nwarp=0, a_order=std::vector of length 2, capacity 2 = {...}, b_order=std::vector of length 2, capacity 2 = {...}, mode=BENCH, bench=std::vector of length 0, capacity 0, test=@0x7fffffffe197: false) at /tmp/triton/tests/common/dot.h:126
#29 0x0000555555582ec7 in bench_dot (stream=0x555555b4ad00, dtype=FLOAT, AT=false, BT=false, M=1024, N=1024, K=1024, a_order=std::vector of length 2, capacity 2 = {...}, b_order=std::vector of length 2, capacity 2 = {...}) at /tmp/triton/tests/common/dot.h:176
#30 0x00005555555836d2 in main () at /tmp/triton/tests/bench/dot.cc:39
Hi, thanks a lot for reopening the issue. Sorry I lost track of this.
The issue also occurs on Ubuntu 20.04 with CUDA 10.2, even with the newest Triton (not sure whether it's supposed to work with CUDA 10.2?).
I'm thinking this might be related to Ubuntu versions (I'm on 18.04). I may upgrade to Ubuntu 20.04 tonight and see if I can reproduce it. The backtrace is interesting, and I don't really understand how the line can fail unless there is some kind of undefined behavior elsewhere.
But yes, Triton is supposed to work with CUDA v10.2, although it's not really optimized for Turing tensor cores at the moment (will work on this some more once Ampere comes out)
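On the undefined-behavior hypothesis: in the Debug backtrace, `replace_uses_of_with` is entered with `this=0x7000000020000`, which looks unlike the other object addresses (`0x5555...`), so the user list walked by `replace_all_uses_with` may have been invalidated mid-loop. One classic way that can happen is updating a value's user list while iterating over it. The sketch below illustrates that hazard only; it is an assumption, not the actual Triton code and not necessarily what the fix changed. `Value` and `User` are hypothetical classes. In Python the unsafe loop would merely skip users, whereas the C++ equivalent with invalidated iterators is undefined behavior.

```python
class Value:
    """Hypothetical IR value that tracks which users reference it."""

    def __init__(self, name):
        self.name = name
        self.users = []  # User objects that hold this value as an operand

    def replace_all_uses_with(self, target):
        # Iterate over a snapshot: each call below removes an entry from
        # self.users, and mutating a list while looping over it skips items.
        for user in list(self.users):
            user.replace_uses_of_with(self, target)


class User:
    """Hypothetical instruction holding a list of operand values."""

    def __init__(self, *operands):
        self.operands = list(operands)
        for op in self.operands:
            if self not in op.users:
                op.users.append(self)

    def replace_uses_of_with(self, before, after):
        if not any(op is before for op in self.operands):
            return
        # Swap every occurrence of `before`, then update both user lists once.
        self.operands = [after if op is before else op for op in self.operands]
        before.users.remove(self)
        if self not in after.users:
            after.users.append(self)
```

With the snapshot in place, every user of the replaced value is visited exactly once, regardless of how `replace_uses_of_with` edits the original list.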
@matthijsvk @SamWes Good news, I was able to reproduce the exact bug after upgrading to Ubuntu 20.04. I have fixed it in ptillet/triton@8f9233e. It does not seem to break the tests, so I pushed the new Triton version to PyPI.
Reinstalling Triton should solve the issue: pip install triton
Let me know if I can close the issue. And thanks a lot for reporting it.
Working great for me. Thanks a lot!