rust-rpi-led-matrix/rust-rpi-rgb-led-matrix

SEGFAULT on matrix initialization.

Closed · 10 comments

gagik commented

I am getting a SEGFAULT even with the PRs applied. Could this be caused by updates to the original C library, or by something not being bound correctly on my end? I have set rustc-link-search to hzeller's lib directory, which contains the compiled files.
I am just running something as simple as the code below. The SEGFAULT happens when it reaches the binding for led_matrix_create_from_options. @TDHolmes?

use rpi_led_matrix as led;

fn main() {
    let config = led::LedMatrixOptions::new();
    let matrix = led::LedMatrix::new(Some(config));
}

I've seen segfaults when the C++ library has changed but we haven't updated our structures. This should be improved by #1. Let me check if any breaking changes have landed in the C++ library.
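To illustrate the kind of mismatch described above, here is a minimal sketch, assuming two made-up `#[repr(C)]` structs (`OptionsOld` and `OptionsNew` are hypothetical and do not reproduce the real options struct). It shows how fields added on the C side shift the offset of a later pointer field while leaving an earlier one untouched, so a binding built against the old layout would hand the library bytes that belong to other fields.

```rust
use std::os::raw::{c_char, c_int};
use std::ptr;

/// Hypothetical layout that an older binding might assume.
#[repr(C)]
pub struct OptionsOld {
    pub hardware_mapping: *const c_char,
    pub rows: c_int,
    pub led_rgb_sequence: *const c_char,
}

/// Hypothetical layout after the C header gained two fields in the middle.
#[repr(C)]
pub struct OptionsNew {
    pub hardware_mapping: *const c_char,
    pub rows: c_int,
    pub cols: c_int,
    pub chain_length: c_int,
    pub led_rgb_sequence: *const c_char,
}

/// Byte offsets of (hardware_mapping, led_rgb_sequence) in the old layout.
pub fn offsets_old() -> (usize, usize) {
    let v = OptionsOld {
        hardware_mapping: ptr::null(),
        rows: 0,
        led_rgb_sequence: ptr::null(),
    };
    let base = &v as *const OptionsOld as usize;
    (
        &v.hardware_mapping as *const *const c_char as usize - base,
        &v.led_rgb_sequence as *const *const c_char as usize - base,
    )
}

/// Byte offsets of (hardware_mapping, led_rgb_sequence) in the new layout.
pub fn offsets_new() -> (usize, usize) {
    let v = OptionsNew {
        hardware_mapping: ptr::null(),
        rows: 0,
        cols: 0,
        chain_length: 0,
        led_rgb_sequence: ptr::null(),
    };
    let base = &v as *const OptionsNew as usize;
    (
        &v.hardware_mapping as *const *const c_char as usize - base,
        &v.led_rgb_sequence as *const *const c_char as usize - base,
    )
}
```

Calling `offsets_old()` and `offsets_new()` shows the first field at offset 0 in both layouts, while `led_rgb_sequence` sits further along in the new one. Code compiled against the old layout would therefore read or write that pointer at the wrong place.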

Interesting, I'm getting the segfault too, but only when running binaries, not when running examples built in the library crate.

$ sudo ./target/debug/matrix-test
Segmentation fault
$

but with an example from this crate (not pushed yet):

$ sudo ./target/debug/examples/test
  59.7Hz
$

runs just fine.

Very interesting. It seems to be caused by panel_type not being referenced anywhere, so no string is ever allocated for it, and the program segfaults when the C++ library dereferences it. If I add methods to update that field, the segfault no longer occurs. I've opened a PR to add this setter (#8). Can you please try it out to make sure it fixes the issue for you too?

I think there might be additional memory issues, but #8 at least seems to help.
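As a rough sketch of the idea behind such a setter (this is not the crate's actual code; `DemoOptions` and its methods are hypothetical names), a string field exposed to C can be given a valid NUL-terminated allocation at construction time and replaced through a setter that frees the previous value. Using an empty string as the initial value is an assumption made here purely for illustration.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for an options struct with one C string field.
#[repr(C)]
pub struct DemoOptions {
    pub panel_type: *mut c_char,
}

impl DemoOptions {
    pub fn new() -> Self {
        // Start with an allocated empty string so the pointer is never
        // left uninitialized when C code reads it.
        DemoOptions {
            panel_type: CString::new("").unwrap().into_raw(),
        }
    }

    pub fn set_panel_type(&mut self, value: &str) {
        let new_ptr = CString::new(value)
            .expect("value must not contain an interior NUL")
            .into_raw();
        // Reclaim the previous allocation, which came from into_raw.
        unsafe { drop(CString::from_raw(self.panel_type)) };
        self.panel_type = new_ptr;
    }

    /// Reads the field back the way C code would, via the raw pointer.
    pub fn panel_type(&self) -> String {
        let s = unsafe { CStr::from_ptr(self.panel_type) };
        s.to_string_lossy().into_owned()
    }
}

impl Drop for DemoOptions {
    fn drop(&mut self) {
        // Free the string that is still owned by this struct.
        unsafe { drop(CString::from_raw(self.panel_type)) };
    }
}
```

With this shape, `DemoOptions::new().panel_type()` yields an empty string, and after `set_panel_type("panel-test")` the field reads back as `panel-test`. Ownership stays on the Rust side, so the pointer remains valid for as long as the struct lives.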

gagik commented

Hm, #8 did not fix the issue for me, but I looked into it further and noticed that the problem does seem to be with memory access to panel_type, as well as to other CStrings in LedMatrixOptions. What is really odd is that hardware_mapping and pixel_mapper_config seem to work fine. I'm on a Raspberry Pi 4.

Running

    let mut config = led::LedMatrixOptions::new();
    config.set_rows(16);
    config.set_led_rgb_sequence("GGG");
    config.set_hardware_mapping("mapping-test");
    config.set_panel_type("panel-test");
    let matrix = led::LedMatrix::new(Some(config));

and then in gdb

Breakpoint 1, rgb_matrix::RGBMatrix::Options::Validate (
    this=this@entry=0x7ffffff338, err_in=err_in@entry=0x7ffffff250)
    at options-initialize.cc:359
359	  std::string scratch;
(gdb) p led_rgb_sequence 
$17 = 0x6400000001 <error: Cannot access memory at address 0x6400000001>
(gdb) p panel_type
$18 = 0x10100000000 <error: Cannot access memory at address 0x10100000000>
(gdb) p pixel_mapper_config
$19 = 0x0
(gdb) p hardware_mapping 
$20 = 0x55555e5530 "mapping-test"

I am trying to figure out what the difference is between hardware_mapping, pixel_mapper_config, and the other two.
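One way to inspect the two bad pointer values from the gdb session is to split each 64-bit value into its two 32-bit halves. This is only a possible reading, not a confirmed diagnosis, and `split_words` is a helper name invented for this sketch.

```rust
/// Splits a 64-bit value into (low 32 bits, high 32 bits).
pub fn split_words(raw: u64) -> (u32, u32) {
    let low = (raw & 0xFFFF_FFFF) as u32;
    let high = (raw >> 32) as u32;
    (low, high)
}
```

For the values printed above, `split_words(0x6400000001)` gives `(1, 100)` and `split_words(0x10100000000)` gives `(0, 0x101)`. Those halves look more like small integer or flag settings than like addresses, which would be consistent with the library reading these pointer fields from the wrong position in the struct.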

Running the app through valgrind, I got a lot of memory errors. I'm currently upgrading my Raspberry Pi to the beta 64-bit OS to get newer valgrind support and dig in further.

Valgrind output:

==1725== Invalid read of size 1
==1725==    at 0x484B33C: strlen (vg_replace_strmem.c:460)
==1725==    by 0x132047: rgb_matrix::RGBMatrix::Options::Validate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) const (options-initialize.cc:435)
==1725==    by 0x13102F: rgb_matrix::RGBMatrix::CreateFromOptions(rgb_matrix::RGBMatrix::Options const&, rgb_matrix::RuntimeOptions const&) (led-matrix.cc:621)
==1725==    by 0x12FD53: led_matrix_create_from_options_optional_edit(RGBLedMatrixOptions*, int*, char***, bool) (led-matrix-c.cc:126)
==1725==    by 0x1100EF: rpi_led_matrix::LedMatrix::new (lib.rs:36)
==1725==    by 0x10F5E7: matrix_test::main (main.rs:5)
==1725==    by 0x10F6AF: std::rt::lang_start::{{closure}} (rt.rs:67)
==1725==    by 0x117D3B: {{closure}} (rt.rs:52)
==1725==    by 0x117D3B: do_call<closure-0,i32> (panicking.rs:348)
==1725==    by 0x117D3B: try<i32,closure-0> (panicking.rs:325)
==1725==    by 0x117D3B: catch_unwind<closure-0,i32> (panic.rs:394)
==1725==    by 0x117D3B: std::rt::lang_start_internal (rt.rs:51)
==1725==    by 0x10F687: std::rt::lang_start (rt.rs:67)
==1725==    by 0x10F637: main (in /home/pi/projects/rpi-matrix/matrix-test/target/debug/matrix-test)
==1725==  Address 0x4c9889000000000 is not stack'd, malloc'd or (recently) free'd
==1725==
==1725==
==1725== Process terminating with default action of signal 11 (SIGSEGV)
==1725==  Access not within mapped region at address 0x4C9889000000000
==1725==    at 0x484B33C: strlen (vg_replace_strmem.c:460)
==1725==    by 0x132047: rgb_matrix::RGBMatrix::Options::Validate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) const (options-initialize.cc:435)
==1725==    by 0x13102F: rgb_matrix::RGBMatrix::CreateFromOptions(rgb_matrix::RGBMatrix::Options const&, rgb_matrix::RuntimeOptions const&) (led-matrix.cc:621)
==1725==    by 0x12FD53: led_matrix_create_from_options_optional_edit(RGBLedMatrixOptions*, int*, char***, bool) (led-matrix-c.cc:126)
==1725==    by 0x1100EF: rpi_led_matrix::LedMatrix::new (lib.rs:36)
==1725==    by 0x10F5E7: matrix_test::main (main.rs:5)
==1725==    by 0x10F6AF: std::rt::lang_start::{{closure}} (rt.rs:67)
==1725==    by 0x117D3B: {{closure}} (rt.rs:52)
==1725==    by 0x117D3B: do_call<closure-0,i32> (panicking.rs:348)
==1725==    by 0x117D3B: try<i32,closure-0> (panicking.rs:325)
==1725==    by 0x117D3B: catch_unwind<closure-0,i32> (panic.rs:394)
==1725==    by 0x117D3B: std::rt::lang_start_internal (rt.rs:51)
==1725==    by 0x10F687: std::rt::lang_start (rt.rs:67)
==1725==    by 0x10F637: main (in /home/pi/projects/rpi-matrix/matrix-test/target/debug/matrix-test)
==1725==  If you believe this happened as a result of a stack
==1725==  overflow in your program's main thread (unlikely but
==1725==  possible), you can try to increase the size of the
==1725==  main thread stack using the --main-stacksize= flag.
==1725==  The main thread stack size used in this run was 8388608.
==1725==

@gagik I think I figured it out. Check out #9 to see if it fixes the issue. I think the reasoning definitely makes sense and explains the weirdness we were seeing.

Whoops, I guess it auto-closes if you merge a linked PR. @gagik, please let me know if this doesn't fix it. I pushed an update (0.1.5) with this fix.

gagik commented

That fixed it for me; everything seems to work perfectly now. Awesome, thank you. I will see if I can contribute in any way in the near future.