LinuxCNC crashes on exit when component `litexcnc_eth` is used
Peter-van-Tol opened this issue · 11 comments
Describe the bug
As noted in #28, LinuxCNC crashes with the following error:
Shutting down and cleaning up LinuxCNC...
Running HAL shutdown script
task: 603 cycles, min=0.000041, max=0.012258, avg=0.009716, 0 latency excursions (> 10x expected cycle time of 0.010000s)
mb2hal quit_signal DEBUG: signal [15] received
mb2hal quit_cleanup DEBUG: started
mb2hal quit_cleanup DEBUG: unloading HAL module [16] ret[0]
mb2hal quit_cleanup DEBUG: done OK
mb2hal main OK: going to exit!
litexcnc: LitexCNC etherbone driver unloaded
rtapi_app: caught signal 11 - dumping core
free(): invalid pointer
<commandline>:0: exit value: 255
<commandline>:0: rmmod failed, returned -1
Waited 3 seconds for master. giving up.
Note: Using POSIX realtime
motmod: not loaded
<commandline>:0: exit value: 255
<commandline>:0: rmmod failed, returned -1
Note: Using POSIX realtime
trivkins: not loaded
<commandline>:0: exit value: 255
<commandline>:0: rmmod failed, returned -1
<commandline>:0: unloadrt failed
Note: Using POSIX realtime
To Reproduce
This error is caused by an outdated loadrt statement in your HAL files. You currently have:
loadrt litexcnc
loadrt litexcnc_eth connection_string="192.168.178.15
The above statements have been replaced with:
loadrt litexcnc connections="eth:192.168.178.150"
Expected behavior
An error message stating that the component `litexcnc_eth` does not exist (as it cannot be used stand-alone).
Additional context
Why does this error emerge at this moment? The FPGA is reset to its safe state when LinuxCNC is unloaded, which means that litexcnc sends one last message to the FPGA. When the driver is loaded using two separate statements, the etherbone driver has already been unloaded (and its memory freed) by then. Writing to a closed device without allocated memory then leads to a core dump.
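The ordering problem described above can be modelled in a few lines. This is a Python stand-in, not the driver's C code: the `Transport` and `Driver` classes and the `b"RESET"` payload are hypothetical names chosen for illustration. It also shows a guard-style alternative, which is not the fix adopted in this issue (the issue rejects the two-statement setup up front instead).

```python
class Transport:
    """Hypothetical stand-in for the etherbone connection."""

    def __init__(self):
        self.is_open = True
        self.sent = []

    def send(self, payload):
        if not self.is_open:
            # In C this would be a write through freed memory; here it is an exception.
            raise RuntimeError("write to closed transport")
        self.sent.append(payload)

    def close(self):
        self.is_open = False


class Driver:
    """Hypothetical stand-in for the core driver that owns the final reset."""

    def __init__(self, transport):
        self.transport = transport

    def shutdown(self):
        """Send the final 'safe state' reset only while the transport is alive."""
        if self.transport is None or not self.transport.is_open:
            return False  # transport already gone: skip the write instead of crashing
        self.transport.send(b"RESET")  # placeholder payload, not the real protocol
        self.transport.close()
        return True
```

With a single loadrt statement the core driver controls both steps, so the reset is always sent before the transport is closed.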
Removing the component registration from LinuxCNC leads to incomprehensible error messages. Instead, the component `litexcnc_eth` will now produce the following message before it stops LinuxCNC:
litexcnc: ERROR: Direct usage of the module `litexcnc_eth` is not supported
litexcnc: This is caused by the following loadrt-commands in your HAL-file:
litexcnc: loadrt litexcnc
litexcnc: loadrt litexcnc_eth connection_string="10.0.0.10"
litexcnc: Please use the folllowing single command in your hal-file instead:
litexcnc: loadrt litexcnc connections="eth:10.0.0.10"
litexcnc: For more information, see: https://github.com/Peter-van-Tol/LiteX-CNC/issues/32
Users can easily switch to the new standard.
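For anyone with several HAL files to update, the switch can be sketched as a small script. This is a minimal sketch, not part of LiteX-CNC: the function name `migrate_hal_lines` is hypothetical, and the regular expressions assume the loadrt lines look like the ones quoted in this issue.

```python
import re

# Patterns are assumptions based on the loadrt lines quoted in this issue.
OLD_ETH = re.compile(r'^\s*loadrt\s+litexcnc_eth\s+connection_string="?([^"\s]+)"?')
BARE_CORE = re.compile(r'^\s*loadrt\s+litexcnc\s*$')


def migrate_hal_lines(lines):
    """Return (new_lines, changed) with the deprecated pair folded into one loadrt."""
    if not any(OLD_ETH.match(line) for line in lines):
        return list(lines), False  # nothing deprecated found, leave the file alone

    new_lines = []
    for line in lines:
        match = OLD_ETH.match(line)
        if match:
            # Replace the stand-alone litexcnc_eth line with the single new statement.
            new_lines.append(f'loadrt litexcnc connections="eth:{match.group(1)}"')
        elif BARE_CORE.match(line):
            continue  # the bare core-driver line is now redundant
        else:
            new_lines.append(line)
    return new_lines, True
```

Lines that are commented out with `#` are left untouched, because both patterns require the line to start with `loadrt`.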
Sorry.
That doesn't work for me....
The terminal keeps printing "errors" and the Z-axis moves by itself at very low speed, even when LinuxCNC is in emergency stop mode:
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Running HAL shutdown script
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
task: 543 cycles, min=0.000044, max=0.024437, avg=0.009821, 0 latency excursions (> 10x expected cycle time of 0.010000s)
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
Unexpected read length: -1, expected 88
mb2hal quit_signal DEBUG: signal [15] received
mb2hal quit_cleanup DEBUG: started
mb2hal quit_cleanup DEBUG: unloading HAL module [16] ret[0]
mb2hal quit_cleanup DEBUG: done OK
mb2hal main OK: going to exit!
Unexpected read length: -1, expected 88
litexcnc: LitexCNC driver unloaded
free(): invalid pointer
<commandline>:0: exit value: 255
<commandline>:0: rmmod failed, returned -1
Waited 3 seconds for master. giving up.
Note: Using POSIX realtime
trivkins: not loaded
<commandline>:0: exit value: 255
<commandline>:0: rmmod failed, returned -1
<commandline>:0: unloadrt failed
Note: Using POSIX realtime
It seems you've lost communication with the card. "Unexpected read length" indicates that no data has been received from the FPGA.
Most likely this is due to a malformed connection string, which would mean that the error message is wrong... To verify this, could you attach your HAL file here?
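A quick way to rule out a malformed connection string before starting LinuxCNC is to validate it separately. The sketch below assumes the `eth:<IPv4 address>` form seen in the examples in this thread; the function name `parse_connection` is hypothetical, and the driver's own parser may accept other forms.

```python
import ipaddress


def parse_connection(value):
    """Split 'eth:<ipv4>' into (scheme, address); raise ValueError if malformed."""
    scheme, sep, address = value.partition(":")
    if not sep:
        raise ValueError(f"missing '<scheme>:' prefix in {value!r}")
    if scheme != "eth":
        raise ValueError(f"unsupported scheme {scheme!r}")
    # ipaddress rejects stray quotes, missing octets and out-of-range values.
    return scheme, str(ipaddress.IPv4Address(address))
```

A string that passes this check can still fail at runtime (wrong subnet, card not powered), so this only separates typos from genuine communication problems.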
I have to leave until Saturday. Then I can upload the HAL file.
It is the same as in #28
Just deleted the two lines and added
loadrt litexcnc connections="eth:192.168.178.150"
Just tested
loadrt litexcnc connections="eth:10.10.10.10"
Works fine.
@Peter-van-Tol: will you merge? ...So that could be the problem... I did not pull the "32" branch 😆
@OJthe123: This branch only added the error message. Nothing has changed in the communication, and the HAL command `loadrt litexcnc connections="eth:10.10.10.10"` was already supported a long time ago in #11. I guess there is an issue in your HAL files, or the communication was disrupted in some other way.
@OJthe123: Basically, you can now use the version from pypi.org:
pip install litexcnc
If the error persists, please start a Q&A discussion and we will fix your config (or the code, for that matter).
Semse.hal.txt
Hi.
Here is my HAL file... this is the working version.
When I change to connections="eth:...." it doesn't work.
And yes, I comment out the two other loadrt lines when trying.
Now when I try to install the driver after pulling the latest "11", I get this...
INFO: Compiling LitexCNC driver...
Compiling realtime litexcnc.c
Linking litexcnc.so
sudo cp litexcnc.so /usr/lib/linuxcnc/modules/
[sudo] password for oj:
Compiling realtime litexcnc_eth.c
Linking litexcnc_eth.so
sudo cp litexcnc_eth.so /usr/lib/linuxcnc/modules/
Compiling realtime litexcnc_stepgen.c
In file included from litexcnc_stepgen.c:44:
/tmp/tmppg5hjrun/litexcnc_stepgen.h:153:5: error: unknown type name ‘litexcnc_stepgen_pin_t’
litexcnc_stepgen_pin_t *instances;
^~~~~~~~~~~~~~~~~~~~~~
litexcnc_stepgen.c: In function ‘litexcnc_stepgen_config’:
litexcnc_stepgen.c:144:80: error: ‘stepgen->hal’ is a pointer; did you mean to use ‘->’?
*(stepgen->data.clock_frequency) / (1 << (shift + 1)) > stepgen->hal.param.max_driver_freq) {
^
->
litexcnc_stepgen.c:170:49: warning: initialization of ‘litexcnc_stepgen_instance_t *’ {aka ‘struct <anonymous> *’} from incompatible pointer type ‘int *’ [-Wincompatible-pointer-types]
litexcnc_stepgen_instance_t *instance = &(stepgen->instances[i]);
^
litexcnc_stepgen.c: In function ‘litexcnc_stepgen_prepare_write’:
litexcnc_stepgen.c:260:18: warning: assignment to ‘litexcnc_stepgen_instance_t *’ {aka ‘struct <anonymous> *’} from incompatible pointer type ‘int *’ [-Wincompatible-pointer-types]
instance = &(stepgen->instances[i]);
^
litexcnc_stepgen.c: In function ‘litexcnc_stepgen_process_read’:
litexcnc_stepgen.c:450:18: warning: assignment to ‘litexcnc_stepgen_instance_t *’ {aka ‘struct <anonymous> *’} from incompatible pointer type ‘int *’ [-Wincompatible-pointer-types]
instance = &(stepgen->instances[i]);
^
litexcnc_stepgen.c: In function ‘litexcnc_stepgen_init’:
litexcnc_stepgen.c:586:17: error: ‘stepgen->hal’ is a pointer; did you mean to use ‘->’?
stepgen->hal.param.max_driver_freq = 400e3;
^
->
litexcnc_stepgen.c:590:24: warning: assignment to ‘int *’ from incompatible pointer type ‘litexcnc_stepgen_instance_t *’ {aka ‘struct <anonymous> *’} [-Wincompatible-pointer-types]
stepgen->instances = (litexcnc_stepgen_instance_t *)hal_malloc(stepgen->num_instances * sizeof(litexcnc_stepgen_instance_t));
^
litexcnc_stepgen.c:599:49: warning: initialization of ‘litexcnc_stepgen_instance_t *’ {aka ‘struct <anonymous> *’} from incompatible pointer type ‘int *’ [-Wincompatible-pointer-types]
litexcnc_stepgen_instance_t *instance = &(stepgen->instances[i]);
^
make: *** [/usr/share/linuxcnc/Makefile.modinc:115: litexcnc_stepgen.o] Error 1
Error: Compilation of the driver failed.