NMT BootUp state transition is not noticed
PaulVerhoeckx opened this issue · 1 comments
On initialization the reset communication NMT command is send (node.cpp#L47) after which it will wait until the NMT state BootUp
(=0
) is reached. Occasionally this state is missed, due to the fast succession of the state PreOperational
. Resulting in the following stack trace:
[ INFO] [1622561315.623773805]: Using fixed control period: 0.010000000
[ INFO] [1622561317.481039265]: Initializing...
[ INFO] [1622561317.482258080]: Current state: 1 device error: system:0 internal_error: 0 (OK)
[ INFO] [1622561317.483331782]: Current state: 2 device error: system:0 internal_error: 0 (OK)
[ INFO] [1622561327.493862171]: Current state: 2 device error: system:125 internal_error: 0 (OK)
[ INFO] [1622561327.494414179]: Current state: 2 device error: system:0 internal_error: 0 (OK)
[ INFO] [1622561327.495005929]: Current state: 0 device error: system:0 internal_error: 0 (OK)
[ INFO] [1622561327.495577834]: Current state: 0 device error: system:0 internal_error: 0 (OK)
[ERROR] [1622561327.496872067]: CAN not ready
[ERROR] [1622561327.497535175]: Initializing failed: could not reset node '1'
Using print statements I found the problem. A switch to NMT state 0
was made here, causing wait_until() to return. However, before the while
condition was evaluated (here), state_
had already transitioned to PreOperational
. This prevents breaking the while
loop, again triggering wait_until(), which now times out.
As a workaround I made reset_com()
wait on the state PreOperational
, but this does not guarantee a reset of the node. If someone knows a good solution, I am willing to make a PR.
Using print statements I found the problem. [...]
Add print statement changes the timing, I am not sure if this is the root cause.
Resulting in the following stack trace:
[ERROR] [1622561327.496872067]: CAN not ready
[ERROR] [1622561327.497535175]: Initializing failed: could not reset node '1'
It looks like your CAN bus stops working in between. (?)
As a workaround I made reset_com() wait on the state PreOperational
If I recall it correctly not all nodes send PreOperational (it depends on the heartbeat interval?)
That's why the state get's set to PreOperational right after the BootUp message without further checks.
I think this might not happen with our devices, because the heartbeat interval is 0 per default and gets activated after the reset.
If someone knows a good solution, I am willing to make a PR.
I would add a special variable to track the BootUp message and check this instead of state_
in a wait_for_bootup
function.