Restarting FC input hangs cmavnode
Closed this issue · 16 comments
If I reboot the flight controller, cmavnode seems to recognise the link down/up, but then cpu goes up to 100% and trying to quit the interactive shell hangs.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6424 mav 20 0 51096 4604 3808 R 100.3 0.5 2:25.69 cmavnode
]$ ../bin/cmavnode -f ./fc.conf -v -i
07:21:41,854 --> Found 2 links
07:21:41,855 --> Valid Serial Link: fcserial Found at: /dev/ttyAMA0, baud: 115200
07:21:41,857 --> Valid UDP Link: udpin1 Found at :0 -> 14570
07:21:41,858 --> Command line arguments parsed succesfully.
07:21:41,858 --> Links Initialized, routing loop starting.
07:21:41,860 --> Adding sysID: 1 to the mapping on link: fcserial
CMAV >
CMAV > 07:22:04,158 --> sysID: 1 timed out after 10.002 s.
07:22:04,158 --> Removing sysID: 1 from the mapping on link: fcserial
07:22:06,027 --> Adding sysID: 1 to the mapping on link: fcserial
CMAV > quit
I can't replicate this with SITL, so if anything i guess it might be a bug in the serial handling.
What hardware are you using? Computer/Flightcontroller/usb uart?
It's a pixracer (px4-v4) running ardupilot/arducopter v3.5-dev connecting from telemetry port to raspberry pi gpio serial pins, no rtc/cts.
Ok, flow control should default off so that should be fine.
I dont have any hardware with me but ill test on sunday with arducopter 3.5 + FTDI + oDroid XU4 and try to replicate this.
OK so same problem without interactive shell. It stops routing mavlink when cpu goes to 100%. Triggering sigint does respond but doesn't shut down, maybe that's a clue?:
2017-02-24 08:20:46,981 INFO [default] Adding sysID: 1 to the mapping on link: fcserial
^CExit code 2
2017-02-24 08:21:50,314 INFO [default] SIGINT caught, deconstructing links and exiting
^CExit code 2
2017-02-24 08:21:54,237 INFO [default] SIGINT caught, deconstructing links and exiting
^CExit code 2
2017-02-24 08:22:00,586 INFO [default] SIGINT caught, deconstructing links and exiting
^CExit code 2
2017-02-24 08:22:00,749 INFO [default] SIGINT caught, deconstructing links and exiting
Here's a zip of it happening under strace -ff, there's a clear difference in behaviour after the link goes down, hope that helps a bit.
strace.cmavnode.zip
Yes i see, you could try the branch obeycmakeprefix which i just pushed a slight serial change to. Also could you try with UDP on the same computer to see if its definatly a serial issue?
I will test on sunday.
Thanks for help
Tried obeycmakeprefix branch, same problem.
Ah so I've tried it with mavproxy in the middle, so ttyAMA0 -> mavproxy (udpin@14570) -> (udplink@127.0.0.1:14570) cmavnode (udpserver@14574).
Restarting the FC, the first time I get a segfault:
Found 2 links
Valid UDP Link: udpfc Found at 127.0.0.1:14570 -> 0
Valid UDP Link: udpin0 Found at :0 -> 14574
Command line arguments parsed succesfully.
Links Initialized, routing loop starting.
CMAV > Adding sysID: 252 to the mapping on link: udpin0
Adding sysID: 1 to the mapping on link: udpfc
sysID: 1 timed out after 10.001 s.
Removing sysID: 1 from the mapping on link: udpfc
Adding sysID: 1 to the mapping on link: udpfc
Segmentation fault
The second time the cpu went briefly to 100% then this:
Found 2 links
Valid UDP Link: udpfc Found at 127.0.0.1:14570 -> 0
Valid UDP Link: udpin0 Found at :0 -> 14574
Command line arguments parsed succesfully.
Links Initialized, routing loop starting.
CMAV > Adding sysID: 252 to the mapping on link: udpin0
Adding sysID: 1 to the mapping on link: udpfc
sysID: 1 timed out after 10.007 s.
Removing sysID: 1 from the mapping on link: udpfc
sysID: 72 timed out after 497869 s.
Removing sysID: 72 from the mapping on link: udpfc
Adding sysID: 1 to the mapping on link: udpfc
terminate called after throwing an instance of 'Exception'
what(): The incoming message queue is full
Aborted
This is using the obeycmakeprefix branch.
Ok i managed to replicate this today.
What baud rate are you using for the flight controller -> companion computer link?
Looks like its an issue with the queues filling up when using high baud rates, I saw the error using 921600.
Solution will be:
-
Don't abort the program when a queue fills up.
-
Stop the queue filling up, probably by modifying the this_thread::sleep time in the main loop. It is a balance between using CPU by polling the queues frequently and not filling the queues up.
I'll push a branch for you to test with some fixes and I will test as well.
I just pushed to the branch "lowerlooptime" which should fix the problem at the expense of some cpu usage, once confirmed that fixes the problem we can try to find a better balance between cpu usage/memory usage/not dropping packets.
We can drop less packets by reducing loop time and increasing queue size at the cost of cpu and memory.
Now that exceptions are not thrown on full queue an overflow wont crash the program but it will drop the packet that overflows.
Baudrate from FC to mavproxy is 115200. Tried lowerlooptime branch still using mavproxy in the middle, so cmavnode connecting to udp. Restart the FC, cmavnode goes to 100% cpu still and I get the console flooded with the message:
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Ok sorry i guess its more complicated than i thought...
Ideas @olwyk ?
Will try other things tommorow thanks for testing
When this issue happens msgs keep being received on all links, so the read threads are still running. However no messages are being sent despire 100% cpu usage.
There is a bug in the main loop causing it to go crazy in certain situations and only on low powered hardware.
I have added in and out queue counters to see whats going on better.
Confirmed working now, well done :)