MonashUAS/cmavnode

Restarting FC input hangs cmavnode

Closed this issue · 16 comments

fnoop commented

If I reboot the flight controller, cmavnode seems to recognise the link down/up, but then cpu goes up to 100% and trying to quit the interactive shell hangs.

fnoop commented
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6424 mav       20   0   51096   4604   3808 R 100.3  0.5   2:25.69 cmavnode
]$ ../bin/cmavnode -f ./fc.conf -v -i
07:21:41,854 --> Found 2 links
07:21:41,855 --> Valid Serial Link: fcserial Found at: /dev/ttyAMA0, baud: 115200
07:21:41,857 --> Valid UDP Link: udpin1 Found at :0 -> 14570
07:21:41,858 --> Command line arguments parsed succesfully.
07:21:41,858 --> Links Initialized, routing loop starting.
07:21:41,860 --> Adding sysID: 1 to the mapping on link: fcserial
CMAV >
CMAV > 07:22:04,158 --> sysID: 1 timed out after 10.002 s.
07:22:04,158 --> Removing sysID: 1 from the mapping on link: fcserial
07:22:06,027 --> Adding sysID: 1 to the mapping on link: fcserial

CMAV > quit
JMare commented

I can't replicate this with SITL, so if anything i guess it might be a bug in the serial handling.

What hardware are you using? Computer/Flightcontroller/usb uart?

fnoop commented

It's a pixracer (px4-v4) running ardupilot/arducopter v3.5-dev connecting from telemetry port to raspberry pi gpio serial pins, no rtc/cts.

JMare commented

Ok, flow control should default off so that should be fine.

I dont have any hardware with me but ill test on sunday with arducopter 3.5 + FTDI + oDroid XU4 and try to replicate this.

fnoop commented

OK so same problem without interactive shell. It stops routing mavlink when cpu goes to 100%. Triggering sigint does respond but doesn't shut down, maybe that's a clue?:

2017-02-24 08:20:46,981 INFO  [default] Adding sysID: 1 to the mapping on link: fcserial
^CExit code 2
2017-02-24 08:21:50,314 INFO  [default] SIGINT caught, deconstructing links and exiting
^CExit code 2
2017-02-24 08:21:54,237 INFO  [default] SIGINT caught, deconstructing links and exiting
^CExit code 2
2017-02-24 08:22:00,586 INFO  [default] SIGINT caught, deconstructing links and exiting
^CExit code 2
2017-02-24 08:22:00,749 INFO  [default] SIGINT caught, deconstructing links and exiting
fnoop commented

Here's a zip of it happening under strace -ff, there's a clear difference in behaviour after the link goes down, hope that helps a bit.
strace.cmavnode.zip

JMare commented

Yes i see, you could try the branch obeycmakeprefix which i just pushed a slight serial change to. Also could you try with UDP on the same computer to see if its definatly a serial issue?

I will test on sunday.
Thanks for help

fnoop commented

Tried obeycmakeprefix branch, same problem.

fnoop commented

Ah so I've tried it with mavproxy in the middle, so ttyAMA0 -> mavproxy (udpin@14570) -> (udplink@127.0.0.1:14570) cmavnode (udpserver@14574).

Restarting the FC, the first time I get a segfault:

Found 2 links
Valid UDP Link: udpfc Found at 127.0.0.1:14570 -> 0
Valid UDP Link: udpin0 Found at :0 -> 14574
Command line arguments parsed succesfully.
Links Initialized, routing loop starting.
CMAV > Adding sysID: 252 to the mapping on link: udpin0
Adding sysID: 1 to the mapping on link: udpfc
sysID: 1 timed out after 10.001 s.
Removing sysID: 1 from the mapping on link: udpfc
Adding sysID: 1 to the mapping on link: udpfc
Segmentation fault

The second time the cpu went briefly to 100% then this:

Found 2 links
Valid UDP Link: udpfc Found at 127.0.0.1:14570 -> 0
Valid UDP Link: udpin0 Found at :0 -> 14574
Command line arguments parsed succesfully.
Links Initialized, routing loop starting.
CMAV > Adding sysID: 252 to the mapping on link: udpin0
Adding sysID: 1 to the mapping on link: udpfc
sysID: 1 timed out after 10.007 s.
Removing sysID: 1 from the mapping on link: udpfc
sysID: 72 timed out after 497869 s.
Removing sysID: 72 from the mapping on link: udpfc
Adding sysID: 1 to the mapping on link: udpfc
terminate called after throwing an instance of 'Exception'
  what():  The incoming message queue is full
Aborted

This is using the obeycmakeprefix branch.

JMare commented

Ok i managed to replicate this today.

What baud rate are you using for the flight controller -> companion computer link?

Looks like its an issue with the queues filling up when using high baud rates, I saw the error using 921600.

Solution will be:

  1. Don't abort the program when a queue fills up.

  2. Stop the queue filling up, probably by modifying the this_thread::sleep time in the main loop. It is a balance between using CPU by polling the queues frequently and not filling the queues up.

I'll push a branch for you to test with some fixes and I will test as well.

JMare commented

I just pushed to the branch "lowerlooptime" which should fix the problem at the expense of some cpu usage, once confirmed that fixes the problem we can try to find a better balance between cpu usage/memory usage/not dropping packets.

We can drop less packets by reducing loop time and increasing queue size at the cost of cpu and memory.

Now that exceptions are not thrown on full queue an overflow wont crash the program but it will drop the packet that overflows.

fnoop commented

Baudrate from FC to mavproxy is 115200. Tried lowerlooptime branch still using mavproxy in the middle, so cmavnode connecting to udp. Restart the FC, cmavnode goes to 100% cpu still and I get the console flooded with the message:

Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
Failed to add msg to incoming queue: Queue Full
JMare commented

Ok sorry i guess its more complicated than i thought...

Ideas @olwyk ?

Will try other things tommorow thanks for testing

JMare commented

When this issue happens msgs keep being received on all links, so the read threads are still running. However no messages are being sent despire 100% cpu usage.

There is a bug in the main loop causing it to go crazy in certain situations and only on low powered hardware.

I have added in and out queue counters to see whats going on better.

JMare commented

Ok @fnoop we found the problem and fixed it.

Master should work now!

fnoop commented

Confirmed working now, well done :)