NMEA0183 serial input problem: 20231105

Question

norbert-walter opened this issue a year ago · 13 comments

When I feed an NMEA0183 serial data stream into an M5Stack Atom via an RS485 unit, problems arise when the data stream is interrupted (cable unplugged, reset of the transmitting device). After the transfer is interrupted and then resumed, the telegram counter for the serial port freezes. However, telegrams forwarded via USB are still displayed. The web page also no longer renders properly when you refresh it.

The same problem also occurs when incomplete, garbled or unknown telegrams are transmitted. Then the M5Stack Atom freezes completely. To provoke this, I intentionally short-circuited the transmission lines. The new GPS receiver sends, for example, the following telegrams immediately after it has been started:

$GPTXT,01,01,02,HW=ATGM336H,00030107486321C
$GPTXT,01,01,02,IC=AT6558-5N-31-0C510800,BMLLCKJ-D2-0374165A
$GPTXT,01,01,02,SW=URANUS5,V5.3.0.01D
$GPTXT,01,01,02,TB=2020-04-28,13:43:1040
$GPTXT,01,01,02,MO=GB77
$GPTXT,01,01,02,BS=SOC_BootLoader,V6.2.0.234
$GPTXT,01,01,02,FI=00856014*71

Answer 1 · 2023-11-11T19:15:50.000Z

A bit hard to understand what exactly you did and what happened.
Let's discuss this directly...

Answer 2 · 2024-01-28T15:52:02.000Z

Hi,
I have noticed something similar.
The NMEA2000 connection freezes after connecting a serial input (AIS signal).
Did you find a reason for the problem described in the question?

Dick Koning

Answer 3 · 2024-01-28T20:40:52.000Z

No, not really.
Maybe you can fetch the log at the USB port.
Optionally increase the log level.
Did you test this with the newest version?

Answer 4 · 2024-01-29T08:31:51.000Z

Hi,

I used the latest version from the website for the Homberger board. I used a rather buggy optocoupler setup to connect an AIS signal to the serial port. This caused a lot of syntax errors in the NMEA sentences (the NMEA debug window in OpenCPN showed a lot of red sentences). The web interface of the ESP32 crashed, but the device itself kept functioning. After improving the optocoupler setup everything works fine, so there must be a problem with the error handling routines. You can probably recreate the problem by using an Arduino to generate "buggy" AIS sentences. I will try to get a log.

Dick Koning

Answer 5 · 2024-03-21T15:36:49.000Z

Any chance for some logs?

Answer 6 · 2024-03-21T17:48:17.000Z

Hi,

I fixed the problem in hardware: a faster optocoupler, fewer mistakes in the NMEA sentences. That prevented the crash in software.

Dick Koning

Answer 7 · 2024-03-21T18:02:05.000Z

Hi Andreas,

I had exactly the same problem when NMEA sentences were transmitted incorrectly, meaning that the checksum did not match the content. In my case it was a faulty ground connection that garbled the telegrams. You can take an NMEA0183 log file and then add errors to the recorded telegrams, such as inserting special characters or spaces, or deleting parts of the telegram or the checksum. My examples in the first post show such telegrams: some have no checksum, others are unknown telegram types. In my opinion, the checksum of the telegram is not checked to see whether the telegram was transmitted completely and correctly. Furthermore, it is not checked whether these are known telegrams that the interpreter can translate correctly. This is exactly what causes the software to freeze. Error handling is not working properly. Occasional errors in telegrams are not a problem, but many consecutive faulty telegrams are.

Norbert
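
For reference, the checksum test described above can be sketched as follows. This is only an illustration of the standard NMEA0183 rule (XOR of all characters between the start character and `*`, written as two hex digits), not the code used in esp32-nmea2000; the function names `nmeaXor` and `nmeaChecksumOk` are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Value of one hex digit, or -1 if `c` is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// XOR of all characters in `body` (the text between '$' and '*').
unsigned char nmeaXor(const std::string &body) {
    unsigned char sum = 0;
    for (char c : body) sum ^= static_cast<unsigned char>(c);
    return sum;
}

// True only if `line` has the form "$...*HH" (or "!...*HH" for AIS)
// and HH matches the XOR of the characters in between.
// Sentences without a checksum field are rejected in this sketch.
bool nmeaChecksumOk(std::string line) {
    // Drop trailing CR/LF left over from the serial line.
    while (!line.empty() && (line.back() == '\r' || line.back() == '\n')) {
        line.pop_back();
    }
    if (line.size() < 4) return false;
    if (line[0] != '$' && line[0] != '!') return false;
    const std::size_t star = line.rfind('*');
    // Exactly two characters must follow the '*'.
    if (star == std::string::npos || star + 3 != line.size()) return false;
    const int hi = hexValue(line[star + 1]);
    const int lo = hexValue(line[star + 2]);
    if (hi < 0 || lo < 0) return false;
    const unsigned char expected = static_cast<unsigned char>(hi * 16 + lo);
    return nmeaXor(line.substr(1, star - 1)) == expected;
}
```

With this sketch a telegram such as `$GPTXT,01,01,02,MO=GB77` from the first post is rejected because it carries no checksum field. Whether to drop such sentences or pass them on unchecked is a design decision for the receiver.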

Answer 8 · 2024-03-22T18:12:42.000Z

With some additional analysis I found basically 2 problems:
(1) When the serial RX buffers overrun, the NMEA messages can be garbled. This can lead to serial counter names that are invalid (e.g. containing a " sign). If this occurs the Web UI looks broken (as the status update internally creates a JS error). The device itself still runs without issues.
So one correction will be to prevent this error in the Web UI.
(2) What causes the RX FIFOs/buffers to overflow? This typically will happen if you enable debug output. As every invalid line will produce at least 2 lines of output, this can easily cause the system to really slow down. The main loop flushes the log to the USB device. If running with the default of 115200 baud, a continuous input at 38400 baud with the example data from this issue will already push the USB channel to its limit. You can see this in the Main loop line.
If you count the log bytes between 2 Main loop lines and compare this to the time difference, you can easily see that this fills up 115200 baud completely.
good (idle):
Main loop 2886.00/s334.82[1542us]#1:30.35[69],2:13.14[21],3:3.44[979],4:3.52[223],5:145.97[299],6:53.15[121],7:32.12[238],8:10.72[24],9:26.04[58],10:12.99[28],11:4.00[16],
bad (38400 baud input with invalid messages):
Main loop 60.76/s14706.50[38544us]#1:13634.41[37054],2:29.25[57],3:3.46[1243],4:3.26[11],5:373.14[717],6:154.02[381],7:118.10[446],8:52.56[100],9:1064.08[1 868],10:22.97[83],11:16.09[4783],
The first line (when idle) shows 2886 main loops/second. The second one only ~60! And you can see that phase 1 (flushing the logs) already takes ~13.6 ms (average) of the ~14.7 ms main loop run time.
If you switch the log level to "log" the problem should go away.
In the correction I will add an error log if the RX fifo or the RX buffer are overflowing.
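
To illustrate problem (1): a counter name that contains a " sign breaks the status data unless it is escaped before being embedded in JSON. The following is a minimal sketch of such escaping, assuming the status is sent as a JSON string; it is not necessarily how the actual correction is implemented, and `jsonEscape` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

// Escape a string so it can be placed between double quotes in JSON.
// Bytes >= 0x80 are passed through unchanged; garbled serial input may
// additionally need UTF-8 validation, which this sketch does not do.
std::string jsonEscape(const std::string &in) {
    std::string out;
    out.reserve(in.size() + 2);
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX.
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}
```

An alternative would be to reject counter names that contain anything other than the expected letters and digits, so that garbled names never reach the status output at all.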

Answer 9 · 2024-03-22T23:13:19.000Z

Ahh... nice that you found a problem.

Answer 10 · 2024-03-23T07:50:28.000Z

Hi,

Nice work. Is disabling the log also an option?

Dick Koning

Answer 11 · 2024-03-23T08:16:12.000Z

Any log level lower than debug should solve the issue.
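
Why a lower log level helps can be estimated from the figures in the analysis above. The sketch below is only back-of-the-envelope arithmetic; it assumes 8N1 framing (10 bits on the wire per byte) for both the 38400 baud input and the 115200 baud log channel, and all function names are made up for this example.

```cpp
// Payload bytes per second on an asynchronous serial link.
// Assumption: 8N1 framing, i.e. 10 bits on the wire per payload byte.
constexpr double bytesPerSecond(double baud) { return baud / 10.0; }

// How many log bytes per received input byte the log channel can carry
// before it is saturated.
constexpr double logAmplificationLimit(double inputBaud, double logBaud) {
    return bytesPerSecond(logBaud) / bytesPerSecond(inputBaud);
}

// Fraction of the average main loop time spent in one phase,
// both values in microseconds as printed in the Main loop line.
constexpr double phaseShare(double phaseUs, double loopUs) {
    return phaseUs / loopUs;
}
```

With these assumptions `logAmplificationLimit(38400, 115200)` is 3: once the debug lines written for one input line are together more than three times as long as the input line itself, the log channel cannot keep up. For the "bad" Main loop line, `phaseShare(13634.41, 14706.50)` is about 0.93, so roughly 93% of each loop is spent flushing the log.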

Answer 12 · 2024-03-24T10:30:22.000Z

https://github.com/wellenvogel/esp32-nmea2000/releases/tag/20240324

Answer 13 · 2024-03-24T10:48:52.000Z

Thanks for your work!