swri-robotics/gps_umd

Nothing published on fix topic, no reconnection to gpsd

Closed · 6 comments

rgov commented

I am running gpsd_client on ROS Noetic / Debian 10 "buster". However, NavSatFix messages are not being published, and there is nothing in the rosout or stdout output of the gpsd_client node.

I can confirm that gpsmon shows up-to-date GPGGA and GPVTG messages being received. If I manually connect to gpsd's socket, I see:

$ nc localhost 2947
{"class":"VERSION","release":"3.17","rev":"3.17","proto_major":3,"proto_minor":12}
?WATCH={"enable":true,"json":true}
{"class":"DEVICES","devices":[{"class":"DEVICE","path":"udp://192.168.13.255:22335","driver":"NMEA0183","activated":"2022-04-22T20:56:34.300Z","flags":1}]}
{"class":"WATCH","enable":true,"json":true,"nmea":false,"raw":0,"scaled":false,"timing":false,"split24":false,"pps":false}
{"class":"TPV","device":"udp://192.168.13.255:22335","mode":3,"lat":41.524765833,"lon":-70.669887167,"alt":-2.970}
{"class":"TPV","device":"udp://192.168.13.255:22335","mode":3,"lat":41.524765833,"lon":-70.669887167,"alt":-2.970}
{"class":"TPV","device":"udp://192.168.13.255:22335","mode":3,"lat":41.524765833,"lon":-70.669887167,"alt":-2.970}
...

I am launching gpsd_client like so, and rosnode info confirms that it should be publishing to /gps/fix as desired.

    <node name="gps" pkg="gpsd_client" type="gpsd_client">
        <remap from="/fix" to="~fix" />
        <remap from="/extended_fix" to="~extended_fix" />
    </node>

The lsof output suggests that the connection to gpsd was closed by gpsd (the socket is in CLOSE_WAIT) and was never re-established.

COMMAND     PID USER   FD      TYPE  DEVICE SIZE/OFF    NODE NAME
gpsd_clie 20886 ifcb   10u     IPv6 1908883      0t0     TCP localhost:36308->localhost:gpsd (CLOSE_WAIT)
gpsd_clie 20886 ifcb   11u     IPv4 1909129      0t0     TCP ifcb152:56593->localhost:41384 (ESTABLISHED)
gpsd_clie 20886 ifcb   12u     IPv4 1983298      0t0     TCP ifcb152:56593->localhost:45266 (ESTABLISHED)
gpsd_clie 20886 ifcb   13u     IPv4 1908903      0t0     TCP ifcb152:56593->localhost:41214 (ESTABLISHED)
gpsd_clie 20886 ifcb   14u     IPv4 1908907      0t0     TCP ifcb152:56593->localhost:41216 (ESTABLISHED)
gpsd_clie 20886 ifcb   15u     IPv4 1908920      0t0     TCP ifcb152:56593->localhost:41232 (ESTABLISHED)
gpsd_clie 20886 ifcb   16u     IPv4 1908510      0t0     TCP ifcb152:56593->localhost:41234 (ESTABLISHED)

Is the node expected to handle this error condition? I am surprised it would not attempt to re-connect or at least log something and/or crash.


Note that, judging by the file descriptor numbers, the connection to gpsd died almost immediately, before any of the other nodes were able to subscribe.

From the gpsd logs, it looks like gpsd hit an error trying to open /dev/ttyUSB0 (which is not a GPS receiver). I disabled USBAUTO in my /etc/default/gpsd in the hope that this will avoid the error in the future and maybe prevent gpsd_client's connection from failing.

rgov commented

If gpsmm::read() returns NULL, the client node silently ignores it rather than handling the condition.

(It also looks like there is a leak, since stop() never runs delete gps;?)
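For illustration, here is a minimal sketch of what treating a failed read as a lost connection could look like. This is not the gpsd_client code and not the libgps API: GpsSource, Connect, and read_or_reconnect are hypothetical names, with GpsSource::read returning false where gpsmm::read() would return NULL. Holding the connection in a std::unique_ptr also sidesteps the missing delete.

```cpp
#include <cstdio>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for a gpsd connection; NOT the libgps API.
// read() returns false in the case where gpsmm::read() would return NULL.
struct GpsSource {
  std::function<bool(std::string&)> read;
};

// Hypothetical factory: opens a fresh connection, or returns nullptr on failure.
using Connect = std::function<std::unique_ptr<GpsSource>()>;

// Tries to read one report into `report`. If there is no usable connection or
// the read fails, logs the problem and reconnects, at most `max_retries` times.
// Assigning to `gps` frees the previous connection, so nothing leaks.
bool read_or_reconnect(std::unique_ptr<GpsSource>& gps, const Connect& connect,
                       int max_retries, std::string& report) {
  for (int attempt = 0;; ++attempt) {
    if (gps && gps->read(report)) {
      return true;
    }
    if (attempt == max_retries) {
      return false;  // give up; the caller can decide to shut the node down
    }
    std::fprintf(stderr, "no report from gpsd; reconnecting (attempt %d of %d)\n",
                 attempt + 1, max_retries);
    gps = connect();
  }
}
```

The point is only that the failure becomes visible (a log line, and eventually a false return) instead of being dropped on the floor.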

rgov commented

Even after disabling USBAUTO, restarting gpsd, restarting gpsd_client, and confirming the client's connection to gpsd is established, there are still no fix messages being published.

rgov commented

I can see with tcpdump that packets are being received by the client. Here I've confirmed that localhost:47456 corresponds to gpsd_client's connection:

17:37:54.506983 IP6 localhost.gpsd > localhost.47456: Flags [P.], seq 230:345, ack 1, win 512, options [nop,nop,TS val 3523923984 ecr 3523922984], length 115
        0x0000:  6003 2274 0093 0640 0000 0000 0000 0000  `."t...@........
        0x0010:  0000 0000 0000 0001 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0001 0b83 b960 d841 8ec6  ...........`.A..
        0x0030:  9a75 73bc 8018 0200 009b 0000 0101 080a  .us.............
        0x0040:  d20a d010 d20a cc28 7b22 636c 6173 7322  .......({"class"
        0x0050:  3a22 5450 5622 2c22 6465 7669 6365 223a  :"TPV","device":
        0x0060:  2275 6470 3a2f 2f31 3932 2e31 3638 2e31  "udp://192.168.1
        0x0070:  332e 3235 353a 3232 3333 3522 2c22 6d6f  3.255:22335","mo
        0x0080:  6465 223a 332c 226c 6174 223a 3431 2e35  de":3,"lat":41.5
        0x0090:  3234 3735 3130 3030 2c22 6c6f 6e22 3a2d  24751000,"lon":-
        0x00a0:  3730 2e36 3639 3838 3233 3333 2c22 616c  70.669882333,"al
        0x00b0:  7422 3a30 2e34 3030 7d0d 0a              t":0.400}..

rgov commented

The NavSatFix is silently being suppressed because of this:

      /* gpsd reports status=OK even when there is no current fix,
       * as long as there has been a fix previously. Throw out these
       * fake results, which have NaN variance
       */
      if (std::isnan(p->fix.epx) && check_fix_by_variance) {
        return;
      }

This code checks epx, the "Longitude position uncertainty, meters" field, and if it is NaN the node does not publish the NavSatFix. To work around this, you can explicitly disable check_fix_by_variance in the node's parameters.
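To make the two conditions explicit, here is the same check reduced to a standalone predicate. The function name should_publish_fix is hypothetical; epx stands for the p->fix.epx value from the quoted code, and check_fix_by_variance for the node parameter.

```cpp
#include <cmath>

// Hypothetical reduction of the check quoted above.
// epx: longitude position uncertainty in meters, NaN when gpsd has no value.
// check_fix_by_variance: the node parameter that enables the check.
bool should_publish_fix(double epx, bool check_fix_by_variance) {
  // The fix is dropped only when BOTH hold: epx is NaN and the check is on.
  return !(std::isnan(epx) && check_fix_by_variance);
}
```

So a receiver that never reports an uncertainty gets nothing published while the check is enabled, and everything published once it is disabled.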

In my case, my GPS receiver was only outputting GPGGA and GPVTG messages. I enabled GPRMC, GPGSA, and GPGSV, and fix messages started being sent. One of these must provide the necessary information.

Yes, we want to completely fill out the message before we publish it. Depending on what protocol you use, there are going to be different requirements for what provides this. GPSD supports a whole host of protocols, so we can't just specify NMEA 0183 messages. I think we can publish a throttled debug message there to help people out though.
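As a rough idea of what "throttled" means here, the sketch below lets a message through at most once per period. The Throttle class is a hypothetical, ROS-free illustration and not the change that was made to the node; in a roscpp node the existing ROS_DEBUG_THROTTLE macro would presumably do this job instead.

```cpp
#include <chrono>

// Hypothetical helper: ready() returns true at most once per `period`.
class Throttle {
 public:
  using Clock = std::chrono::steady_clock;

  explicit Throttle(Clock::duration period) : period_(period) {}

  // The first call always passes; later calls pass only once `period`
  // has elapsed since the last call that passed.
  bool ready(Clock::time_point now = Clock::now()) {
    if (has_last_ && now - last_ < period_) {
      return false;
    }
    last_ = now;
    has_last_ = true;
    return true;
  }

 private:
  Clock::duration period_;
  Clock::time_point last_{};
  bool has_last_ = false;
};
```

A caller would guard the log line with something like if (throttle.ready()) before reporting that a fix was dropped because epx is NaN, so a receiver that never supplies the variance produces one hint every few seconds rather than one per report.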

Added a similar fix to ROS2: #60.