helium/gateway-rs

Downlink lost between gateway-rs and the packet forwarder

Oliv4945 opened this issue · 9 comments

Hello,

I have an issue where 40% of the joinAccepts are lost between gateway-rs and the Semtech packet forwarder:

Sep 05 14:00:49 grog helium_gateway[2645778]: mac existed, but IP updated: 30:37:34:32:48:00:6C:00, 127.0.0.1:60747, module: gateway
Sep 05 14:00:49 grog helium_gateway[2645778]: uplink @900809022 us, 868.30 MHz, Ok(DataRate(SF12, BW125)), snr: 10, rssi: -57, len: 23 from 30:37:34:32:48:00:6C:00, module: gateway
Sep 05 14:00:49 grog helium_gateway[2645778]: mac existed, but IP updated: 30:37:34:32:48:00:6C:00, 127.0.0.1:49548, module: gateway
Sep 05 14:00:51 grog helium_gateway[2645778]: rx1 downlink @905809022 us, 868.30 MHz, DataRate(SF12, BW125), len: 33 via 30:37:34:32:48:00:6C:00, module: gateway

In the gateway-rs logs above, the joinReq is received and a joinAccept is then scheduled to be sent on RX1, 5 seconds later. In the packet forwarder logs, however, I see only the RX, with no TX requested:

JSON up: {"rxpk":[{"tmst":900809022,"chan":1,"rfch":1,"freq":868.300000,"stat":1,"modu":"LORA","datr":"SF12BW125","codr":"4/5","lsnr":10.0,"rssi":-57,"size":23,"data":"ACcRERERERERJxERERERERFfbyLrUJ8="}]}
INFO: [up] PUSH_ACK received in 0 ms
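The two log excerpts can be cross-checked offline. The sketch below decodes the `data` field of the `rxpk` above and compares its `tmst` with the timestamp of the rx1 downlink reported by gateway-rs. It assumes that both timestamps come from the same 32-bit microsecond concentrator counter and that the payload follows the LoRaWAN 1.0.x Join Request layout (MHDR, AppEUI/JoinEUI, DevEUI, DevNonce, MIC); the helper names `describe_rxpk` and `delay_us` are made up for illustration.

```python
import base64
import json

# "JSON up" line copied verbatim from the packet forwarder log above.
JSON_UP = (
    '{"rxpk":[{"tmst":900809022,"chan":1,"rfch":1,"freq":868.300000,'
    '"stat":1,"modu":"LORA","datr":"SF12BW125","codr":"4/5","lsnr":10.0,'
    '"rssi":-57,"size":23,"data":"ACcRERERERERJxERERERERFfbyLrUJ8="}]}'
)

# Timestamp of the rx1 downlink as printed by gateway-rs.
DOWNLINK_TMST = 905809022

# LoRaWAN 1.0.x message types, indexed by the top three bits of the MHDR byte.
MTYPES = [
    "JoinRequest", "JoinAccept", "UnconfirmedDataUp", "UnconfirmedDataDown",
    "ConfirmedDataUp", "ConfirmedDataDown", "RFU", "Proprietary",
]


def describe_rxpk(rxpk):
    """Summarise one rxpk entry (hypothetical helper, not part of any tool)."""
    phy = base64.b64decode(rxpk["data"])
    info = {"tmst": rxpk["tmst"], "len": len(phy), "mtype": MTYPES[phy[0] >> 5]}
    if info["mtype"] == "JoinRequest" and len(phy) == 23:
        # EUIs are sent little-endian on air; reverse them for display.
        info["join_eui"] = phy[1:9][::-1].hex()
        info["dev_eui"] = phy[9:17][::-1].hex()
    return info


def delay_us(uplink_tmst, downlink_tmst):
    """Delay between two counter values, allowing for 32-bit wrap-around."""
    return (downlink_tmst - uplink_tmst) % 2**32


uplink = describe_rxpk(json.loads(JSON_UP)["rxpk"][0])
print(uplink)
print(delay_us(uplink["tmst"], DOWNLINK_TMST), "us between uplink and rx1 downlink")
```

If the assumptions hold, the uplink is a Join Request and the downlink is scheduled exactly 5,000,000 us later, which matches the default join-accept delay for RX1. That would suggest the scheduling on the gateway-rs side is plausible and that the frame is lost somewhere after it.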

I am using helium_gateway 1.0.0-alpha.31 on an x86 computer, connected to a Semtech Pico Gateway with this packet forwarder.

Hello, any update on this?

Hello, any update on this?

My apologies for missing this. We can't really support every custom configuration people try out here. Your best option is to ask in the #gateway-development channel on Discord to see if someone else in the community has seen the same issue.

Is there any more context you have? I’m having a hard time figuring out how to debug a bug between the packet forwarder and gateway-rs.

Would we be able to get access to such a system to help debug the interaction?

Hello,

Unfortunately, the two systems where I saw this are on corporate networks, so I cannot grant you SSH access:

My setup: gateway-rs and the packet forwarder are on the same computer.
My customer's setup: Milesight gateway with the packet forwarder -> VPN -> gateway-rs. Unfortunately we do not have access to Milesight's full log, but the joinAccept downlink is never displayed in the summary.

What kind of info would you need to get more context?

Hi @Oliv4945

What's strange to me is that you have this line:

Sep 05 14:00:51 grog helium_gateway[2645778]: rx1 downlink @905809022 us, 868.30 MHz, DataRate(SF12, BW125), len: 33 via 30:37:34:32:48:00:6C:00, module: gateway

But nothing appears on the packet forwarder! I think there are basically two potential issues:

  1. perhaps the VPN and/or networking configuration is dropping the UDP traffic.
  2. perhaps the PULL_RESP (i.e. the one with the transmit request) is incompatible with the packet forwarder on the Milesight? I think that's unlikely given that it sometimes works at all.

Is there a way we can either:

  1. run gateway-rs directly on the Milesight?
  2. run gateway-rs and Milesight on the same LAN?

I think either of those configurations would help us eliminate or confirm the first potential issue.
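One way to test the first hypothesis without changing the deployment is to capture the UDP traffic on both ends and check whether each PULL_RESP sent by gateway-rs actually arrives at the packet forwarder. The sketch below classifies raw datagrams by the 4-byte header of the Semtech UDP protocol (version, 2-byte token, identifier). The function name `classify` and the returned dictionary layout are assumptions for illustration, and how the datagrams are captured (tcpdump, a small UDP proxy, etc.) is left open.

```python
import struct

# Packet identifiers from the Semtech UDP packet forwarder protocol.
IDENTIFIERS = {
    0x00: "PUSH_DATA",
    0x01: "PUSH_ACK",
    0x02: "PULL_DATA",
    0x03: "PULL_RESP",
    0x04: "PULL_ACK",
    0x05: "TX_ACK",
}

# Packet types that carry the 8-byte gateway identifier right after the header.
WITH_GATEWAY_ID = {"PUSH_DATA", "PULL_DATA", "TX_ACK"}


def classify(datagram):
    """Split one captured UDP payload into header fields and JSON body."""
    if len(datagram) < 4:
        raise ValueError("datagram shorter than the 4-byte header")
    version, token, ident = struct.unpack_from(">BHB", datagram)
    name = IDENTIFIERS.get(ident, "UNKNOWN")
    gateway = None
    body = datagram[4:]
    if name in WITH_GATEWAY_ID and len(datagram) >= 12:
        gateway = datagram[4:12].hex()
        body = datagram[12:]
    return {
        "version": version,
        "token": token,
        "type": name,
        "gateway": gateway,
        "json": body.decode("utf-8", "replace"),
    }


# Example: a PULL_DATA from the gateway MAC seen in the logs above.
pull_data = bytes.fromhex("02abcd02" + "3037343248006c00")
print(classify(pull_data))
```

Counting PULL_RESP datagrams on the gateway-rs side and on the packet forwarder side of the VPN, and comparing the two totals, would show whether the network is dropping them.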

Also, just to confirm 100%, you only have one packet forwarder connected to the gateway, correct? gateway-rs does not tolerate more than one packet forwarder and will elect to downlink through the most recent "connection" (determined via the PULL_DATA frame).
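To illustrate why a second packet forwarder matters, here is a toy model of the "most recent PULL_DATA wins" behaviour described above. It is not gateway-rs code: the class `DownlinkRouter` and its methods are hypothetical, and the real implementation may track more state.

```python
class DownlinkRouter:
    """Hypothetical model of 'most recent PULL_DATA wins'; not gateway-rs code."""

    def __init__(self):
        # (mac, (ip, port)) of the sender of the latest PULL_DATA, if any.
        self.target = None

    def on_pull_data(self, mac, addr):
        """Record the sender; return True if the downlink target changed."""
        previous = self.target
        self.target = (mac, addr)
        return previous is not None and previous != self.target

    def downlink_addr(self):
        """Address a PULL_RESP would be sent to, or None before any PULL_DATA."""
        return None if self.target is None else self.target[1]


MAC = "30:37:34:32:48:00:6C:00"
router = DownlinkRouter()
router.on_pull_data(MAC, ("127.0.0.1", 60747))
changed = router.on_pull_data(MAC, ("127.0.0.1", 49548))
print(changed, router.downlink_addr())
```

In this model, any downlink scheduled after the second PULL_DATA goes to port 49548 only. If the process that is expected to transmit is still listening on 60747, it never sees the transmit request.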

@lthiery there are actually two systems where I saw this behavior:

My lab computer
This is where the traces from the first post were recorded. gateway-rs and pkt_forwarder both run on that computer, so no VPN or network is involved. Like you, I was surprised that the packet shows up in the gateway-rs logs but not in the pkt_forwarder logs, which is why I opened the issue in the first place.

Customer issue
I recently saw this in a customer setup where the failure rate is much higher, close to 100%. Unfortunately, on the Milesight gateways we have neither direct access to the logs nor the possibility to run gateway-rs on the gateway. The issue appears on a system they have already deployed and cannot physically access, but I will ask them whether they have a spare gateway to run a test on the same network. I will also ask them to confirm that only one gateway is connected to gateway-rs.

@Oliv4945 For the situation with the logs, it looks like something funny is happening:

Sep 05 14:00:49 grog helium_gateway[2645778]: mac existed, but IP updated: 30:37:34:32:48:00:6C:00, 127.0.0.1:60747, module: gateway
...
Sep 05 14:00:49 grog helium_gateway[2645778]: mac existed, but IP updated: 30:37:34:32:48:00:6C:00, 127.0.0.1:49548, module: gateway

This indicates to me that gateway-rs received two PULL_DATA frames at about the same time, but from two different ports. The timing of the whole thing is interesting as (1) the first PULL_DATA is received from 60747, (2) a packet is received, and (3) a second PULL_DATA is received within the same second 14:00:49.

Any idea what happened? Is it possible that there are two instances or that the packet forwarder rebooted? Do you see more connections like this happening in the logs?
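To answer the last question across a longer log, the address changes can be extracted automatically. The sketch below scans journal lines for the "mac existed, but IP updated" message in the format shown above and groups the reported addresses per MAC. The regular expression is an assumption based only on the two lines quoted in this thread, and `address_changes` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "Sep 05 14:00:49 grog helium_gateway[...]: mac existed, but IP updated: <mac>, <ip>:<port>, ..."
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*"
    r"mac existed, but IP updated: (?P<mac>[0-9A-Fa-f:]{23}), "
    r"(?P<ip>[\d.]+):(?P<port>\d+)"
)


def address_changes(lines):
    """Return {mac: [(timestamp, ip, port), ...]} for every 'IP updated' line."""
    changes = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            changes[match["mac"]].append(
                (match["ts"], match["ip"], int(match["port"]))
            )
    return dict(changes)


LOG = [
    "Sep 05 14:00:49 grog helium_gateway[2645778]: mac existed, but IP updated: 30:37:34:32:48:00:6C:00, 127.0.0.1:60747, module: gateway",
    "Sep 05 14:00:49 grog helium_gateway[2645778]: uplink @900809022 us, 868.30 MHz, Ok(DataRate(SF12, BW125)), snr: 10, rssi: -57, len: 23 from 30:37:34:32:48:00:6C:00, module: gateway",
    "Sep 05 14:00:49 grog helium_gateway[2645778]: mac existed, but IP updated: 30:37:34:32:48:00:6C:00, 127.0.0.1:49548, module: gateway",
]

for mac, events in address_changes(LOG).items():
    print(mac, [port for _, _, port in events])
```

Ports that keep alternating between two values would point to two packet forwarder instances, while a port that changes once and then stays put would look more like a restart.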

For the customer issue, a failure rate of 100% on the downlinks makes me think we are giving the Milesight what it considers to be invalid requests or frames. It may be due to tx_power, frequency, or perhaps even the UDP protocol. If we can get logs from a test unit running the same packet forwarder, I would hope either the Milesight packet forwarder or gateway-rs will spit out some useful errors to work from.

What's the latest on this, @lthiery and @Oliv4945?

Closing this issue as I am not able to reproduce.