medic/cht-gateway

Remove wifi resetting feature as it doesn't work as expected

Closed this issue · 23 comments

Currently there is some "clever" code in WakefulService which monkeys with the wifi settings. This may actually be making things worse rather than better. To confirm this suspicion, add a settings option so that the behaviour can be disabled at the discretion of TLs.

cc @mukesh2006

@ashish-medic or @binokaryg do you know if this is still impacting the Nepal deployment? Can you comment on the severity of this issue?

Yes, this is still impacting Asia's SMS project (Standard and MoH Project). We have been using v1.5.x (https://github.com/medic/medic-gateway/releases/tag/v1.5.1) of the gateway, which is much more stable than v1.7.x. Severity-wise, I would say it's medium.

This is blocking upgrading to later versions. Prioritising for 3.11.0.

Moving to AT, PR: #161
This branch is based on the branch that solves issue #142.
Please review and get #142 merged first before starting AT on this.

Summary

When sending an SMS threw an exception (SocketTimeoutException, UnknownHostException, ConnectException or NoRouteToHostException), Gateway disabled the WiFi and resent the SMS using Mobile Data. Once the job was done, it re-enabled the WiFi automatically.

The issue arises when the SMS keeps throwing an exception, because of the WiFi or something else, and there are more SMS to send. Gateway then keeps disabling the WiFi and resending indefinitely, which could eventually crash the app.

The fix:

  1. Do not send another request to Webapp when the previous request failed, even if there are more SMS or status updates to send. Gateway will wait for the WakeService (the recurring scheduled job that sends SMS) to pick up the pending SMS and status updates and retry.

  2. Add a configuration option to turn the automatic enabling/disabling of WiFi on or off.
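Fix 1 can be sketched roughly as follows. This is only an illustration under assumptions: `PollSketch`, `WebappRequest` and `runPoll` are hypothetical names, not classes from the Gateway code base. The point is that a failed request ends the current run instead of triggering another request, so the next scheduled run does the retry.

```java
import java.io.IOException;

/** Hypothetical sketch of fix 1; these are not the Gateway's real classes. */
class PollSketch {
    /** Hypothetical stand-in for one request to the Webapp. */
    interface WebappRequest {
        /** Returns true if more SMS or status updates remain to be sent. */
        boolean send() throws IOException;
    }

    /**
     * Runs one scheduled poll and returns the number of requests attempted.
     * The loop continues only while the previous request succeeded and
     * reported more work. Any IOException (SocketTimeoutException,
     * UnknownHostException, ConnectException and NoRouteToHostException
     * are all subclasses) ends the run.
     */
    static int runPoll(WebappRequest request, int maxRequests) {
        int attempts = 0;
        boolean more = true;
        while (more && attempts < maxRequests) {
            attempts++;
            try {
                more = request.send();
            } catch (IOException e) {
                // Previous request failed: do not request again now.
                // The next scheduled run picks up whatever is still pending.
                break;
            }
        }
        return attempts;
    }
}
```

With a request that always fails, `runPoll` makes exactly one attempt per scheduled run, rather than looping for as long as messages are queued.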

From the documentation:

Automatically enable WiFi

Gateway can receive exceptions when making requests to the Webapp endpoint, such as a socket timeout, unknown host or connection exception. When this happens, Gateway will disable the WiFi and try to resend the request using Mobile Data. Once the job is done, it will re-enable the WiFi automatically.
Check this box to enable this behaviour.
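As a rough illustration of the documented behaviour, the setting could gate the retry like this. It is a sketch under assumptions: `WifiRetrySketch`, `Wifi`, `Request` and `sendOnce` are hypothetical names, and the real Gateway code talks to Android's WiFi APIs rather than to a plain interface.

```java
import java.io.IOException;

/** Hypothetical sketch of the documented setting; not the Gateway's real code. */
class WifiRetrySketch {
    /** Hypothetical stand-in for the device's WiFi switch. */
    interface Wifi { void setEnabled(boolean enabled); }

    /** Hypothetical stand-in for one request to the Webapp endpoint. */
    interface Request { void send() throws IOException; }

    /**
     * Sends the request once. If it fails and autoToggleWifi is set,
     * disables WiFi, retries once (over Mobile Data on a real device),
     * and always re-enables WiFi afterwards. Returns true on success.
     */
    static boolean sendOnce(Request request, Wifi wifi, boolean autoToggleWifi) {
        try {
            request.send();
            return true;
        } catch (IOException firstFailure) {
            if (!autoToggleWifi) {
                return false; // setting unchecked: leave WiFi alone
            }
        }
        wifi.setEnabled(false);
        try {
            request.send();
            return true;
        } catch (IOException secondFailure) {
            return false;
        } finally {
            wifi.setEnabled(true); // restore WiFi whether or not the retry worked
        }
    }
}
```

With the box unchecked, a failed request simply returns and the WiFi state is never touched.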

@latin-panda got any thoughts on forcing the exception? I'm running through docker with ngrok. I've shut down a bunch of services while in a breakpoint to try and force it. However, whatever ngrok returns shows up as a TextResponse when it comes back.

@newtewt this is a bit tricky to test. What I do is disconnect the internet cable from my router but leave the router on, so the phone is still connected to the WiFi but has no internet access. It's better if you have API services running somewhere else with internet, so that API can be reached with Mobile Data.

If the Gateway's setting Automatically enable Wifi is checked, then after a while you will see the UnknownHostException and Gateway will try to disconnect WiFi, then attempt to send the SMS with Mobile Data, and finally enable WiFi again.

You might see some UnknownHostException without any WiFi disable/enable. That's because they come from earlier individual requests to API (immediate SMS status changes) that aren't part of 'the scheduled recurrent job' where the app tries to communicate with API services (checking whether there are SMS to send or statuses to notify). This feature is implemented in the recurrent job only.

If the Gateway's setting Automatically enable Wifi is not checked, then after a while you will see the UnknownHostException but no attempt to disable/enable WiFi, just poll and fail, poll and fail (scheduled recurrent job).

Hey team,
Please let me know if I can help with UAT.

@latin-panda I'm struggling to accomplish this, because as soon as I disable the connection Android detects that there is no internet over WiFi and switches to LTE automatically for me, which is basically what our code is attempting to do. @ashish-medic it would be great if you could attempt this as well. I can provide you an APK unless you know how to build it yourself. It might also be beneficial for you to try out the other changes in this release.

@newtewt if possible, can you share the APK? I haven't set up the environment yet.

Sent @ashish-medic the unbranded APK through Slack.

Hey @latin-panda @ashish-medic , I know we had talked about testing this issue around a current known issue of the phone going to sleep. Are there any updates to that? Thanks!

Hi @newtewt
Apparently the Gateway goes into standby in the background when the phone isn't used for a while. That's the last thing reported by @ashish-medic. Sadly this is a pre-existing issue from previous versions, not related to the current ticket.
I think the issue of the app going into standby can be explored in #147.
Unless it's considered high priority and there's bandwidth in 3.11, in which case maybe it can be added to the release (?)
Have you found any issues related to this ticket? Can I merge? :)
cc: @garethbowen @craig-landry

Sounds good. These issues are being prioritised to unblock the Nepal team (our heaviest user of medic-gateway) from using the latest versions. One step at a time :D Since #147 is identified as our next step, I've increased its priority for consideration in release cycles.

Hi @newtewt just wondering if this ticket is still in testing, any concerns about the feature? :) tnx!

@latin-panda I have not put any effort towards this ticket. I was under the impression that we were getting it tested by @ashish-medic, since I could not reproduce or fix the issue locally with my devices auto-switching through Android. I'll bring this up at our product call to go over how we want to move forward.

> I'm struggling to accomplish this. Because as soon as I disable the connection android detects that there is no internet over wifi and switches to LTE automatically for me.

@ashish-medic is out of office until the end of December.

Hey all - keeping folks in sync with Slack. @newtewt was unable to test with his hardware, so I'm stepping in to help out. I just got set up with a working SIM card and am diving into the test steps and this APK @latin-panda built!

I discussed this with @latin-panda but never got around to updating this issue. She's following up, but we're in agreement that we should just delete this "clever" code. It doesn't appear to be solving the problem (hence the request to disable it), and it doesn't work on the latest Android anyway.

Moving this back to In Progress awaiting a new PR.

Ready for review in 135-remove-wifi-auto-disabling

NB: This version removes the "clever" code altogether. Test everything works when there's a good internet connection. Test that it gracefully handles a bad connection, and recovers and clears the queue when the connection is restored.

I tested this by sending messages to the phone with Wi-Fi off. Wi-Fi is the only way to connect to my localhost. Gateway put the messages into a waiting status. I re-enabled Wi-Fi and they were eventually sent. I also disabled the network on the server hosting my local instance. The messages were put into a waiting status and eventually sent when the connection was re-established. Tested with Android 11 on my Pixel 2. I think we can close this, and it would be good to get a team to use it in the field to validate that it works on at least 1 project before moving forward with a mass deploy. @latin-panda feel free to merge.

PR was merged!