NordicSemiconductor/Android-nRF-Connect-Device-Manager

Firmware Upgrade "Connection timed out" in 1.4.1 example app

gte1 opened this issue ยท 29 comments

gte1 commented

Using smp_svr example from nRFConnect SDK 2.0.0, with overlay-bt.conf (which should have improved DFU speed)
Android 10 (Huawei P20 pro), nRF52840dk

Expected: Firmware upgrade works
Actual: Firmware upgrade "Connection timed out" in step "UPLOADING..."

p20pro_100822_1.txt

gte1 commented

Huawei P20 LITE, Android 9 works fine, except that we get a timeout in CONFIRM step, but selecting TEST and CONFIRM afterwards activates the new image

Huawei_p20_lite_100822_1.txt

gte1 commented

Samsung Galaxy S9, Android 10, worked fine

Samsung_galaxy_S9_Log_100822_1.txt

That value for "window capacity" are you using?

/**
* Sets window capacity. On Zephyr this is equal to MCUMGR_BUF_COUNT value, which defaults to 4.
* @param windowCapacity number of windows that can be sent in parallel.
* @return The builder.
*/
public Builder setWindowCapacity(final int windowCapacity) {
this.windowCapacity = Math.max(0, windowCapacity);
return this;
}

Also, did you try CONFIRM_ONLY mode?

V	13:40:19.377	[j] Moving from state RESET to state CONFIRM
W	13:40:49.435	[e] Error: (0x85): GATT ERROR
W	13:40:49.450	[e] Connection attempt timed out

How long does your device need to reboot, swap images and start advertising? I see you have set 10 seconds for that. After 10 sec from sending Reset command it will start trying to connect. The device should complete swapping images in 30 seconds from that moment, or you'll get a timeout (error 133, GATT ERROR, 0x85).

gte1 commented

That value for "window capacity" are you using?

/**
* Sets window capacity. On Zephyr this is equal to MCUMGR_BUF_COUNT value, which defaults to 4.
* @param windowCapacity number of windows that can be sent in parallel.
* @return The builder.
*/
public Builder setWindowCapacity(final int windowCapacity) {
this.windowCapacity = Math.max(0, windowCapacity);
return this;
}

I am using the example app from Google Play

gte1 commented
V	13:40:19.377	[j] Moving from state RESET to state CONFIRM
W	13:40:49.435	[e] Error: (0x85): GATT ERROR
W	13:40:49.450	[e] Connection attempt timed out

How long does your device need to reboot, swap images and start advertising? I see you have set 10 seconds for that. After 10 sec from sending Reset command it will start trying to connect. The device should complete swapping images in 30 seconds from that moment, or you'll get a timeout (error 133, GATT ERROR, 0x85).

I have not set those 10 seconds as am using the example app from Google Play, I think that app sets this timeout

and the firmware is https://github.com/nrfconnect/sdk-zephyr/tree/main/samples/subsys/mgmt/mcumgr/smp_svr running on Nordic nRF52840 dk board

I will check if I can measure the firmware swap time

gte1 commented

Also, did you try CONFIRM_ONLY mode?

Similar result
p20pro_confirm_only.txt

gte1 commented
[00:00:01.058,013] <inf> smp_bt_sample: Connected
[00:01:13.907,073] <inf> mcumgr_img_mgmt: Erased 0x42000 bytes of image slot
[00:01:14.019,561] <inf> mcumgr_img_mgmt: Erased 0x1000 bytes of image slot trailer
uart:~$ *** Booting Zephyr OS build v3.0.99-ncs1  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x1
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: test
I: Starting swap using move algorithm.
I: Bootloader chainload address offset: 0xc000
*** Booting Zephyr OS build v3.0.99-ncs1  ***


[00:00:00.867,736] <inf> smp_bt_sample: Connected

With a stopwatch, the time from "Starting swap using move algorithm" to next boot message is approx. 26 seconds

Using a phone that works, Samsumg Galaxy S9

But on the Huawei P20 Lite, the app does not connect again after fw reboot:

[00:00:14.317,230] <inf> smp_bt_sample: Connected
[00:00:35.597,137] <inf> mcumgr_img_mgmt: Erased 0x42000 bytes of image slot
[00:00:35.709,625] <inf> mcumgr_img_mgmt: Erased 0x1000 bytes of image slot trailer
uart:~$ *** Booting Zephyr OS build v3.0.99-ncs1  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: test
I: Starting swap using move algorithm.
I: Bootloader chainload address offset: 0xc000
*** Booting Zephyr OS build v3.0.99-ncs1  ***


uart:~$

In the nRF Connect Device Manager app you may modify some settings after expanding the setting area. Use this button:

obraz

Your device uses 4 buffers, so 4 should work in the middle text field. But you may try 3. With 2 or 1 a different uploader is used and the upload may work differently (it will be slower).
For the swap time, you may calculate by doing DFU with Confirm Only mode and counting how long does it take from when upload is complete to when the device starts advertising.

gte1 commented

Adjusting swap time to 30 seconds helped on Huawei P20 Lite
Nothing seems to help on Huawei P20 pro

gte1 commented

A sidenote:
I have seen that on large fw images (> 423KB on my custom board, and custom fw), the upload in basic mode fails on Samsung Galaxy S9 if using 4 buffers (it actually crashes the SMP server in the firmware), but works when using 3 buffers. But in advanced mode, upload from the same phone works also when using 4 buffers.
Strange.....

Only the basic mode is using the new uploader. The advanced one is as if set with 1 buffer, which is much slower.
Buffers allow to send multiple packets before receiving notifications for them, which allows to use more bandwidth:

1 buffer:

send -> notification -> send -> notification -> ...

4 buffers:

send -> send -> send -> send -> notification -> send -> notification -> notification -> send -> send -> ...

If you'd check how many % of the whole upload takes sending the data vs waiting for notifications, second option is h better, thus faster.

gte1 commented

But that does not explain why the SMP server in my firmware crashes using Samsumg Galaxy S9 in basic mode with 4 buffers, but works fine with 3 buffers. Maybe a bug in mcumgr from https://github.com/nrfconnect/sdk-zephyr/tree/main/subsys/mgmt/mcumgr . I use this via https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrf/libraries/caf/ble_smp.html

Meanwhile, I will try to get logs from both situations

gte1 commented

Attached logs for 3 and 4 buffers using Galaxy S9, basic mode
galaxy_s9_3_buffers.txt
galaxy_s9_4_buffers.txt

gte1 commented

But sometimes 4 buffers work on Galaxy S9 too.... a bit random when 4 buffers work or not

gte1 commented

Another sidenote: nRFConnect for android 4.25.4 seems to work fine with 4 buffers on Samsung Galaxy S9

@philips77 I am still seeing the same issues as @gte1 on a Huawei P20 Pro. Any suggestions?

Another sidenote: nRFConnect for android 4.25.4 seems to work fine with 4 buffers on Samsung Galaxy S9

Mind, that in nRF Connect and nRF Connect Device Manager apps the number of buffers you set is then decremented by 1, so n-1 is passed to the uploader. One buffer is used for sending responses.

Math.max(1, windowCapacity - 1), // 1 buffer is used for sending responses.

Also, I've just released version 1.5.0, but don't think anything related was improved.

gte1 commented

Still timeout errors on Huawei P20 Pro with 1.5.0, but I guess that was expected out of what you wrote above.

Ok, so from your logs I can see that some notifications are not received. The library currently awaits all of them, which should not be required, as any following notification (with offset > x) confirms indirectly receiving a packet with offset x.
Let's see how can I improve it.

OK, I think I know the cause of the issue.

In SMP protocol each packet needs to be acked with a notification. Different packets may take different amount of time to be processed, for example ERASE command takes a long time, so we've set 30 seconds, which is enough for most cases.
The problem is, that for IMAGE UPLOAD command we have 1 second to fail quickly if something isn't going right. This seems to be a problem for the phone you have. It needs more. I manage to replicate the issue with Huawei Mate 20 Pro after decreasing the timeout from 1 second to 100 ms.

Have a look at your logs. This message is printed ~1 second after the upload of frame 7 was started, so it was started around 13:30:22.

W	13:30:23.293	[Uploader] Uploader write failure for chunk with offset=39152: Transaction 7 timed out without receiving a response

However, your logs say, that the notification was in fact received, but at 13:30:26.349:

I	13:30:26.349	[e] Notification received from da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 03-00-00-0D-00-01-07-01-BF-62-72-63-00-63-6F-66-66-19-B5-A0-FF
I	13:30:26.364	[e] Received Header (Op: 3, Flags: 0, Len: 13, Group: 1, Seq: 7, Command: 1) CBOR {"rc":0,"off":46496}

That means, that processing of a single frame took 4.3 seconds, not 1.

There are 2 solutions.

  1. You may increase the timeout from 1 sec to ~5 seconds here:
    val timeout = if (chunk.offset == 0) 30_000L else 1_000L
  2. You may investigate the root cause.

I can see, that the phone has a problem with PHY LE 2M:

W	13:30:18.219	[e] PHY read failed with status 6

A single packet is 2475 bytes long:

I	13:30:18.210	[e] SMP reassembly supported with buffer size: 2475 bytes and count: 4

So with long packet, slow PHY and perhaps Data Length Extension (DLE) disabled, it can take longer time to send it.

I will try our oldest test phone and tailor the timeout, but this could also be adjusted dynamically based on the connection parameters.

I'm releasing 1.5.1 now. Could you give it a spin when it's ready?

gte1 commented

sure :-)

Should be available any minute now. Have a good afternoon.

gte1 commented

Seems to work on HuaweiP20 Pro (but sometimes have to restart transfer), but very slow upload speed on that phone, approx 2 kB/s
Older versions of Device Manager app were much faster (versions that existed before SMP reassembly was implemented)
Screenshot_20221202_104700_no nordicsemi android nrfconnectdevicemanager

Can you share logs from the app? Or perhaps you have a sniffer?
I wonder why is it so slow.

Even on Samsung Galaxy Folder with Android 6.0.1 it reaches 7kB/s and this is the phone that we use for "if it works on this phone it will work everywhere".

  • โœ… Interval 11.25
  • โœ… MTU 498
  • โŒ PHY LE 1M
  • โ” DLE (Data Length Extension)
  • โ” Write Without Request operations
  • โ” Number of packets per connection interval
gte1 commented

Log attached as requested
Log 2022-12-02 14_14_45.txt