Firmware Upgrade "Connection timed out" in 1.4.1 example app
gte1 opened this issue ยท 29 comments
Using smp_svr example from nRFConnect SDK 2.0.0, with overlay-bt.conf (which should have improved DFU speed)
Android 10 (Huawei P20 pro), nRF52840dk
Expected: Firmware upgrade works
Actual: Firmware upgrade "Connection timed out" in step "UPLOADING..."
Huawei P20 LITE, Android 9 works fine, except that we get a timeout in CONFIRM step, but selecting TEST and CONFIRM afterwards activates the new image
Samsung Galaxy S9, Android 10, worked fine
That value for "window capacity" are you using?
Also, did you try CONFIRM_ONLY mode?
V 13:40:19.377 [j] Moving from state RESET to state CONFIRM
W 13:40:49.435 [e] Error: (0x85): GATT ERROR
W 13:40:49.450 [e] Connection attempt timed out
How long does your device need to reboot, swap images and start advertising? I see you have set 10 seconds for that. After 10 sec from sending Reset
command it will start trying to connect. The device should complete swapping images in 30 seconds from that moment, or you'll get a timeout (error 133, GATT ERROR, 0x85).
That value for "window capacity" are you using?
I am using the example app from Google Play
V 13:40:19.377 [j] Moving from state RESET to state CONFIRM W 13:40:49.435 [e] Error: (0x85): GATT ERROR W 13:40:49.450 [e] Connection attempt timed out
How long does your device need to reboot, swap images and start advertising? I see you have set 10 seconds for that. After 10 sec from sending
Reset
command it will start trying to connect. The device should complete swapping images in 30 seconds from that moment, or you'll get a timeout (error 133, GATT ERROR, 0x85).
I have not set those 10 seconds as am using the example app from Google Play, I think that app sets this timeout
and the firmware is https://github.com/nrfconnect/sdk-zephyr/tree/main/samples/subsys/mgmt/mcumgr/smp_svr running on Nordic nRF52840 dk board
I will check if I can measure the firmware swap time
Also, did you try CONFIRM_ONLY mode?
Similar result
p20pro_confirm_only.txt
[00:00:01.058,013] <inf> smp_bt_sample: Connected
[00:01:13.907,073] <inf> mcumgr_img_mgmt: Erased 0x42000 bytes of image slot
[00:01:14.019,561] <inf> mcumgr_img_mgmt: Erased 0x1000 bytes of image slot trailer
uart:~$ *** Booting Zephyr OS build v3.0.99-ncs1 ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x1
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: test
I: Starting swap using move algorithm.
I: Bootloader chainload address offset: 0xc000
*** Booting Zephyr OS build v3.0.99-ncs1 ***
[00:00:00.867,736] <inf> smp_bt_sample: Connected
With a stopwatch, the time from "Starting swap using move algorithm" to next boot message is approx. 26 seconds
Using a phone that works, Samsumg Galaxy S9
But on the Huawei P20 Lite, the app does not connect again after fw reboot:
[00:00:14.317,230] <inf> smp_bt_sample: Connected
[00:00:35.597,137] <inf> mcumgr_img_mgmt: Erased 0x42000 bytes of image slot
[00:00:35.709,625] <inf> mcumgr_img_mgmt: Erased 0x1000 bytes of image slot trailer
uart:~$ *** Booting Zephyr OS build v3.0.99-ncs1 ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: test
I: Starting swap using move algorithm.
I: Bootloader chainload address offset: 0xc000
*** Booting Zephyr OS build v3.0.99-ncs1 ***
uart:~$
In the nRF Connect Device Manager app you may modify some settings after expanding the setting area. Use this button:
Your device uses 4 buffers, so 4 should work in the middle text field. But you may try 3. With 2 or 1 a different uploader is used and the upload may work differently (it will be slower).
For the swap time, you may calculate by doing DFU with Confirm Only
mode and counting how long does it take from when upload is complete to when the device starts advertising.
Adjusting swap time to 30 seconds helped on Huawei P20 Lite
Nothing seems to help on Huawei P20 pro
A sidenote:
I have seen that on large fw images (> 423KB on my custom board, and custom fw), the upload in basic mode fails on Samsung Galaxy S9 if using 4 buffers (it actually crashes the SMP server in the firmware), but works when using 3 buffers. But in advanced mode, upload from the same phone works also when using 4 buffers.
Strange.....
Only the basic mode is using the new uploader. The advanced one is as if set with 1 buffer, which is much slower.
Buffers allow to send multiple packets before receiving notifications for them, which allows to use more bandwidth:
1 buffer:
send -> notification -> send -> notification -> ...
4 buffers:
send -> send -> send -> send -> notification -> send -> notification -> notification -> send -> send -> ...
If you'd check how many % of the whole upload takes sending the data vs waiting for notifications, second option is h better, thus faster.
But that does not explain why the SMP server in my firmware crashes using Samsumg Galaxy S9 in basic mode with 4 buffers, but works fine with 3 buffers. Maybe a bug in mcumgr from https://github.com/nrfconnect/sdk-zephyr/tree/main/subsys/mgmt/mcumgr . I use this via https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrf/libraries/caf/ble_smp.html
Meanwhile, I will try to get logs from both situations
Attached logs for 3 and 4 buffers using Galaxy S9, basic mode
galaxy_s9_3_buffers.txt
galaxy_s9_4_buffers.txt
But sometimes 4 buffers work on Galaxy S9 too.... a bit random when 4 buffers work or not
Another sidenote: nRFConnect for android 4.25.4 seems to work fine with 4 buffers on Samsung Galaxy S9
@philips77 I am still seeing the same issues as @gte1 on a Huawei P20 Pro. Any suggestions?
Another sidenote: nRFConnect for android 4.25.4 seems to work fine with 4 buffers on Samsung Galaxy S9
Mind, that in nRF Connect and nRF Connect Device Manager apps the number of buffers you set is then decremented by 1, so n-1 is passed to the uploader. One buffer is used for sending responses.
Also, I've just released version 1.5.0, but don't think anything related was improved.
Still timeout errors on Huawei P20 Pro with 1.5.0, but I guess that was expected out of what you wrote above.
Ok, so from your logs I can see that some notifications are not received. The library currently awaits all of them, which should not be required, as any following notification (with offset > x) confirms indirectly receiving a packet with offset x.
Let's see how can I improve it.
OK, I think I know the cause of the issue.
In SMP protocol each packet needs to be acked with a notification. Different packets may take different amount of time to be processed, for example ERASE command takes a long time, so we've set 30 seconds, which is enough for most cases.
The problem is, that for IMAGE UPLOAD command we have 1 second to fail quickly if something isn't going right. This seems to be a problem for the phone you have. It needs more. I manage to replicate the issue with Huawei Mate 20 Pro after decreasing the timeout from 1 second to 100 ms.
Have a look at your logs. This message is printed ~1 second after the upload of frame 7 was started, so it was started around 13:30:22.
W 13:30:23.293 [Uploader] Uploader write failure for chunk with offset=39152: Transaction 7 timed out without receiving a response
However, your logs say, that the notification was in fact received, but at 13:30:26.349:
I 13:30:26.349 [e] Notification received from da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 03-00-00-0D-00-01-07-01-BF-62-72-63-00-63-6F-66-66-19-B5-A0-FF
I 13:30:26.364 [e] Received Header (Op: 3, Flags: 0, Len: 13, Group: 1, Seq: 7, Command: 1) CBOR {"rc":0,"off":46496}
That means, that processing of a single frame took 4.3 seconds, not 1.
There are 2 solutions.
- You may increase the timeout from 1 sec to ~5 seconds here:
- You may investigate the root cause.
I can see, that the phone has a problem with PHY LE 2M:
W 13:30:18.219 [e] PHY read failed with status 6
A single packet is 2475 bytes long:
I 13:30:18.210 [e] SMP reassembly supported with buffer size: 2475 bytes and count: 4
So with long packet, slow PHY and perhaps Data Length Extension (DLE) disabled, it can take longer time to send it.
I will try our oldest test phone and tailor the timeout, but this could also be adjusted dynamically based on the connection parameters.
I'm releasing 1.5.1 now. Could you give it a spin when it's ready?
sure :-)
Should be available any minute now. Have a good afternoon.
Can you share logs from the app? Or perhaps you have a sniffer?
I wonder why is it so slow.
Even on Samsung Galaxy Folder with Android 6.0.1 it reaches 7kB/s and this is the phone that we use for "if it works on this phone it will work everywhere".
- โ Interval 11.25
- โ MTU 498
- โ PHY LE 1M
- โ DLE (Data Length Extension)
- โ Write Without Request operations
- โ Number of packets per connection interval
Log attached as requested
Log 2022-12-02 14_14_45.txt