petzval/btferret

High traffic BLE write_ctic

Closed this issue · 42 comments

I have a scenario of high traffic data going from a device/BLE/peripheral/GATT_server to a Linux_PC/BLE/central/GATT_receiver, both using btferret. The device sender uses write_ctic, the PC receiver uses notification on the same characteristic. High traffic means packet every 10ms.

How can I know that the device sender can write new data to write_ctic? Right now, I need to put a large usleep before writing to write_ctic otherwise btferret goes wild and the hci kernel stack is screwed up, requiring a reboot. The PC receiver listening to the notification is working fine whatever the library (btferret, bluez, webBluetooth) is used.

(Note1: the same scenario works with bluez on both sides though it's not optimal and it has its own set of problems. But at least, it shows that the scenario is possible from a BLE hardware point of view.
Note2: the same scenario works even better using Bluetooth Classic with or without btferret. That's not surprising taking into account the bandwidth of Classic vs. BLE. But I need to make work my scenario with BLE.)

Write_ctic has a fixed 25 ms delay that allows enough time for the command to complete. This has been chosen as a safe value, but it will interfere with your 10ms rate. The command is at line 4441

if((cp->perm & 8) == 0 || notflag != 0)
  {
  readhci(0,0,0,25,0)

As a temporary fix, try reducing this number. But at some point, the rate of commands will crash the Bluetooth buffer. I'll think about a better solution for the next release that only delays as long as necessary.

If the characteristic is write with response, then the delay is in a different instruction at line 4447

readhci(ndevice,0,IN_ATTDAT,gpar.timout,gpar.toshort);

reduce gpar.toshort parameter, for example:

readhci(ndevice,0,IN_ATTDAT,gpar.timout,10);

Thank you but I don't think it's accurate. This code is never executed on LE server during a write_ctic.

write_ctic on LE server does lines 4264, 4286 and return at 4290 (for a notification).

line 4264: VPRINT "Send %s notification to %s\n",cp->name,dp->name);
cmd[9] = 0x1B;
line 4286: sendhci(lenotify,n);         
line 4290: return(locsize);

Lines 4440+ are never reached.

I'm going to post an example.

On Bluez, I see that the notification on server side takes around 5ms for a packet of 150 bytes. Bluez has a destructor callback once the notification has been processed and sent so I can really measure what Bluez user-space is doing. In other words, with Buez, I can send a notification every 5 ms.

On btferret, I need to put a hundreds of milliseconds sleep and I have a crash or the kernel BLE stack. There is probably something wrong.

I have multiples client receivers (Android, PC bluez, PC webbluetooth) and I don't think that the problem resides on client receiver side especially because it's a notification (not an indication - no acknowledgement is sent back).

On server side, I do:

while (1) {
	usleep(1 * 1000);
	write_ctic(localnode(), 0, sz, 0);
}

Everything is working if I have:
lechar=test permit=10 size=21 handle=5
Log: https://pastebin.com/EmiLf4cb
I receive notification on the other side around every 5ms. This is consistent with what Bluez is doing.

It's NOT working if I have:
lechar=test permit=10 size=22 handle=5
Log: https://pastebin.com/bWXgMhVB

And I get kernel errors:
Bluetooth: hci0: Frame reassembly failed (-84)

Any idea what is wrong?

I can manage to make work this same scenario with Bluez (exact same kernel obviously). Also, another value for usleep doesn't fix the problem unless it's a huge number (like every second).

That's helpful, I can now see the various problems with this rate of writes. I'll release a new version within a week with an attempted fix and an "any devices" option for classic servers.

Great! Thank you very much. Here is my test case: https://gentil.com/tmp/test2.zip

New version 9 uploaded. Characteristics and notifications now transmit in 2ms. Option to specify primary service UUIDs in the devices file. ANY_DEVICE option for connection to a classic_server().

I just noticed a minor bug and there is now a revised version 9 uploaded.

I still have some trouble with this issue. Can you please try to run the following simple example: https://gentil.com/tmp/test4.zip

On the device, I can even run ./test-server 0 (0 for 0ms wait). It's working. I get 6-byte packet every 0.6ms. I receive data on a Linux Ubuntu PC.

But if I change test-server.txt and put size to 240 for the characteristic, it's only working with a very long wait (./test-server 1000 i.e. 1000ms). As soon as I do a small wait of 10 or 20ms (./test-server 20), the device kernel reports "Bluetooth: hci0: Frame reassembly failed (-84)". And I need to reboot the device.

Can you reproduce and do you have any idea what could be wrong?

The maximum allowed characteristic size may be 244, but at this level the Bluetooth adapter splits the data into two separate packets, and this is slowing the transfer and causing the problem. On my Pis the maximum characteristic size that will transmit a single packet is 235. With a size of 235, I am getting transfer times of around 3ms and your example runs. The maximum packet size may be hardware dependent. You can check it by setting verbose print mode on the receiver, send a single characteristic, and see if it arrives as one or two packets.

As a separate issue, I think I've found the reason for the crash, which is a bug. The system should recover gracefully even if it cannot cope with the transmit rate. I'll upload a revised version 9 in the next few days with a fix.

Yes, what you are saying makes some sense. My device is not a Raspberri Pi so I have my own kernel.

BUT, I can make it work with Bluez: I mean that the same device can communicate high traffic using the same kernel and a custom Bluez stack to the same Android receiver app.

I'm going to post some btmon log files.

Here are log files https://gentil.com/tmp/log1.zip comparing working bluez and problematic btferret (same everything else).

It appears from the logs that your btferret device does not allow the data length to be changed. Here's the procedure on my Pis and then in the btferret-notworking log.

My Pi 

When btferret starts - read commands allowed by Bluetooth adapter:
< HCI Command: Read Local Supported Commands (0x04|0x0002) plen 0
> HCI Event: Command Complete (0x0e) plen 68
      Read Local Supported Commands (0x04|0x0002) ncmd 1
        Status: Success (0x00)
     List includes:
        .....
        LE Set Data Length (Octet 33 - Bit 6)    -  (Want to see this)

On connection, the data length is set to the default minimum 1B (27).
Use Set Data Length to increase it:
Set larger data length - request maximum allowed FB (251)
  Set [4][5] handle 41 00
< HCI OGF=08 OCF=22
      0000  01 22 20 06 41 00 FB 00 - 00 08
       
> Event 0E = 01 22 20 00 41 00
      0000  04 0E 06 01 22 20 00 41 - 00  (Reply status=0 OK)    

> Event 3E = 07 41 00 F2 00 00 08 F8 00 00...
      0000  04 3E 0B 07 41 00 F2 00 - 00 08 F8 00 00 08 
      Reply from controller allowing data length = F2 (242)
      Subtract 7 for header gives max size of data = 235
          
Cpig data length change T F2 R F8  (Verbose mode prints this - want to see T F2)

But in the notworking log, the Set Data Length command is not allowed by the Bluetooth adapter.

Your btferret-notworking log:
Read supported commands at line 101:

< HCI Command: Read Local Supported Commands (0x04|0x0002) plen 0
> HCI Event: Command Complete (0x0e) plen 68
      Read Local Supported Commands (0x04|0x0002) ncmd 1
        Status: Success (0x00)
        Commands: 169 entries
          Inquiry (Octet 0 - Bit 0)
          Inquiry Cancel (Octet 0 - Bit 1)
          Periodic Inquiry Mode (Octet 0 - Bit 2)
          ...
          LE Set Data Length (Octet 33 - Bit 6) - *** DOES NOT APPEAR ***         
 
 
After connection at line 341:
Try to increase data length - but it fails as expected from the supported commands result:         
< HCI Command: LE Set Data Length (0x08|0x0022) plen 6
        Handle: 64
        TX octets: 251
        TX time: 2048
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Data Length (0x08|0x0022) ncmd 1
        Status: Unknown HCI Command (0x01) - FAILED - COMMAND NOT SUPPORTED
        
  So set data length fails and it remains 1B (27)

  I would expect verbose mode to print:
  
XXXXX data length change T 1B R ...   (transmit data length still 1B)

So any size above 20 will be split into multiple packets. This is a property of the Bluetooth adapter itself - not the kernel. I have tried three Pis - 4/400/3B and they all allow Set Data Length.
In the bluez log, it looks like it makes no attempt to increase the data length and splits transmissions into multiple packets with the default minimum length of 27.
The crash is caused by slow multi-packet transmissions overloading the system. The revised version will detect when this happens and so avoid a crash - but the transmission failure will still be there.

My chip is bcm43455 which is BLE4.1. LE Data Length Extension has been introduced in BLE4.2. So in the btferret cfg file, should I put a size of 20 or 27? Is there a way to know that the adapter/kernel is being overloaded and it's time to slow down transmission?

For the default data length use SIZE=20. There is an undocumented method that may allow the sending device to see when the transmit stack is clear - but I cannot be sure that it will work, so use at your own risk. See COMMAND STACK POINTER at line 5111 in btlib.c.

I still have some trouble (a lot of time spent on this). I now do:

		int cs = cmd_stack_ptr();
		write_ctic(localnode(), UUID_STREAM - 0xfff1, aa, 162);
		while(cmd_stack_ptr() >= cs + 1)
			usleep(1000);

Why 162? Because 162%27 = 0 and 162<184 which is the MTU of my BLE chip. SIZE of characteristic is 162. What is critical is that 162 < 184. Perhaps I should use 162 - 7 = 155. Whatever.

The system is not crashing but while loop takes 250ms to finish. So there is a 250ms between each packet: from 37.728s to 37.978s.

< ACL Data TX: Handle 64 flags 0x00 dlen 169                                                    [hci0] 37.480576
      ATT: Handle Value Notification (0x1b) len 164
        Handle: 0x000d
          Data: 0e63558bd685fc055891d2b31c106910876fa42d4ca43414296a696a434d3400edd485a9b4940c52d484d21a43400bba909a4a4a005269b9a29280173499a4a298c33466928a0028a292800a28a2800a28a2800a28a43400fc52a8a76077f5a70001ebde9088c0e280b919a9718047bd222f51ef4011e314e58cbb0555258f000ef4e3d7f4a7c2086041c11dc53021c6319a502b62f6c848ab703acb8cff00bd597b
> HCI Event: Number of Completed Packets (0x13) plen 5                                          [hci0] 37.728205
        Num handles: 1
        Handle: 64
        Count: 1
< ACL Data TX: Handle 64 flags 0x00 dlen 169                                                    [hci0] 37.728561
      ATT: Handle Value Notification (0x1b) len 164
        Handle: 0x000d
          Data: 0f79fad20b8d03fc2a45a1467f1a785ee2802584e08adeb06cc78ac28856b583e081412cd2a28a299647346244208ac4b9b72ac78adfaaf730091738e6825aea73c57149b6aecd0ed278a80ad215c80ad30ad59d94d29486562b48454e5698568022c5348a98ad34ad3191629315215a4228023c52629f8a08a008f1453b1498a0636929f8a4c500368a5c518a60368a5c52e2801b453b1462801314869c29ad4016
> HCI Event: Number of Completed Packets (0x13) plen 5                                          [hci0] 37.978404
        Num handles: 1
        Handle: 64
        Count: 1
< ACL Data TX: Handle 64 flags 0x00 dlen 169                                                    [hci0] 37.980575

Bluez takes 1ms instead. You read the log file of Bluez which is:

      ATT: Handle Value Notification (0x1b) len 157
        Handle: 0x0011
          Data: 03688a7fc2378027049c6ecfbc51fd9e9e60c3b7df18f928c5837cfbd85242780f36f19b1e0802678757b061a7ddd1b053e850ddee5221f02f00ad04770637ed8537d7f768815a00841163c613705d02a1236573f07de2ec90c5817cc0f5d12c10423b634aa2fbd9868b41e23ea07e9315cfeb6cbf258e049e08e865625e0a0674a3a304ca7022ecb8b080e3c95570a13dc607c84b363a32a7237a
> HCI Event: Number of Completed Packets (0x13) plen 5                                          [hci0] 29.467862
        Num handles: 1
        Handle: 64
        Count: 2
< ACL Data TX: Handle 64 flags 0x00 dlen 27                                                     [hci0] 29.468071
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                                     [hci0] 29.468088
> HCI Event: Number of Completed Packets (0x13) plen 5                                          [hci0] 29.469209
        Num handles: 1
        Handle: 64
        Count: 2
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                                     [hci0] 29.469269
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                                     [hci0] 29.469286
> HCI Event: Number of Completed Packets (0x13) plen 5                                          [hci0] 29.470557
        Num handles: 1
        Handle: 64
        Count: 2
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                                     [hci0] 29.470601
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                                     [hci0] 29.470614
      ATT: Handle Value Notification (0x1b) len 157
        Handle: 0x0011
          Data: 04f576765e1f808b7a326da96c66024887d3b4ad2ff0919fb8c12d7397baa4dafeac8c211c080fb606f78c3aeaa4c1e98e92940ce49eaf6a5050f71a33ad14254a42b8b80cde19501c08f17752975f57825a18ae865896fd04192dbda49ba01ec57256f7dd6b5cdbb4164838b02c02f82df012374604af01fd8532feec1b0010d0000ad0a1fb6360bfc158ee20009cf824752d536720c4931ba429
> HCI Event: Number of Completed Packets (0x13) plen 5                                          [hci0] 29.471866

Don't bother about 157 vs. 162. That doesn't matter. Two different examples.

The point is that Bluez makes the kernel split the packets into 27 chunks while btferret does NOT.

I checked the code, bluez writes to the kernel the WHOLE 157-byte packet.

On this line, it does:
https://github.com/Vudentz/BlueZ/blob/master/src/shared/io-mainloop.c#L296
iovcnt == 1 and ret == 157.

Also Bluez has a callback when the transfer is finish. The cmd_stack_ptr loop is an equivalent of the bluez callback to signal the end of the transfer.

How can I modify btferret to replicate what bluez is doing?

Try this. Replace leopen[] at line 507 with the following. This replaces old values 0x18 and 0x28 with 0x06.

unsigned char leopen[40] = {29,0,S2_BADD,0,
  1,0x0D,0x20,0x19,0x60,0,0x60,0,0,0,0x7C,0x17,0x2D,0xC0,0x1E,0,0,0x06,0,0x06,0,0,0,0x11,0x01,0,0,0,0}; // len 29

I now get transmit times of 2-3ms for single packets.
With btferret, the packet is split into 27-byte chunks by the Bluetooth adapter if the data length has not been changed.

I don't think that it's the case. I have modified btlib.c:

--- a/btlib.c
+++ b/btlib.c
@@ -505,7 +505,7 @@ unsigned char lesetscan[16] = { 11,0,0,0,0x01,0x0B,0x20,0x07,0x01,0x10,0x00,0x10
 unsigned char locbadd[10] = {4,0,0,0,0x01,0x09,0x10,0};
 unsigned char lesuggest[16] = { 8,0,0,0,1,0x24,0x20,4,0xF8,0,TRANSMITUS & 0xFF,(TRANSMITUS >> 8) & 0xFF};
 unsigned char leopen[40] = {29,0,S2_BADD,0,
-  1,0x0D,0x20,0x19,0x60,0,0x60,0,0,0,0x7C,0x17,0x2D,0xC0,0x1E,0,0,0x18,0,0x28,0,0,0,0x11,0x01,0,0,0,0}; // len 29
+  1,0x0D,0x20,0x19,0x60,0,0x60,0,0,0,0x7C,0x17,0x2D,0xC0,0x1E,0,0,0x06,0,0x06,0,0,0,0x11,0x01,0,0,0,0}; // len 29
 unsigned char leupdate[40] = {18,0,0,0,
   1,0x13,0x20,0x0E,0,0,0x18,0,0x28,0,0,0,0x11,0x01,0,0,0,0}; // len 18
 unsigned char lerrf[20] = {6,0,0,0,1,0x16,0x20,0x02,0,0}; // len 6

and I'm still getting:

< ACL Data TX: Handle 64 flags 0x00 dlen 169                                                      [hci0] 21.229807
      ATT: Handle Value Notification (0x1b) len 164
        Handle: 0x000d
          Data: 0d28016928a28012929d4d34005145140051451400519a292800cd252d250028a09e6928a601494b4500069052d25003a9452678a514805a5a4a5a0028a28a006d14a453680168a4a5a0028a28a005a29296800a28a280168a4a280168a28a0028a28a005a28c52d002629314ea4a006e28a75211400da28a2800a28a4a0028a28a004a28a2980514514000a4a28a00514a28a2900ea28a280168a28a00434945140
> HCI Event: Number of Completed Packets (0x13) plen 5                                            [hci0] 21.476989
        Num handles: 1
        Handle: 64
        Count: 1
< ACL Data TX: Handle 64 flags 0x00 dlen 74                                                       [hci0] 21.477798

which still takes 250ms and is still different from Bluez.

What else am I missing?

I think that we are testing on 2 different hardware, yours is > 4.2 mine is 4.1. So data_length is not handled the same way.

I get your 250ms time, but this is caused by the stack pointer wait. Using >=cs+1 will not work because the system does not acknowledge every command. I get the fastest transmits with the following:

  while(cmd_stack_ptr() >= cs+5)
    ;

With two Pis connecting, and the client leopen intervals = 0x06, and data length setting disabled to match your system, I get the following timings:

Send one SIZE=20 byte packet = 1.5ms
Send 8x SIZE=20 packets (so total 160 bytes) = 12ms 
Send one SIZE=160 packet (adapter splits into smaller packets) = 22ms 

So the conclusion is that the fastest method is to split the data into 20-byte packets yourself.

It's still not working.

First, if I do packet of 20-byte with bluez, the performance is less less good.

Back to btferret:
If I do SIZE=20, with while(cmd_stack_ptr() >= cs+5) ;, the transmission is doing some kind of stop-and-go. With some wait of 50 ~ 100ms every 5 ~ 10 packets.

If I do SIZE=160 and leopen intervals = 0x06, the log is:

> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 19.004790
        Num handles: 1
        Handle: 64
        Count: 1
< ACL Data TX: Handle 64 flags 0x00 dlen 167                   [hci0] 19.006426
      ATT: Handle Value Notification (0x1b) len 162
        Handle: 0x000d
          Data: 0cf2550762401b8644b21070ffdebe7b158609aeceeb4f3fdff44c7873c021f2e0a5447ffb0f740b97f1c07e9b7fbcbe8372e4da5693fe7f8120a2144fc46f37fffa37bdf7beb37fffa5777777bdfbbdb37fffa13defbdef37fffa77b9f1fef769f8530b02bf6fff88c5718a1615ac460ebc88d8388c693137fffd5fbeefdfd62159c46b0a288e7d17ff0a29e7fa17ffcd3e89fa5139c301c4267b43792d58f0
> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 19.254756
        Num handles: 1
        Handle: 64
        Count: 1
< ACL Data TX: Handle 64 flags 0x00 dlen 167                   [hci0] 19.258427
      ATT: Handle Value Notification (0x1b) len 162
        Handle: 0x000d
          Data: 0d0244e012205c1779f98d8469e014c8104d29cfe0152a9b4ba477fd57ea2cff6bbddfdfde0ec4e34d39f5dd991bd56db56417e454148c47fdcc50880807246a958ffee64a82a01333d7171c83dff78148b8d35b4fe4cfed066882be9d7cd7fd29f7fbf6f8ac57be57aed77fdf07025425f52797c9e83eebf8d870788c79ac42ac6625c6c46f1189723312905b11bc4625c88de2778854a2146dc137fffc1056
> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 19.504600
        Num handles: 1
        Handle: 64
        Count: 1
< ACL Data TX: Handle 64 flags 0x00 dlen 167                   [hci0] 19.506416
      ATT: Handle Value Notification (0x1b) len 162
        Handle: 0x000d
          Data: 0eb88c1aa65118bc2989e7fff78530ec7dbffcb32ce6ffff6f9a831e08594d1e7df76449297ac7a53a270602b8473a9264a50294a7acca6de9f0db3c3985654acbfef5a95c613285e648472b2fc27ae6a35aff7c63f77fb0d080dd97098180ac1702c36c9821c5a8062bd361f8758329fb643feb2ae7a343667db645abefcd49e8a149c13f6001721a3ddef8e8df5a528d1e5e46277fedbf69d6ffb320aa6495
> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 19.755157
        Num handles: 1
        Handle: 64
        Count: 1
< ACL Data TX: Handle 64 flags 0x00 dlen 167                   [hci0] 19.758419
      ATT: Handle Value Notification (0x1b) len 162
        Handle: 0x000d

It's still different from the Bluez working log at https://gentil.com/tmp/log1.zip.

I still hope to replicate with btferret what Bluez is doing because for my BCM chip, it's the best I have seen so far.

The init log with btferret in case of 160-byte is https://pastebin.com/gG7Dus14. What is the difference with Bluez?

In the btferret log, the client sends Connection update commands to change the connection interval, so it cannot be a btferret device, which sets the interval to 7.5ms on connection (if 0x06 values in leopen) and sends no updates. It looks like an Android device that sets the interval to 45ms on connection (line 15) then to 7.5ms to read the services (line 53), and then back to 45ms at line 231. Simple solution - program the client to request 7.5ms interval. Otherwise it MAY be possible to stop the reset to 45ms by sending a parameter request negative reply:

At line 6581  gotflag = IN_PARAMREQ
The requested intervals are at insdat[n+3] to insdat[n+6]
If they are 06 00 06 00 then the client is requesting 7.5ms and paramreply can be sent as programmed.
But otherwise (if the interval has previously been set to 7.5ms) send paramnegreply as follows:

sendhci(paramnegreply,devicen);

This may stop the update and leave the interval at 7.5ms, but I'm only guessing.

Here's another possibly simpler suggestion. In gotflag == IN_PARAMREQ the requested intervals are copied to the reply - so the server just agrees to the client's request. Remove the copy code for loop, and the reply will only agree to a 7.5ms interval. So the code at line 6586 will now be:

else if(gotflag == IN_PARAMREQ)
  {
  VPRINT .....
  sendhci(paramreply,devicen);
  }

There is a little bit of progress but still not perfect.

with:

  • SIZE=20
  • modification line 6586 (even without modification leopen line 507)
  • while(cmd_stack_ptr() >= cs + 5) usleep(1000);

I think that I have something similar between bluez and btferret.

But the packet size is too small. It's not manageable on both ends.

With:

  • SIZE=160
  • modification line 6586 (with or without modification leopen line 507)
  • cs = cmd_stack_ptr(); write(...; while(cmd_stack_ptr() > cs) usleep(1000); the stack doesn't crash but it's super slow.

I start to feel bad to ask: is there something special in bluez log that makes the stack work with SIZE>20 and that btferret doesn't do yet.

The important first step is to get the connection interval to 7.5ms. If the client is a btferret device, it uses leopen to set the interval on connection, and the 0x06 values will do the job. If the client is some other device, it must be persuaded to set this interval because the server (e.g. a btferret device) has no control over it. Android does not allow detailed control of the interval - the closest available function is requestConnectionPriority() with CONNECTION_PRIORITY_HIGH. The only btferret server IN_PARAMREQ code that MAY stop the client from resetting the interval from 7.5ms is:

    else if(gotflag == IN_PARAMREQ)
      {
      VPRINT "Connection parameter change request\n");
      if(insdat[n+3] == 0x06)
        {
        for(j = 0 ; j < 8 ; ++j)
          {
          paramreply[PAKHEADSIZE+j+6] = insdat[n+j+3];   // any values below those requested will fail
          }
        sendhci(paramreply,devicen);
        }
      else
        sendhci(paramnegreply,devicen);                
      }

As I suspected, use of cmd_stack_ptr() is troublesome, so I would suggest just using a delay - but you need to know the single 20-byte packet transmit time. Bluez does seem to split data into 20-byte packets before sending to the adapter. I tried this in the past and found that it was no quicker than sending a large packet and letting the adapter do the splitting. The best method is to do the splitting yourself as follows. If the connection interval is 7.5ms and the allowed data length is 20 bytes between two btferret Pi4 devices, this transfers 120 bytes every 10ms and runs in 50 seconds without overloading the stack. Increasing the number of writes to 7 with 140 bytes overloads the stack, which is consistent with a single packet transmit time of 1.5ms.

for(n = 0 ; n < 5000 ; ++n)
  {
    // put 120 bytes into unsigned char buf[120]
    // write to 6 characteristics - each SIZE=20 bytes
  write_ctic(localnode(),0,buf,20);
  write_ctic(localnode(),1,buf+20,20);
  write_ctic(localnode(),2,buf+40,20);
  write_ctic(localnode(),3,buf+60,20);
  write_ctic(localnode(),4,buf+80,20);
  write_ctic(localnode(),5,buf+100,20);
    // delay 10ms for 6 writes to clear stack (which should take 9ms)
  usleep(1e4);
  }

I have read what you are saying. I understand it. Here is more information:

  • my scenario is between a device BLE 4.1 (which is very bad but I can't change that) and an Android/iOS app which I control.

  • In my Android code based on the blessed library, I have discovered:

peripheral.requestConnectionPriority(ConnectionPriority.HIGH);
peripheral.requestMtu(184);
  • As I mentioned to you, with Bluez stack on the device, the user-space Bluez stacks writes to the kernel the whole 160-byte packet. The user-space bluez does NOT split packets.

  • You are the BLE expert. I learnt a lot from this page:
    https://docs.silabs.com/bluetooth/4.0/general/system-and-performance/throughput-with-bluetooth-low-energy-technology
    My understanding is that the connection interval (between 7.5ms and 4s) is the time between sending data. I agree with you that for sure it needs to stay at 7.5ms. It also says: "Multiple packets can be sent within one connection interval sequentially (with 150 us inter-frame spacing)". This is what I want to achieve with multiple 20-byte packets to form the maximum transfer size.

  • I also think that we need to control MTU which can be as high as 250 bytes if one of the two actors is BLE4.1.

  • I still don't understand how the Bluez user-space stack gets the callback when the 160-byte frame has been sent. My Bluez code uses this file: https://github.com/bluez/bluez/blob/master/src/shared/io-mainloop.c. I mentioned to you that the write operation on line 296 writes 160 bytes at once. this is also what we see in btmon log.

  • I feel bad to say this again ;-) but I think that I need to find the way to use btferret with SIZE=160 because splitting in my app to 20-byte packets makes it too slow. If you want to have multiple 20-byte packets within 1 connection interval, it needs to be part of the same gatt / frame operation of SIZE=160. Am I wrong about this?

Here's the procedure for an Android client and a btferret server.

Android connects with interval 45ms

Android requests MTU 184
btferret replies that it will accept any MTU up to 247
Android uses MTU 184

btferret requests data length of single packet = 244
btferret controller rejects request because command is not supported
maximum data size per packet remains = 20

Android requests interval 7.5ms
btferret accepts
Android informs btferret that it will use 7.5ms
Android reads btferret's services
Android requests interval 45ms
btferret accepts (it can only agree to the requested range of values)
Android informs btferret that it will use 45ms

***** PROPOSED setinterval() CODE ADDITION - see below **** 
btferret requests interval = 7.5ms
Android accepts ???
**********************************************

btferret sends notification via write_ctic with SIZE=20
btferret controller sends as single packet

btferret sends notification via write_ctic with SIZE=160
btferret controller splits into smaller packets for transmission
Android re-assembles the packets into a single 160-byte value

   Note - bluez does the same, as shown by these sequences in bluez-working log:
   < ACL Data TX: Handle 64 flags 0x00 dlen 27  flag=00 means initial packet with length 27 (20 bytes of data)
   < ACL Data TX: Handle 64 flags 0x01 dlen 27  flag=01 means additional data packet
   < ACL Data TX: Handle 64 flags 0x01 dlen 27
   < ACL Data TX: Handle 64 flags 0x01 dlen 27
   <.....


For continuous transmission, data can be sent every 10ms in two ways -
as a number of 20-byte writes, or one longer write.
The command stack pointer can be used as a neat way of
telling if the stack is being overloaded and hence finding
the system capacity:

1. As a number of 20-byte writes:

printf("START CSP=%d\n",cmd_stack_ptr());
for(n = 0 ; n < 5000 ; ++n)
  {
  // send a number of SIZE=20 characteristics 
  write_ctic(.....,20);
  write_ctic(.....,20); 
  ...
  usleep(1e4);  // 10ms delay
  }
printf("END CSP=%d\n",cmd_stack_ptr());

Between two Pis with interval=7.5ms and data length not changed, I can send
6 writes per loop and the END CSP is START CSP + 6 or less. So the stack is cleared on 
every loop and the transmission is good. With 7 writes, the 
END CSP is START CSP + 256 indicating that the stack is overloaded and will
eventually fail.
This server code always runs in 50s, but the client only
receives 5000 values in 50s if the stack is not overloaded.


2. As a single large SIZE write:

printf("START CSP=%d\n",cmd_stack_ptr());
for(n = 0 ; n < 5000 ; ++n)
  {
  // send one SIZE=150 characteristic 
  write_ctic(.....,150);
  usleep(1e4);  // 10ms delay
  }
printf("END CSP=%d\n",cmd_stack_ptr());

The END CSP is START CSP + 6 or less, so OK. With SIZE=160, the 
END CSP is START CSP + 896 indicating that the stack is overloaded.
So the system capacity is 6x 20-byte writes, or one 150-byte write.
If the data length can be changed, a 160-byte packet is sent in 3ms.

Proposed code change as indicated above if the client will not request 7.5ms interval. Usually, the client dictates the connection interval, but there is a method by which the server can request a change.

  // if the IN_PARAMREQ code at line 6581 has been changed - put it back to the original
  // so btferret always accepts the client's request

  // Add this code to your program
  // send once, after client has read the services and before first write_ctic()  notifucation
  setinterval(node of client);
  
  // Add this to btlib.c
void setinterval(int node)
  {
  int ndevice;
  static unsigned char leconnup[32] = {21,0,S2_HAND,0,2,0x40,0,0x10,0,0x0C,0,0x05,0,0x12,0x03,0x08,0x00,0x06,0x00,0x06,0x00,0,0,0xF4,0x01 };

  ndevice = devn(node);
  if(ndevice < 0)
    return;
    // request interval = 7.5ms    
  sendhci(leconnup,ndevice);
  }  


I get this result using nRF connect client on an Android tablet:

btferret calls setinterval() asking for 7.5ms
Android replies that it will accept 11.25ms
btferret accepts 11.25ms
Android informs btferret that it will use 11.25ms

I have added the setinterval. It helps, I understand and I see how Android reacts but it's still not working. The problem is:

On Bluez:

< ACL Data TX: Handle 64 flags 0x00 dlen 27                                             [hci0] 109.849466
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849489
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849496
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849503
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849516
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849522
< ACL Data TX: Handle 64 flags 0x01 dlen 26                                             [hci0] 109.849528
      ATT: Handle Value Notification (0x1b) len 183
        Handle: 0x0011
          Data: 000000000167424014a681419f9d00000...
< ACL Data TX: Handle 64 flags 0x00 dlen 27                                             [hci0] 109.853580
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.853600
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.853607
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.853613
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.853620
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.853625
> HCI Event: Number of Completed Packets (0x13) plen 5                                  [hci0] 109.856749
        Num handles: 1
        Handle: 64
        Count: 2
< ACL Data TX: Handle 64 flags 0x01 dlen 26                                             [hci0] 109.856788
      ATT: Handle Value Notification (0x1b) len 183
        Handle: 0x0011
          Data: 01104002ada5c90d7bc957f4fffaaeb82...
< ACL Data TX: Handle 64 flags 0x00 dlen 27                                             [hci0] 109.857438

On btferret:

< ACL Data TX: Handle 64 flags 0x00 dlen 188                                                    [hci0] 20.475562
      ATT: Handle Value Notification (0x1b) len 183
        Handle: 0x000d
          Data: 000000000167424014...
< ACL Data TX: Handle 64 flags 0x00 dlen 188                                                    [hci0] 20.727536
      ATT: Handle Value Notification (0x1b) len 183
        Handle: 0x000d
          Data: 0175d75d75d75fffff23dd75d75d75d75...
> HCI Event: Number of Completed Packets (0x13) plen

As long as we don't manage to understand why my BCM splits the frame into 27, 27, 27, 27, 27, 27, 26 packets with Bluez, while my BCM is not splitting with btferret.

packet 181 bytes => ATT transfer 181+2=183 bytes => 188 data TX. On Bluez, it's split on 6x27+26 data TX while on btferret it stays as a single 188 bytes.

To make it crystal clear:

btferret:

< ACL Data TX: Handle 64 flags 0x00 dlen 188                                            [hci0] 20.475562

which screws my BCM chip.

bluez:

< ACL Data TX: Handle 64 flags 0x00 dlen 27                                             [hci0] 109.849466
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849489
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849496
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849503
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849516
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849522
< ACL Data TX: Handle 64 flags 0x01 dlen 26                                             [hci0] 109.849528

I believe that you don't manage to reproduce the behavior of my BCM 4.1 chip even when you tell the Raspberry Pi NOT to support "LE Set Data Length".

Something is missing in the BCM configuration by btferret to ask the adapter to split the frame. Something that Bluez does. And for the records, Bluez writes the whole frame into the kernel. Bluez doesn't do any splitting in user-space.

Last, I'm still investigating SIZE>20 because I believe that the performance of SIZE=20 x (6 or 7 or 8 times) will never equal the performance of SIZE=181. Even on Bluez, there is a substantial difference between multiple times 20-bytes vs 181 bytes.

Full log: https://gentil.com/tmp/log2.zip

Both systems split the packets. Some time ago I tried splitting in the btferret code, but it was no faster than letting the controller do it. It looks like bluez may be splitting the packets itself and for your system this is quicker. I've dug out the old code that splits the packets first. Implement as follows:

Add this function:

int splitcmd(unsigned char *s)
  {
  int n,k,sn,tn,tn0,t20,len,numx,remx,kx;
  unsigned char t[512];
  
  len = s[5] + (s[6] << 8) - 3;
  if(len <= 20)
    return(0);  
    
  for(n = 0 ; n < 32 ; ++n)
    t[n] = s[n];
    
  t20 = t[2];
  t[2] |= 0x20;
  tn = len+12;  
  t[3] = 0x1B;
  t[4] = 0;
  numx = (len-20)/27;
  remx = (len-20)%27;
  if(remx > 0)
    ++numx;
  sn = 32;
  tn = 32;
  for(n = 0 ; n < numx ; ++n)
    {
    ++gpar.cmdcount;
    tn0 = tn;
    t[tn] = 0x02;
    t[tn+1] = t[1];
    t[tn+2] = t20 | 0x10;
    if(n == numx-1 && remx > 0)
      {
      t[tn+3] = remx;
      kx = remx;
      }
    else
      {
      t[tn+3] = 0x1B;
      kx = 27;
      }
    t[tn+4] = 0;
    tn += 5;

    for(k = 0 ; k < kx ; ++k)
      {
      t[tn] = s[sn];
      ++tn;
      ++sn;
      }
    } 
  write(gpar.hci,t,tn);
  ++gpar.cmdcount;
  return(1);
  }



In btlib.c  sendhci() function
Line number
of current
code
5081    hexdump(cmd,len);
5082    }  // end printflag
    
  // Insert this code
  if(s == lenotify)
    {
    if(splitcmd(s+PAKHEADSIZE) != 0)
      return(1);
    } 
  // end of insert
  
5084  ntogo = len;  // first header entry is length of cmd
5085  timstart = timems(TIM_LOCK);


Unless I have made a mistake, the code is not working well:

> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 13.336234
        Num handles: 1
        Handle: 64
        Count: 1
< ACL Data TX: Handle 64 flags 0x02 dlen 27                    [hci0] 13.346095
        invalid packet size (218 != 27)
        b8 00 04 00 1b 0d 00 00 00 00 00 01 67 42 40 14  ............gB@.
        a6 81 41 9f 9d 00 00 00 01 68 ce 02 40 10 1b 00  ..A......h..@...
        38 80 00 00 00 01 65 88 82 0a ff ff c3 d1 40 00  8.....e.......@.
        40 2f da c8 b2 76 b2 77 df 27 26 02 40 10 1b 00  @/...v.w.'&.@...
        a4 59 38 43 5f 93 42 7d 1c ff fe 10 5f 1d 2e 3f  .Y8C_.B}...._..?
        ff e1 0d 7d 3f fe 10 d2 f1 b4 9f 02 40 10 1b 00  ...}?.......@...
        ff 84 30 48 b8 0e 0b ef fb 84 8e 6c 38 68 ff f9  ..0H.......l8h..
        db f5 74 92 eb 20 f9 9e cb 9e db 02 40 10 1b 00  ..t.. ......@...
        bf ff ff 04 f7 7e f8 85 b5 77 d7 5d 77 d7 5d 75  .....~...w.]w.]u
        df 7d 2d 72 75 9b 86 30 06 66 b6 02 40 10 1b 00  .}-ru..0.f..@...
        42 1d ce 1d e4 c8 92 41 47 75 f1 ff c0 32 da d0  B......AGu...2..
        f3 38 18 f4 57 1f e1 7d 7a 7e 9f 02 40 10 1a 00  .8..W..}z~..@...
        03 55 fa 30 10 fb 9b 3f f1 93 5c 21 80 16 fd 7d  .U.0...?..\!...}
        42 7f 79 02 53 0a 69 7f fd af                    B.y.S.i...      
> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 13.363878
        Num handles: 1
        Handle: 64
        Count: 2
> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 13.365618
        Num handles: 1
        Handle: 64
        Count: 2
> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 13.366557
        Num handles: 1
        Handle: 64
        Count: 2
> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 13.586296
        Num handles: 1
        Handle: 64
        Count: 1
< ACL Data TX: Handle 64 flags 0x02 dlen 27                    [hci0] 13.598058
        invalid packet size (218 != 27)
        b8 00 04 00 1b 0d 00 01 e1 a4 d7 b7 0b 0b 62 eb  ..............b.
        be ba eb be ba eb ae f5 7d f4 b2 02 40 10 1b 00  ........}...@...
        72 6a 3f 0c fb fb 4e fb 97 91 1e 89 c7 89 cb 42  rj?...N........B
        f4 ff fc 73 d0 fe 83 1f 9d c0 f7 02 40 10 1b 00  ...s........@...
        e0 ee 6b fa c9 48 43 11 00 00 fb 2f fb 35 23 93  ..k..HC..../.5#.
        ae ba eb ae fa eb ae 3f 00 10 5a 02 40 10 1b 00  .......?..Z.@...
        fa fe 3f fe 00 e2 b4 c6 83 ab 97 bd 75 ad 75 e3  ..?.........u.u.
        ea a0 05 f1 cc ac 0b 22 17 68 50 02 40 10 1b 00  .......".hP.@...
        d7 f0 f3 47 c8 23 cd 5b bd 06 6d 29 f0 c9 ef cd  ...G.#.[..m)....
        a3 fe 6e 3d cc c3 df 23 a6 16 ff 02 40 10 1b 00  ..n=...#....@...
        ef 02 de 88 8e 4d af 7f 33 6f 9c 6f 72 88 01 ab  .....M..3o.or...
        1e f0 cf 9a cf f4 39 59 69 bf d8 02 40 10 1a 00  ......9Yi...@...
        ff ec 3f fa eb aa c2 be 68 00 0d ea 06 a0 02 8b  ..?.....h.......
        3e bd 3a 7a be 85 27 5d 75 c9                    >.:z..']u.      
> HCI Event: Number of Completed Packets (0x13) plen 5         [hci0] 13.611290
        Num handles: 1
        Handle: 64
        Count: 2

I think that flag 02 is not correct as it appears in the first 20-byte packet.

Also, perhaps I'm wrong, but Bluez writes the whole 180 byte to kernel WITHOUT ANY SPLIT IN USER-SPACE.

With Bluez:

< ACL Data TX: Handle 64 flags 0x00 dlen 27                                             [hci0] 109.849466
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849489
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849496
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849503
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849516
< ACL Data TX: Handle 64 flags 0x01 dlen 27                                             [hci0] 109.849522
< ACL Data TX: Handle 64 flags 0x01 dlen 26                                             [hci0] 109.849528

is done by the chip (or perhaps the Linux kernel) but it's not done by user-space Bluez because Bluez writes to the file descriptor the whole 180 byte frame and there is no code to split by 20 or 27 in user-space Bluez.

"Some time ago I tried splitting in the btferret code, but it was no faster than letting the controller do it." That's my point: I think that my adapter doesn't split with btferret:

< ACL Data TX: Handle 64 flags 0x00 dlen 188                                                    [hci0] 20.475562

I can't explain why.

Actually, it looks like it's working just fine. The only problem is that the split packets are sent in a single write and the monitor is not smart enough to see that it contains multiple packets. The adapter does not care about this and sends the packets correctly. Below is a modified splitcmd() that does send each packet separatly (flag=1 setting), so they will be seen by the monitor, but the transmitted result will be exactly the same (the multiple writes will be a little bit slower). I'm having no problems with the flag=02 value which is how initial packets arrive, but I've modified splitcmd to set flag=00 to match bluez. The monitor reports what the user sends in a write, not what the adapter transmits, so bluez is splitting the packets itself, just like splitcmd. It appears that your code is choosing to wait 250ms between notifications (1st at 13.346095, 2nd at 13.598058). If I send two notifications with a 10ms delay, I get this, and there are no problems.

write_ctiic(....,160);
usleep(10e3);
write_ctic(....,160);

< ACL Data TX: Handle 65 flags 0x02 dlen 27                #58 [hci0] 20.049511
        invalid packet size (197 != 27)
        a3 00 04 00 52 08 00 11 22 33 44 00 00 00 00 00  ....R..."3D.....

< ACL Data TX: Handle 65 flags 0x02 dlen 27                #61 [hci0] 20.059691    ****** 10ms later******
        invalid packet size (197 != 27)
        a3 00 04 00 52 08 00 11 22 33 44 00 00 00 00 00  ....R..."3D.....

Modified splticmd (see flag setting).

int splitcmd(unsigned char *s)
  {
  int n,k,sn,tn,tn0,t20,len,numx,remx,kx,flag;
  unsigned char t[512];
  
  
  len = s[5] + (s[6] << 8) - 3;
  if(len <= 20)
    return(0);  
    
 
  flag = 1;   // 0 = send all packets as one write
              // 1 = send each packet as a separate write
  
  for(n = 0 ; n < 32 ; ++n)
    t[n] = s[n];
    
  t20 = t[2];
  tn = len+12;  
  t[3] = 0x1B;
  t[4] = 0;
  numx = (len-20)/27;
  remx = (len-20)%27;
  if(remx > 0)
    ++numx;
  sn = 32;
  tn = 32;
  if(flag != 0)  
    write(gpar.hci,t,32);
  for(n = 0 ; n < numx ; ++n)
    {
    ++gpar.cmdcount;
    tn0 = tn;
    t[tn] = 0x02;
    t[tn+1] = t[1];
    t[tn+2] = t20 | 0x10;
    if(n == numx-1 && remx > 0)
      {
      t[tn+3] = remx;
      kx = remx;
      }
    else
      {
      t[tn+3] = 0x1B;
      kx = 27;
      }
    t[tn+4] = 0;
    tn += 5;

    for(k = 0 ; k < kx ; ++k)
      {
      t[tn] = s[sn];
      ++tn;
      ++sn;
      }
    if(flag != 0)   
      write(gpar.hci,t+32*(n+1),kx+5);
    }
  if(flag == 0)
    write(gpar.hci,t,tn);
  ++gpar.cmdcount;
  return(1);
  }


I have now a speed similar for btferret as bluez. Thank you very much for all the suggestions. But:

  • I had to play aggressively with usleep to find the best value depending of what I'm sending.

  • I still would like to say again that:

    1. Bluez writes to the kernel the WHOLE 181 bytes in one write call and there is no split in user-space Bluez.
    1. Bluez gets a callback when the notification has been sent. I have read the code and there is no while () usleep or while (readhci) ; equivalent. There is a callback coming from the epoll that Bluez registers with the kernel.

I'm reopening this issue!!! You are going to hate me. Sorry about that.

So, it's working with the splitting code but the architecture to know if it's safe to send the next frame is not optimal. You offered two solutions:

  • comparing cmd_stack_ptr() to a magic value of 5 or 6 or 7. It's flaky - I think that you agreed with that.
  • tweaking and fine-tuning a magic usleep. I still see "problems" but the value depends both of the sending and receiving adapters.

This is not optimal. As I said in my previous comment, bluez gets a callback. I have read the user-space code of Bluez and I think that I better understand how they do it.

So I implemented the following pseudo-code:

void wait_write_done() {
	//In a separate thread
	struct pollfd pollfd[1];
	pollfd[0].fd = gpar.hci;
	pollfd[0].events = POLLOUT;
	poll(pollfd, 1, -1);
	if (pollfd[0].revents & POLLOUT) {
		//Write is done
	}
}

while (1) {
	write_ctic(...);
	wait_write_done();
}

This is how Bluez is doing it: https://github.com/bluez/bluez/blob/master/src/shared/io-mainloop.c#L124

My understanding is that the kernel sends back POLLOUT when write is done.

Do you confirm this analysis and would you investigate leveraging a POLLOUT poll of gpar_hci? You could even find useful POLLIN to know that something is ready to be read.

The btferret socket is non-blocking, so write returns immediately if it fails or succeeds. I suspect that the bluez socket is blocking, so it has to call poll to find out what is happening. With a non-blocking socket, there is no point calling poll because write will not be running. The splitcmd code only has one try at write, so it misses the failures. Here is a more careful revised version with an extra splitwrite function that keeps calling write until it succeeds or times out. This should do the same thing as a completion callback.

int splitcmd(unsigned char *s)
  {
  int n,k,sn,tn,tn0,t20,len,numx,remx,kx,flag;
  unsigned char t[512];
  
  
  len = s[5] + (s[6] << 8) - 3;
  if(len <= 20)
    return(0);  
    
 
  flag = 1;   // 0 = send all packets as one write
              // 1 = send each packet as a separate write
  
  for(n = 0 ; n < 32 ; ++n)
    t[n] = s[n];
    
  t20 = t[2];
  tn = len+12;  
  t[3] = 0x1B;
  t[4] = 0;
  numx = (len-20)/27;
  remx = (len-20)%27;
  if(remx > 0)
    ++numx;
  sn = 32;
  tn = 32;
  if(flag != 0)  
    splitwrite(t,32);
  for(n = 0 ; n < numx ; ++n)
    {
    ++gpar.cmdcount;
    tn0 = tn;
    t[tn] = 0x02;
    t[tn+1] = t[1];
    t[tn+2] = t20 | 0x10;
    if(n == numx-1 && remx > 0)
      {
      t[tn+3] = remx;
      kx = remx;
      }
    else
      {
      t[tn+3] = 0x1B;
      kx = 27;
      }
    t[tn+4] = 0;
    tn += 5;

    for(k = 0 ; k < kx ; ++k)
      {
      t[tn] = s[sn];
      ++tn;
      ++sn;
      }
    if(flag != 0)   
      splitwrite(t+32*(n+1),kx+5);
    }
  if(flag == 0)
    splitwrite(t,tn);
  ++gpar.cmdcount;
  return(1);
  }

int splitwrite(unsigned char *cmd,int len)
  {
  int ntogo,nwrit;
  unsigned int timstart;
  unsigned char *cmdx;
   
  ntogo = len;
  cmdx = cmd;  
  timstart = timems(TIM_LOCK);  
  do
    {
    nwrit = write(gpar.hci,cmdx,ntogo);
 
    if(nwrit > 0)
      {
      ntogo -= nwrit;
      cmdx += nwrit;
      }   
    if(timems(TIM_RUN) - timstart > 2000)   // 2 sec timeout
      {
      NPRINT "Send CMD timeout\n");
      timems(TIM_FREE);
      return(0);
      }
    }
  while(ntogo != 0);
  timems(TIM_FREE);
  return(1);
  }

From an architecture point of view, this new code is really not better because you are doing "do { write } while (not_success)". It's 100% CPU. Fortunately (or unfortunately - depends how you see it), you didn't add a sleep.

I have checked Bluez and you are correct that their socket is blocking because it's returned by an "accept" and SOCK_NONBLOCK is not used.

#13 is the same problem. You are using 100% CPU or you need to insert some kind of sleep.

I think that we are eventually touching the core question of your architecture choice: why did you chose to do a non-blocking socket? To my mind, you are missing valuable information from the kernel that "things" are either busy or need to be handled. It's a way to avoid doing things like "while(1) { check_read(); optional_sleep(); }" or "do { write(); } while (not_success)".

I've uploaded a new version 10 that includes functions to set the LE connection interval. There is also a fix for a bug that I think is extremely rare, but may be showing up here. The splitcmd() code is also included - but not called.
The splitwrite code to loop for write to succeed is absolutely necessary. One call to write is not guaranteed to work with a non-blocking socket. It is only doing what a blocking socket does out of sight - it waits for the serial port to accept the data. If the port is busy, the wait will only be a few milliseconds, so the use of CPU for that time can probably be tolerated, and adding a sleep will just delay the process. If a write fails, the reason is available via errno. Using a non-blocking socket lays everything open to programmer control, which is the point of btferret.

Thank you for the connection interval APIs. That will be useful.

Android reception for my high traffic starts to be stable and working OK. But I have some trouble with iOS. More precisely, there is some "caching" somewhere (most likely on reception) and I get the traffic with some delays increasing overtime. It's like btferret sends the notifications, iPhone adapter caches the events. iOS app gets the notification with very little delay when traffic starts, and then more and more delays after some time. Any idea on this situation? iOS is much more conservative for connection interval than Android (you can't do 6 in iOS).

I know nothing about iOS, and have no devices to test. A check that may reveal something is to monitor this command sent from the server, and check the iOS reply.

set_le_interval_server(clientnode,6,6)

Server sends:

< ACL Data TX: Handle 64 flags 0x00 dlen 16         #36 [hci0] 24.830243
      LE L2CAP: Connection Parameter Update Request (0x12) ident 3 len 8
        Min interval: 6
        Max interval: 6
        Slave latency: 0
        Timeout multiplier: 500

Reply from client if it agrees:

> ACL Data RX: Handle 64 flags 0x02 dlen 10         #43 [hci0] 24.999741
      LE L2CAP: Connection Parameter Update Response (0x13) ident 3 len 2
        Result: Connection Parameters accepted (0x0000)

I have opened a separate issue to optimize iOS in order to avoid spoiling this thread.

Back to the core of this issue (as well as #13), and following your comment: "Using a non-blocking socket lays everything open to programmer control, which is the point of btferret", I have checked and len = read(gpar.hci,buf,sizeof(buf)); is called 70 000 times in 100ms! ;-) It's 100% CPU and a killer for an embedded device.

The 12 000 lines of btlib.c are great but it's too bad that performance is damaged by this intensive while loop. Fortunately, as there is only one main read in the whole code. It wouldn't be too difficult to improve the code and add something like:

timeend = nowms() + timeout;
while (nowms() < timeend) {
	struct pollfd pollfds[1];
	pollfds[0].fd = gpar.hci;
	pollfds[0].events = POLLIN;
	if (poll(pollfds, 1, timeend - nowms()) == 0)//timeout
		continue;
	if (pollfds[0].revents & POLLIN) {
		read(gpar.hci, ...);
	}
}

and remove SOCK_NONBLOCK on line 7924. I don't think that your architecture will be impacted anywhere else.

Here is a very simple patch that fixes issue #13 and avoid 100% CPU usage:

--- a/btlib.c
+++ b/btlib.c
@@ -7,6 +7,7 @@
 #include <termios.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/poll.h>
 #include <fcntl.h>
 #include "btlib.h"     
 
@@ -5266,6 +5267,12 @@ int splitwrite(unsigned char *cmd,int len)
   timstart = timems(TIM_LOCK);  
   do
     {
+struct pollfd pollfds[1];
+pollfds[0].fd = gpar.hci;
+pollfds[0].events = POLLOUT;
+if (poll(pollfds, 1, 2000) == 0)
+	nwrit = 0;
+else if (pollfds[0].revents & POLLOUT)
     nwrit = write(gpar.hci,cmdx,ntogo);
  
     if(nwrit > 0)
@@ -5516,6 +5523,12 @@ int readhci(int ndevice,long long int mustflag,long long int lookflag,int timout
           timendms = toshort;
           do
             {
+struct pollfd pollfds[1];
+pollfds[0].fd = gpar.hci;
+pollfds[0].events = POLLIN;
+if (poll(pollfds, 1, timendms) == 0)
+	len = 0;
+else if (pollfds[0].revents & POLLIN)
             len = read(gpar.hci,buf,sizeof(buf));
           
             if(len > 0)  // restart timeout - still toshort
@@ -5544,6 +5557,12 @@ int readhci(int ndevice,long long int mustflag,long long int lookflag,int timout
         {         
         do       // read block of data - may be less than or more than one line
           {
+struct pollfd pollfds[1];
+pollfds[0].fd = gpar.hci;
+pollfds[0].events = POLLIN;
+if (poll(pollfds, 1, timendms) == 0)
+	len = 0;
+else if (pollfds[0].revents & POLLIN)
           len = read(gpar.hci,&buf[blen],sizeof(buf)-blen);                  
            
            // len = number of bytes read  0=EOF -1=error     
@@ -7958,7 +7977,7 @@ int hcisock()
 
          // AF_BLUETOOTH=31
   VPRINT "Open BTPROTO socket\n");
-  dd = socket(31, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK, BTPROTO_HCI);
+  dd = socket(31, SOCK_RAW | SOCK_CLOEXEC/* | SOCK_NONBLOCK*/, BTPROTO_HCI);
 
   if(dd < 0)
     {

Without any usleep during agressive writes, the adapter stack at least doesn't crash but it's still not yet completely satisfactory as it doesn't behave like with Bluez.