pvbrowser/pvb

Siemens TCP communication unnecessarily slow

vseguip opened this issue · 14 comments

Hello I'm using the rllib as a standalone project. Right now the rlsiemenstcp class is unnecessarily slow on my system (Ubuntu 12.04 64 bits). After some research and benchmarking comparison with libnodave, I've found that the main reason is that rlsiemenstcp does two writes to the socket for each request. There are several ways to avoid the delay use one write (copy header and buffer to one total buffer and send that), use TCP_NODELAY, use non NONBLOCKING or clever use of TCP_CORK. I attach an easy solution using the first option:

int rlSiemensTCP::write_iso(unsigned char *buf, int len)
{
  int i,ret, total_len;
  char stack_buf[512];
  char *total_buf=0;

  if(rlSocket::isConnected() == 0) doConnect();
  if(rlSocket::isConnected() == 0) return -1;
  ih.version  = 3;
  ih.reserved = 0;
  ih.length_high = (len+4) / 256;
  ih.length_low  = (len+4) & 0x0ff;
  if(len <=0) 
  {
    rlDebugPrintf("write_iso:failure to write iso packet negative length -> disconnecting\n");
    rlSocket::disconnect(); 
  }
  total_len = len + sizeof(ih);
  if(total_len > sizeof(stack_buf)) 
  {
      total_buf = new char[len + sizeof(ih)];
  }
  else 
  {
      total_buf = stack_buf;
  }
  memcpy(total_buf, &ih, sizeof(ih));
  memcpy(total_buf+sizeof(ih), buf, len);
  ret = rlSocket::write(total_buf, total_len);
  if(total_len > sizeof(stack_buf))
      delete [] total_buf;
  if(ret < 0)
  { 
    rlDebugPrintf("write_iso:failure to write iso packet -> disconnecting\n");
    rlSocket::disconnect(); 
    return ret; 
  }  

  if(rlDebugPrintfState != 0)
  {
      ::printf("write_iso() len=%d\n", len);
    for(i=0; i<len; i++) ::printf("%02x,",buf[i]);
    ::printf("\n");
  }
  return len;
}

I think TCP_NODELAY would be a better solution since it avoids copying memory, but I haven't implemented it since it will either break expectations from other users of rlSocket, or it will break rlSocket API. I also don't have all the platforms where rllib is being used so I can't test TCP_NODELAY.

By my measurements, grouping those two writes gives me ~6x speedup (from 10s to perform 100 reads on a S71200 to 1.4 seconds) .

Regards,
V. Seguí

Hello Vincent,

many thanks for your hint about optimization.
I have implemented your hint as 1 write with a total_buf.
My implementation is somewaht different than yours because i did not
want to use the "new" operator.

int rlSiemensTCP::write_iso(unsigned char *buf, int len)
{
int i,ret;

if(rlSocket::isConnected() == 0) doConnect();
if(rlSocket::isConnected() == 0) return -1;

// speedup siemens communication as suggested by Vincent Segui Pascual
// do only 1 write
unsigned char total_buf[sizeof(IH) + len];
IH ih = (IH *) &total_buf[0];
ih->version = 3;
ih->reserved = 0;
ih->length_high = (len+4) / 256;
ih->length_low = (len+4) & 0x0ff;
for(int i=0; i<len; i++) total_buf[sizeof(IH) + i] = buf[i];
ret = rlSocket::write(total_buf,sizeof(IH) + len);
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write buf -> disconnecting\n");
rlSocket::disconnect();
return ret;
}
/

ih.version = 3;
ih.reserved = 0;
ih.length_high = (len+4) / 256;
ih.length_low = (len+4) & 0x0ff;
ret = rlSocket::write(&ih,sizeof(ih));
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write iso header ->
disconnecting\n");
rlSocket::disconnect();
return ret;
}
ret = rlSocket::write(buf,len);
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write buf -> disconnecting\n");
rlSocket::disconnect();
return ret;
}
*/
if(rlDebugPrintfState != 0)
{
::printf("write_iso() len=%d\n", len);
for(i=0; i<len; i++) ::printf("%02x,",buf[i]);
::printf("\n");
}
return len;
}

It compiles aready but i did not test up to now.
This will be done soon.

Best regards:
Rainer Lehrig

Am 30.10.2012 11:14, schrieb Vicent Seguí Pascual:

Hello I'm using the rllib as a standalone project. Right now the
rlsiemenstcp class is unnecessarily slow on my system (Ubuntu 12.04 64
bits). After some research and benchmarking comparison with libnodave,
I've found that the main reason is that rlsiemenstcp does two writes
to the socket for each request. There are several ways to avoid the
delay use one write (copy header and buffer to one total buffer and
send that), use TCP_NODELAY, use non NONBLOCKING or clever use of
TCP_CORK. I attach an easy solution using the first option:

int rlSiemensTCP::write_iso(unsigned char *buf, int len)
{
int i,ret, total_len;
char stack_buf[512];
char *total_buf=0;

if(rlSocket::isConnected() == 0) doConnect();
if(rlSocket::isConnected() == 0) return -1;
ih.version = 3;
ih.reserved = 0;
ih.length_high = (len+4) / 256;
ih.length_low = (len+4) & 0x0ff;
if(len <=0)
{
rlDebugPrintf("write_iso:failure to write iso packet negative length -> disconnecting\n");
rlSocket::disconnect();
}
total_len = len + sizeof(ih);
if(total_len > sizeof(stack_buf))
{
total_buf = new char[len + sizeof(ih)];
}
else
{
total_buf = stack_buf;
}
memcpy(total_buf, &ih, sizeof(ih));
memcpy(total_buf+sizeof(ih), buf, len);
ret = rlSocket::write(total_buf, total_len);
if(total_len > sizeof(stack_buf))
delete [] total_buf;
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write iso packet -> disconnecting\n");
rlSocket::disconnect();
return ret;
}

if(rlDebugPrintfState != 0)
{
::printf("write_iso() len=%d\n", len);
for(i=0; i<len; i++) ::printf("%02x,",buf[i]);
::printf("\n");
}
return len;
}

I think TCP_NODELAY would be a better solution since it avoids copying
memory, but I haven't implemented it since it will either break
expectations from other users of rlSocket, or it will break rlSocket
API. I also don't have all the platforms where rllib is being used so
I can't test TCP_NODELAY.

By my measurements, grouping those two writes gives me ~6x speedup
(from 10s to perform 100 reads on a S71200 to 1.4 seconds) .

Regards,
V. Seguí


Reply to this email directly or view it on GitHub
#1.

Hi Rainer,
Thanks for your quick reply. As far as I know, dynamic size arrays are not a feature of C++ but rather of C99 and supported only on some compilers. I'm not really sure your code will compile on all platforms (this is why I used a fixed stack buffer which should cover most or all cases and only resort to dynamic alloc in case it's too small.)

Hello Vicent,

i had not been aware of this problem and with g++ it works as i expected.
But it seems to be an issue:
https://groups.google.com/forum/?fromgroups=#!topic/comp.std.c++/K_4lgA1JYeg
http://stackoverflow.com/questions/1887097/variable-length-arrays-in-c

Thus i changed the code as follows.

int rlSiemensTCP::write_iso(unsigned char *buf, int len)
{
int i,ret;

if(rlSocket::isConnected() == 0) doConnect();
if(rlSocket::isConnected() == 0) return -1;

// speedup siemens communication as suggested by Vincent Segui Pascual
// do only 1 write
// unsigned char total_buf[sizeof(IH) + len];
// in theory dynamic size arrays are not a feature of C++
unsigned char total_buf = new unsigned char [sizeof(IH) + len];
IH *ih = (IH *) &total_buf[0];
ih->version = 3;
ih->reserved = 0;
ih->length_high = (len+4) / 256;
ih->length_low = (len+4) & 0x0ff;
for(int i=0; i<len; i++) total_buf[sizeof(IH) + i] = buf[i];
ret = rlSocket::write(total_buf,sizeof(IH) + len);
delete [] total_buf;
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write buf -> disconnecting\n");
rlSocket::disconnect();
return ret;
}
/

ih.version = 3;
ih.reserved = 0;
ih.length_high = (len+4) / 256;
ih.length_low = (len+4) & 0x0ff;
ret = rlSocket::write(&ih,sizeof(ih));
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write iso header ->
disconnecting\n");
rlSocket::disconnect();
return ret;
}
ret = rlSocket::write(buf,len);
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write buf -> disconnecting\n");
rlSocket::disconnect();
return ret;
}
*/
if(rlDebugPrintfState != 0)
{
::printf("write_iso() len=%d\n", len);
for(i=0; i<len; i++) ::printf("%02x,",buf[i]);
::printf("\n");
}
return len;
}

Best regards:
Rainer Lehrig

PS: today i tested the code with a S7 200

Am 30.10.2012 15:19, schrieb Vicent Seguí Pascual:

Hi Rainer,
Thanks for your quick reply. As far as I know, dynamic size arrays are
not a feature of C++ but rather of C99 and supported only on some
compilers. I'm not really sure your code will compile on all platforms
(this is why I used a fixed stack buffer which should cover must or
all cases and only resort to dynamic alloc in case it's too small.)


Reply to this email directly or view it on GitHub
#1 (comment).

Hello Rainer,
I find your solution easier although I think my own solution is better for the common case where the packet is <512 bytes since it will not allocate dynamically. OTOH did you manage to replicate the same speedups?

Regards,
V. Seguí

Hello Vicent,

now i have reviewed and tested a little bit more.

The maximum MTU size in ISO OSI communication is of limited size.
The pdu declared in the class has 2048 byte size which is more than the
maximum ISO OSI MTU size.
Thus we can simply use fixed sized buffers.
unsigned char total_buf[sizeof(ih)+sizeof(pdu)];

Now your memcpy is used which may be faster than a copy in a for loop
as i did.

I tested the communication with a S7 200 and i could replicate the same
speedups.
The difference can't be explained with network latency.
I think it can be explained with Nagle's algorithm.
A packed will be really send, when the last packed has been acknowledged.
I suppose the Siemens PLC is slow in this acknowledge.

I also made the change in the fetch/write case for S5.

Here is the current code:

int rlSiemensTCP::write_iso(unsigned char *buf, int len)
{
int i,ret;

if(len > (int) sizeof(pdu)) return -1;
if(rlSocket::isConnected() == 0) doConnect();
if(rlSocket::isConnected() == 0) return -1;

// speedup siemens communication as suggested by Vincent Segui Pascual
// do only 1 write
unsigned char total_buf[sizeof(ih)+sizeof(pdu)];
ih.version = 3;
ih.reserved = 0;
ih.length_high = (len+4) / 256;
ih.length_low = (len+4) & 0x0ff;
memcpy(total_buf, &ih, sizeof(ih));
memcpy(total_buf + sizeof(ih), buf, sizeof(ih)+len);
ret = rlSocket::write(total_buf, sizeof(ih)+len);
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write buf -> disconnecting\n");
rlSocket::disconnect();
return ret;
}
/*
ih.version = 3;
ih.reserved = 0;
ih.length_high = (len+4) / 256;
ih.length_low = (len+4) & 0x0ff;
ret = rlSocket::write(&ih,sizeof(ih));
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write iso header ->
disconnecting\n");
rlSocket::disconnect();
return ret;
}
ret = rlSocket::write(buf,len);
if(ret < 0)
{
rlDebugPrintf("write_iso:failure to write buf -> disconnecting\n");
rlSocket::disconnect();
return ret;
}
*/
if(rlDebugPrintfState != 0)
{
::printf("write_iso() len=%d\n", len);
for(i=0; i<len; i++) ::printf("%02x,",buf[i]);
::printf("\n");
}
return len;
}

Best regards:
Rainer Lehrig

Am 30.10.2012 21:25, schrieb Vicent Seguí Pascual:

Hello Rainer,
I find your solution easier although I think my own solution is better
for the common cas where the packet is <512 bytes since it will not
allocate dynamically. OTOH did you manage to replicate the same speedups?

Regards,
V. Seguí


Reply to this email directly or view it on GitHub
#1 (comment).

Hello, glad to see you can reproduce my speedups. Yes, I agree that the slowdown is due to Nagle's algorithim which is why setting TCP_NODELAY (deactivating it) is a better solution and actually gives the same speedups even when doing 2 writes. A back compatible fix would require exposing that functionality from the rlSocket class, is that possible? I'm quite sure other clients would benefit from setting TCP_NODELAY since it favours latency over throughput which is a good thing in this scenario.

Hello, socket s is a public member of rlSocket.

Thus you could use setsockopt to set TCP_NODELAY on the socket after
connect.
http://www.unixguide.net/network/socketfaq/2.16.shtml

Am 31.10.2012 08:47, schrieb Vicent Seguí Pascual:

Hello, glad to see you can reproduce my speedups. Yes, I agree that
the slowdown is due to Nagle's algorithim which is why setting
TCP_NODELAY (deactivating it) is a better solution and actually gives
the same speedups even when doing 2 writes. A back compatible fix
would require exposing that functionality from the rlSocket class, is
that possible? I'm quite sure other clients would benefit from setting
TCP_NODELAY since it favours latency over throughput which is a good
thing in this scenario.


Reply to this email directly or view it on GitHub
#1 (comment).

I was kind of hoping that could be done in a cross platform compatible way encapsulated in the rlSocket ;). Kind of like this pseudo code:

int rlSocket::setNagleAlgorithim(bool activate){
#ifdef WIN32
    foo
#elifdef UNIX
   bar
#elifdef VMS
   baz
#endif
}

EDIT: changed name

We could add a method to rlSocket.
The necessary headers are already included and the setsockopt function
is very similar on the different OS.
Only Windows is a little bit different.

Example:
int option;

option = 1;
#ifdef RLWIN32
setsockopt(os,SOL_SOCKET,SO_REUSEADDR,(const char *)
&option,sizeof(option));
#else
setsockopt(os,SOL_SOCKET,SO_REUSEADDR,&option,sizeof(option));
#endif

The definitions like
IPPROTO_TCP
TCP_NODELAY
should be available on all OS.

This function should be portable.
int rlSetSockOpt(int s, int level, int optname)
{
int option = 1;
#ifdef RLWIN32
int ret = setsockopt(s, int level, int optname, (const char *)
&option, sizeof(option));
#else
int ret = setsockopt(s, int level ,int optname, &option, sizeof(option));
#endif
return ret;
}

Could you please do the testing.
If everything is OK we could add the function to rlsocket.cpp or as a
method to the class rlSocket.

Am 31.10.2012 08:59, schrieb Vicent Seguí Pascual:

I was kind of hoping that could be done in a cross platform compatible
way encapsulated in the rlSocket ;). Kind of like this pseudo code:

int rlSocket::setTCPNodelay(bool activate){
#ifdef WIN32
foo
#elifdef UNIX
bar
#elifdef VMS
baz
#endif
}


Reply to this email directly or view it on GitHub
#1 (comment).

Hello,
It works albeit with some modifications. Classes that want to use the interface have to include the proper headers since they are not included in rlSocket.h but in rlSocket.cpp

Hello, i added 2 functions to rlsocket.h / rlsocket.cpp (see github)
The functions are portable between Linux/Unix/Windows/OpenVMS
(currently tested on Linux/Windows/OpenVMS)
int rlGetsockopt(int sockfd, int level, int optname, void *optval,
int *optlen);
int rlSetsockopt(int sockfd, int level, int optname, const void *optval,
int optlen);

The necessary header files for the definition of the socket level and
optname
are now included in rlsocket.h

Am 31.10.2012 15:12, schrieb Vicent Seguí Pascual:

Hello,
It works albeit with some modifications. Classes that want to use the
interface have to include the proper headers since they are not
included in rlSocket.h but in rlSocket.cpp


Reply to this email directly or view it on GitHub
#1 (comment).

Hi,
Wouldn't it be better to simplify the declaration to int rlGetsockopt(int level, int optname) since s is already a member of the class and optval/optlen are not used that much? We can also have int rlGetsockopt(int sockfd, int level, int optname, void *optval, int *optlen); as static members of the class for completeness.

I have added the conveniance methods (see github).

PS: Which other classes besides rlSiemensTCP and rlSocket do you use
from rllib?

Am 02.11.2012 09:50, schrieb Vicent Seguí Pascual:

Hi,
Wouldn't it be better to smplyfy the declaration to int
rlGetsockopt(int level, int optname) since s is already a member of
the class and optval/optlen are not used that much? We can also have
int rlGetsockopt(int sockfd, int level, int optname, void *optval, int
*optlen); as static members od the class for completeness.


Reply to this email directly or view it on GitHub
#1 (comment).

Hello,
Sorry for the late reply. I'm planning on using the rlClient* version to access the SHM segments.