adaptivecomputing/torque

Error with X11 Forwarding

Users see 'could not bind local socket: Invalid argument' in their error stream when using qsub -X. The error originates from this line:

fprintf(stderr, "could not bind local socket: %s", strerror(errno));

This happens because the get_local_address() function returns a struct with sa_family=AF_INET, but the loop at

for (ai = aitop; ai; ai = ai->ai_next)
  {
  /* Create a socket. */
  sock = socket(ai->ai_family, SOCK_STREAM, 0);
  if (sock < 0)
    {
    fprintf(stderr, "socket: %.100s", strerror(errno));
    continue;
    }
#ifdef BIND_OUTBOUND_SOCKETS
  /* Bind to the IP address associated with the hostname, in case there are
   * muliple possible source IPs for this destination.*/
  // don't bind localhost addr
  if (!islocalhostaddr(&local))
    {
    if (bind(sock, (struct sockaddr *)&local, sizeof(sockaddr_in)))
      {
      fprintf(stderr, "could not bind local socket: %s", strerror(errno));
      close(sock);
      continue;
      }
    }
#endif
will attempt to bind a socket of every address family in the list, so bind() ends up being called on an AF_INET6 socket with an AF_INET address, which produces the error message.

I've submitted #458 as a potential fix, but it should probably be revisited so that it also works with IPv6. Note that the IPv6 limitation lies in get_local_address(), not in the patch.

An example from strace, showing it trying to bind with a mismatched AF_ family and then succeeding on the second attempt:

[pid 27167] socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 8
[pid 27167] bind(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("redacted")}, 16) = -1 EINVAL (Invalid argument)
[pid 27167] write(2, "could not bind local socket: Inv"..., 45) = 45
[pid 27167] close(8) = 0
[pid 27167] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 8
[pid 27167] bind(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("redacted")}, 16) = 0