nixcloud/ip2unix

socket is not created

riedel opened this issue · 6 comments

How to debug a socket not being created?

I run the following, but no socket is created (i.e. the file simply does not exist). By the way, the server works without ip2unix.

> ip2unix -p -vvv -r path=xxx.socket rsession --standalone=1 --program-mode=server --log-stderr=1 --www-address 127.0.0.1 --www-port 8080
Rule #1:
  Direction: both
  IP Type: TCP and UDP
  Address: <any>
  Port: <any>
  Socket path: /smartdata/iu5681/xxx.socket
ip2unix INFO: Registered socket with fd 6, domain 2, type 1 and protocol 6.
ip2unix INFO: Created new Unix socket with fd 7.
ip2unix INFO: Replaced socket fd 6 by socket with fd 7.

The output looks the same as in a case where a socket is created:

ip2unix -p -vvv -r path=xxx.socket nc -l 127.0.0.1 8080
Rule #1:
  Direction: both
  IP Type: TCP and UDP
  Address: <any>
  Port: <any>
  Socket path: /smartdata/iu5681/xxx.socket
ip2unix INFO: Registered socket with fd 3, domain 2, type 1 and protocol 6.
ip2unix INFO: Created new Unix socket with fd 4.
ip2unix INFO: Replaced socket fd 3 by socket with fd 4.

Thankful for any clues.

In the faulty case, strace gives me:

setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(6, F_GETFD)                       = 0
fcntl(6, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(6, F_GETSIG)                      = 0
fcntl(6, F_GETOWN_EX, {type=F_OWNER_PID, pid=0}) = 0
bind(6, {sa_family=AF_UNIX, sun_path="/smartdata/iu5681/xxx.socket"}, 110) = 0
listen(6, 128)                          = 0
ioctl(6, FIONBIO, [1])                  = 0
accept4(6, NULL, NULL, 0)               = -1 EAGAIN (Resource temporarily unavailable)

In the working case:

setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
fcntl(3, F_GETFD)                       = 0
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_GETSIG)                      = 0
fcntl(3, F_GETOWN_EX, {type=F_OWNER_PID, pid=0}) = 0
bind(3, {sa_family=AF_UNIX, sun_path="/smartdata/iu5681/xxx.socket"}, 110) = 0
listen(3, 10)                           = 0
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0

This seems to be a bug. I just confirmed that socket_wrapper works with the above command line:

LD_PRELOAD=$PWD/lib/libsocket_wrapper.so SOCKET_WRAPPER_DIR=$PWD/sockets SOCKET_WRAPPER_DEFAULT_IFACE=10 rsession --standalone=1 --program-mode=server --log-stderr=1 --www-address 127.0.0.1 --www-port 8080

socket_wrapper seems to strip SOCK_CLOEXEC and SOCK_NONBLOCK from the socket type argument. Could that be the cause?

@riedel: Thanks for the report. Can you also reproduce this with nc, or does it happen only with rsession?

The "working case" is nc. I have had a really hard time reproducing the behaviour. It seems to happen only for rsession in combination with ip2unix (rsession is really interesting because it provides no access control mechanisms, see jupyterhub/jupyter-rsession-proxy#14 (comment)). I am happy to help isolate the issue. I looked at cwrap, which works, but I now understand that it works completely differently, directly swapping the socket (so you cannot distinguish the target IP).

Okay, this has nothing to do with the EAGAIN return from accept. Here is the difference, first for rsession without ip2unix:

socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=1446161200, u64=93825006742320}}) = 0
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
bind(6, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
listen(6, 128)              = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [28->16]) = 0
ioctl(6, FIONBIO, [1])      = 0
accept(6, NULL, NULL)       = -1 EAGAIN (Resource temporarily unavailable)

... and here with ip2unix:

socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=1446163152, u64=93825006744272}}) = 0
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
socket(AF_UNIX, SOCK_STREAM, 0) = 7
fcntl(6, F_GETFD)           = 0
fcntl(7, F_SETFD, 0)        = 0
fcntl(6, F_GETFL)           = 0x2 (flags O_RDWR)
fcntl(7, F_SETFL, O_RDWR)   = 0
fcntl(6, F_GETSIG)          = 0
fcntl(7, F_SETSIG, 0)       = 0
fcntl(6, F_GETOWN_EX, {type=F_OWNER_TID, pid=0}) = 0
fcntl(7, F_SETOWN_EX, {type=F_OWNER_TID, pid=0}) = 0
setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
dup2(7, 6)                  = 6
close(7)                    = 0
bind(6, {sa_family=AF_UNIX, sun_path="/build/test.socket"}, 110) = 0
listen(6, 128)              = 0
ioctl(6, FIONBIO, [1])      = 0
accept4(6, NULL, NULL, 0)   = -1 EAGAIN (Resource temporarily unavailable)

The interesting point here is that epoll_ctl is executed on the old socket before it gets replaced by the new one, so the epoll registration does not carry over to the replacement socket. A way to fix this is to do something similar to how we replay setsockopt and friends, but for epoll_ctl.

This is also the reason why socket_wrapper doesn't have this problem, since it doesn't need to replace the socket.
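
To illustrate the epoll_ctl ordering problem described above, here is a minimal Linux-only sketch in Python. It is not ip2unix code: the function name `epoll_sees_connection`, the `reregister` flag and the `demo.socket` path are made up for this demonstration. It registers a TCP socket with epoll, swaps a Unix socket in under the same fd number via dup2 (as in the trace above), and then checks whether epoll still reports an incoming connection, with and without repeating the registration.

```python
import os
import select
import socket
import tempfile


def epoll_sees_connection(reregister: bool) -> bool:
    """Return True if epoll reports a pending connection after the fd swap."""
    ep = select.epoll()
    inet = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The application registers the TCP socket before bind(), as in the trace.
    ep.register(inet.fileno(), select.EPOLLIN)

    # A wrapper swaps in a Unix socket under the same fd number.
    unix = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    os.dup2(unix.fileno(), inet.fileno())  # closes the old TCP socket
    unix.close()
    fd = inet.detach()  # keep the fd open, drop the AF_INET wrapper object
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM, fileno=fd)

    if reregister:
        # Hypothetical "replay" of the earlier epoll_ctl(EPOLL_CTL_ADD).
        ep.register(fd, select.EPOLLIN)

    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.socket")  # illustrative path
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        srv.bind(path)
        srv.listen(1)
        client.connect(path)
        events = ep.poll(0.5)  # wait up to half a second for readiness
        return any(event_fd == fd for event_fd, _ in events)
    finally:
        client.close()
        srv.close()
        ep.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print("without replay:", epoll_sees_connection(False))
    print("with replay:   ", epoll_sees_connection(True))
```

On Linux, the first call should report no event and the second should report one. The epoll entry belongs to the original socket, and it disappears once dup2 closes that socket. Whether an actual fix should repeat the registration at replacement time or defer it is a design question this sketch does not try to answer.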