gen_tcp socket based implementation do not work with FDs passed from systemd
hauleth opened this issue · 25 comments
Describe the bug
gen_tcp
socket do not receive any messages when passing file descriptor as an inet6
socket.
To Reproduce
Example application can be found in systemd
repo.
To be able to test it out you need to be running systemd-enabled Linux distribution. You can use Multipass to get one (Ubuntu) if you do not have one.
- Run application with socket activation
$ systemd-socket-activate -l 8888 --fdname=tcp -E ERL_FLAGS='-kernel inet_backend socket' rebar3 shell
- In separate shell connect to the echo service
nc localhost 8888
- No message is delivered to the
echo_tcp
process.
For comparing with default implementation works as expected. Steps:
- Run application with socket activation
$ systemd-socket-activate -l 8888 --fdname=tcp rebar3 shell
- In separate shell connect to the echo service
nc localhost 8888
- Messages are delivered to listening process and are responded to.
Expected behavior
When running process with -kernel inet_backend socket
messages should be delivered to listening process and such process should respond to the client.
Affected versions
I have tested on OTP 23.3, but I assume that all 23.x line is affected.
Additional context
This come out when I was trying to test whether socket
based implementation of gen_tcp
will allow fixing problem with dropping the connections when using systemd socket activation.
Ref hauleth/erlang-systemd#24 (comment)
The goal there is to allow full VM restarts without dropping socket connections.
Could you test this with 24.0-rc2?
There has been lot of bug fixes and changes in the master (24.0) branch.
I have ported the code from using gen_tcp
:
-module(echo_tcp).
-include_lib("kernel/include/logger.hrl").
-export([start_link/1,
listen/1]).
start_link(Opt) ->
Pid = spawn_link(?MODULE, listen, [Opt]),
{ok, Pid}.
listen(Opt) ->
Opts = [Opt,
binary,
{nodelay, true},
{packet, 0},
{active, false},
{reuseaddr, true}],
{ok, LSock} = gen_tcp:listen(0, Opts),
accept(LSock).
accept(LSock) ->
{ok, Sock} = gen_tcp:accept(LSock),
spawn(fun() -> loop(Sock) end),
accept(LSock).
loop(Sock) ->
case gen_tcp:recv(Sock, 0) of
{ok, Data} ->
?LOG_NOTICE(#{data => Data, protocol => tcp}),
gen_tcp:send(Sock, Data),
loop(Sock);
{error, closed} ->
ok
end.
to using raw socket
:
-module(echo_tcp).
-include_lib("kernel/include/logger.hrl").
-export([start_link/1,
listen/1]).
start_link(Opt) ->
Pid = spawn_link(?MODULE, listen, [Opt]),
{ok, Pid}.
listen({fd, Fd}) ->
{ok, LSocket} = socket:open(Fd),
ok = socket:listen(LSocket),
accept(LSocket);
listen({port, Port}) ->
{ok, LSocket} = socket:open(inet6, stream, tcp),
ok = socket:bind(LSocket, #{family => inet6,
port => Port,
addr => any}),
ok = socket:listen(LSocket),
accept(LSocket).
accept(LSock) ->
{ok, Sock} = socket:accept(LSock),
spawn(fun() -> loop(Sock) end),
accept(LSock).
loop(Sock) ->
case socket:recv(Sock) of
{ok, Data} ->
?LOG_NOTICE(#{data => Data, protocol => tcp}),
socket:send(Sock, Data),
loop(Sock);
{error, closed} ->
ok
end.
And with raw socket
everything works as expected. It seems like gen_tcp
tries to do some additional work (probably call bind
on already bound FD), but I still haven't traced where and why it is happening.
Ok, I have found that it may me related to JIT, as strace
shows that only EINVAL
value is returned from arch_prctl(0x3001, /* different values there */)
. Full system trace:
https://gist.github.com/hauleth/3012efda82c3ed2e3ee1c46edb2eb639
And this does not happen when running without JIT?
I am still trying to build OTP 24.0-rc2 without JIT. I tried to build version with both, but it failed #4722
Ok, I have found that it may me related to JIT, as strace shows that only EINVAL value is returned from arch_prctl(0x3001, /* different values there */). Full system trace:
That's far too early to have anything to do with JIT.
I am still trying to build OTP 24.0-rc2 without JIT. I tried to build version with both, but it failed #4722
Try building with ./configure --disable-jit
ping
I wonder if this has got something to do with Linux's "interesting" handling of IPv6 scoped addresses?
Can you give dead simple example of how it is supposed to work. I try the following; what am I doing wrong?
$ systemd-socket-activate -l [::]:12345 --inetd -a cat
Listening on [::]:12345 as 3.
from another shell:
$ nc 127.0.0.1 12345
hello
hello
^C
and get this in shell 1:
Communication attempt on fd 3.
Connection from 127.0.0.1:35178 to [::ffff:127.0.0.1]:12345
Spawned cat (cat) as PID 6304.
Execing cat (cat)
Child 6304 died with code 0
So far so good. But how about IPv6?
$ nc ::1 12345
nothing happens
^C
If I change the argument to systemd-... into -l [::1]:12345
I get immediate connection refused when trying to connect to 127.0.0.1 12345, and nothing happens for ::1 12345.
So, how is this supposed to work?
@IngelaAndin @bmk https://gist.github.com/hauleth/086f94d31eb260f84880caabf42b7446
@RaimoNiskanen both connections works for me (systemd 245 on Ubuntu 20.04 LTS). However there is huge difference between using inetd
-compatible behaviour and "new-style daemons" behaviour. In former the socket connection is accepted by the systemd (in case of TCP sockets), and then there is new process spawned for each connection. In latter case systemd only opens port, and then passes it as a raw file descriptor to the spawned process, since then the process is fully responsible for accepting and responding to the socket connections in whatever manner they seem to be feasible.
I have a hard time figuring out how this systemd stuff looks for the Erlang node. What is it that does not work, from the Erlang node's point of view?
For starters, I have still not managed to create a connection, with Erlang or with any tool, over IPv6 loopback in any way; and I suspect the scope id is the problem.
How is this raw file descriptor passed to the spawned process, exactly, and how is it used in your Erlang code. We need to minimize this example.
I have systemd 237 on Ubuntu 18.04 LTS.
@RaimoNiskanen the way it works is pretty simple in fact. What doesn't from Erlang viewpoint is using existing, bounded, file descriptor with gen_tcp
when using socket
backend (it works perfectly fine with default inet
backend).
How is this raw file descriptor passed to the spawned process, exactly
It is super simple from C viewpoint (simplified "pseudocodish" solution):
int socket_fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
bind(socket_fd, (struct sockaddr *)&addr, sizeof(addr));
// fork current process
dup2(socket_fd, 3);
setenv("LISTEN_FDS", "1", 1); // amount of sockets passed via the environment, in this simplified case - always 1
exec(prog_path, prog_args);
Now in the spawned process (in our case Erlang) we should be able to do:
FDsCount = list_to_integer(os:getenv("LISTEN_FDS")),
[FD | _] = [2 + N || N <- lists:seq(1, FDsCount)],
{ok, Socket} = gen_tcp:listen(0, [{fd, FD}]).
In my systemd
library it is of course hidden by function, but that does more or less exactly what I have shown above (with small addition of also parsing LISTEN_FDNAMES
for socket passed with names, which is irrelevant there, and check if the OS PID match, which also is irrelevant for the discussion there).
I can try to prepare minimal C runner, that will just open socket and pass given socket to new process for sake of testing this out.
Simplest C runner that can provide behaviour similar to what systemd does:
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <strings.h>
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
int fd;
struct sockaddr_in6 addr;
char pid[255] = {0};
bzero((char *) &addr, sizeof(addr));
addr.sin6_family = AF_INET6;
addr.sin6_addr = in6addr_any;
addr.sin6_port = htons(8080);
fd = socket(AF_INET6, SOCK_STREAM, 0);
if (fd == -1) {
perror("Failed opening socket");
return 1;
}
if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("Failed binding socket");
exit(1);
}
if (listen(fd, 10) < 0) {
perror("Failed to listen");
exit(1);
}
if (dup2(fd, 3) < 0) {
perror("Cannot duplicate FD");
exit(1);
}
sprintf(pid, "%u", getpid());
setenv("LISTEN_FDS", "1", 1);
setenv("LISTEN_FDNAMES", "", 1);
setenv("LISTEN_PID", pid, 1);
execv(argv[1], &argv[1]);
// Not reached on success
perror("Cannot start process");
exit(1);
}
It is called as ./a.out $(which erl)
and then you have 1 IPv6 TCP socket passed as FD3 to the Erlang shell. So it should be possible to do gen_tcp:listen(0, [{fd, 3}, inet6])
within shell. It will be successful when using default inet_backend
but will return {error,einval}
when using socket
backend.
I have tested it on OTP 23 and it successfully open sockets, however gen_tcp:accept/1
never returns. With default backend everything works as expected.
It is worth mentioning that it isn't problem in the socket
module itself, as I can successfully run socket:open(3)
and accept
/recv
content of the socket. It seems like problem lies somewhere in the backing module for gen_tcp
.
Some further debugging showed that the problem comes probably from the fact that when we use inet_backend = socket
then gen_tcp_socket
will try to bind
the socket, which will obviously fail, as the socket is already bound:
> gen_tcp:listen(0, [{fd,3}, inet6]).
(<0.90.0>) call gen_tcp_socket:listen(0,[{fd,3},inet6]) ({erl_eval,do_apply,6})
(<0.102.0>) call gen_tcp_socket:init({open,inet6,#{fd => 3},<0.90.0>}) ({gen_statem,
init_it,
6})
(<0.102.0>) call gen_tcp_socket:callback_mode() ({gen_statem,
loop_state_callback,11})
(<0.102.0>) call gen_tcp_socket:handle_event({call,{<0.90.0>,#Ref<0.2888407712.2773745665.251950>}},{setopts,[{start_opts,[{timeout,infinity}]},{active,true}]},connect,{{params,{'$socket',#Ref<0.2888407712.2773876737.251947>},
<0.90.0>,#Ref<0.2888407712.2773745665.251946>},
#{active => true,buffer => <<>>,delay_send => false,deliver => term,
exit_on_close => true,header => 0,line_delimiter => 10,mode => list,
packet => raw,packet_size => 67108864,send_timeout => infinity,
send_timeout_close => false,show_econnreset => false,start_opts => [],
type => undefined}}) ({gen_statem,loop_state_callback,11})
(<0.102.0>) call gen_tcp_socket:handle_event({call,{<0.90.0>,#Ref<0.2888407712.2773745665.251952>}},{bind,#{addr => {0,0,0,0,0,0,0,0},family => inet6,port => 0}},connect,{{params,{'$socket',#Ref<0.2888407712.2773876737.251947>},
<0.90.0>,#Ref<0.2888407712.2773745665.251946>},
#{active => true,buffer => <<>>,delay_send => false,deliver => term,
exit_on_close => true,header => 0,line_delimiter => 10,mode => list,
packet => raw,packet_size => 67108864,send_timeout => infinity,
send_timeout_close => false,show_econnreset => false,
start_opts => [{timeout,infinity}],
type => undefined}}) ({gen_statem,loop_state_callback,11})
(<0.102.0>) call gen_tcp_socket:handle_event({call,{<0.90.0>,#Ref<0.2888407712.2773745665.251953>}},close,connect,{{params,{'$socket',#Ref<0.2888407712.2773876737.251947>},
<0.90.0>,#Ref<0.2888407712.2773745665.251946>},
#{active => true,buffer => <<>>,delay_send => false,deliver => term,
exit_on_close => true,header => 0,line_delimiter => 10,mode => list,
packet => raw,packet_size => 67108864,send_timeout => infinity,
send_timeout_close => false,show_econnreset => false,
start_opts => [{timeout,infinity}],
type => undefined}}) ({gen_statem,loop_state_callback,11})
(<0.102.0>) call gen_tcp_socket:terminate(normal,closed,{{params,{'$socket',#Ref<0.2888407712.2773876737.251947>},
<0.90.0>,#Ref<0.2888407712.2773745665.251946>},
#{active => false,buffer => <<>>,delay_send => false,deliver => term,
exit_on_close => true,header => 0,line_delimiter => 10,mode => list,
packet => raw,packet_size => 67108864,send_timeout => infinity,
send_timeout_close => false,show_econnreset => false,
start_opts => [{timeout,infinity}],
type => undefined}}) ({gen_statem,terminate,7})
{error,einval}
That may be the source of weird behaviour when using different backend.
This is strange. I could have sworn we fixed this...
It was fixed, for the inet
backend, and also for the socket
backend (we thought). connect
should have worked since there a bind address of undefined
was handled correctly and the fix was to change the default bind address from any
to undefined
, but we forgot to implement the special treatment implemented in the inet
backend for listen
, that ignores the bind address (default any
) when an {fd,Fd}
option is used.
We'll merge your PR shortly.
It seems that it is still occurs on OTP 24.1.4:
$ systemd-socket-activate -l 8888 --fdname=tcp -E ERL_FLAGS='-kernel inet_backend socket' erl
Listening on [::]:8888 as 3.
Communication attempt on fd 3.
Execing erl (erl)
Erlang/OTP 24 [erts-12.1.4] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:1] [jit]
Eshell V12.1.4 (abort with ^G)
1> dbg:tracer().
{ok,<0.82.0>}
2> dbg:tp(gen_tcp_socket, cx).
{ok,[{matched,nonode@nohost,28},{saved,cx}]}
3> dbg:p(all, [call, return_to]).
{ok,[{matched,nonode@nohost,42}]}
4> gen_tcp:listen(0, [{fd, 3}, inet6]).
(<0.80.0>) call gen_tcp_socket:listen(0,[{fd,3},inet6]) ({erl_eval,do_apply,6})
(<0.87.0>) call gen_tcp_socket:init({open,inet6,#{fd => 3},<0.80.0>}) ({gen_statem,
init_it,
6})
(<0.87.0>) returned from gen_tcp_socket:init/1 -> {ok,connect,
{{params,
{'$socket',
#Ref<0.1210326129.4165599233.188528>},
<0.80.0>,
#Ref<0.1210326129.4165468161.188523>},
#{active => false,
buffer => <<>>,
delay_send => false,
deliver => term,
exit_on_close => true,
header => 0,
line_delimiter => 10,
mode => list,
packet => raw,
packet_size => 67108864,
send_timeout => infinity,
send_timeout_close =>
false,
show_econnreset => false,
start_opts => [],
type => undefined}}}
(<0.87.0>) call gen_tcp_socket:callback_mode() ({gen_statem,
loop_state_callback,11})
(<0.87.0>) returned from gen_tcp_socket:callback_mode/0 -> handle_event_function
(<0.87.0>) call gen_tcp_socket:handle_event({call,{<0.80.0>,#Ref<0.1210326129.4165468161.188533>}},{setopts,[{start_opts,[]},{active,true}]},connect,{{params,{'$socket',#Ref<0.1210326129.4165599233.188528>},
<0.80.0>,#Ref<0.1210326129.4165468161.188523>},
#{active => false,buffer => <<>>,delay_send => false,deliver => term,
exit_on_close => true,header => 0,line_delimiter => 10,mode => list,
packet => raw,packet_size => 67108864,send_timeout => infinity,
send_timeout_close => false,show_econnreset => false,start_opts => [],
type => undefined}}) ({gen_statem,loop_state_callback,11})
(<0.87.0>) returned from gen_tcp_socket:handle_event/4 -> {keep_state,
{{params,
{'$socket',
#Ref<0.1210326129.4165599233.188528>},
<0.80.0>,
#Ref<0.1210326129.4165468161.188523>},
#{active => true,
buffer => <<>>,
delay_send =>
false,
deliver => term,
exit_on_close =>
true,
header => 0,
line_delimiter =>
10,
mode => list,
packet => raw,
packet_size =>
67108864,
send_timeout =>
infinity,
send_timeout_close =>
false,
show_econnreset =>
false,
start_opts => [],
type =>
undefined}},
[{reply,
{<0.80.0>,
#Ref<0.1210326129.4165468161.188533>},
ok}]}
(<0.87.0>) call gen_tcp_socket:handle_event({call,{<0.80.0>,#Ref<0.1210326129.4165468161.188534>}},{bind,#{addr => any,family => inet6,port => 0}},connect,{{params,{'$socket',#Ref<0.1210326129.4165599233.188528>},
<0.80.0>,#Ref<0.1210326129.4165468161.188523>},
#{active => true,buffer => <<>>,delay_send => false,deliver => term,
exit_on_close => true,header => 0,line_delimiter => 10,mode => list,
packet => raw,packet_size => 67108864,send_timeout => infinity,
send_timeout_close => false,show_econnreset => false,start_opts => [],
type => undefined}}) ({gen_statem,loop_state_callback,11})
(<0.87.0>) returned from gen_tcp_socket:handle_event/4 -> {keep_state_and_data,
[{reply,
{<0.80.0>,
#Ref<0.1210326129.4165468161.188534>},
{error,einval}}]}
(<0.87.0>) call gen_tcp_socket:handle_event({call,{<0.80.0>,#Ref<0.1210326129.4165468161.188537>}},close,connect,{{params,{'$socket',#Ref<0.1210326129.4165599233.188528>},
<0.80.0>,#Ref<0.1210326129.4165468161.188523>},
#{active => true,buffer => <<>>,delay_send => false,deliver => term,
exit_on_close => true,header => 0,line_delimiter => 10,mode => list,
packet => raw,packet_size => 67108864,send_timeout => infinity,
send_timeout_close => false,show_econnreset => false,start_opts => [],
type => undefined}}) ({gen_statem,loop_state_callback,11})
(<0.87.0>) returned from gen_tcp_socket:handle_event/4 -> {next_state,closed,
{{params,
{'$socket',
#Ref<0.1210326129.4165599233.188528>},
<0.80.0>,
#Ref<0.1210326129.4165468161.188523>},
#{active => false,
buffer => <<>>,
delay_send =>
false,
deliver => term,
exit_on_close =>
true,
header => 0,
line_delimiter =>
10,
mode => list,
packet => raw,
packet_size =>
67108864,
send_timeout =>
infinity,
send_timeout_close =>
false,
show_econnreset =>
false,
start_opts => [],
type =>
undefined}},
[{reply,
{<0.80.0>,
#Ref<0.1210326129.4165468161.188537>},
ok}]}
(<0.87.0>) call gen_tcp_socket:terminate(normal,closed,{{params,{'$socket',#Ref<0.1210326129.4165599233.188528>},
<0.80.0>,#Ref<0.1210326129.4165468161.188523>},
#{active => false,buffer => <<>>,delay_send => false,deliver => term,
exit_on_close => true,header => 0,line_delimiter => 10,mode => list,
packet => raw,packet_size => 67108864,send_timeout => infinity,
send_timeout_close => false,show_econnreset => false,start_opts => [],
type => undefined}}) ({gen_statem,terminate,7})
{error,einval}
(<0.87.0>) returned from gen_tcp_socket:terminate/3 -> void
(<0.80.0>) returned from gen_tcp_socket:listen/2 -> {error,einval}
5>
It seems that in this code:
listen_open(Domain, ListenOpts, StartOpts, ExtraOpts, Backlog, BindAddr) ->
case
start_server(Domain, [{timeout, infinity} | StartOpts], ExtraOpts)
of
{ok, Server} ->
ErrRef = make_ref(),
try
Setopts =
default_active_true(
[{start_opts, StartOpts} |
setopts_opts(ErrRef, ListenOpts)]),
ok(ErrRef, call(Server, {setopts, Setopts})),
ok(ErrRef, call_bind(Server, default_any(Domain, BindAddr))),
Socket = val(ErrRef, call(Server, {listen, Backlog})),
{ok, ?MODULE_socket(Server, Socket)}
catch
throw : {ErrRef, Reason} ->
close_server(Server),
?badarg_exit({error, Reason})
end;
{error, {shutdown, Reason}} ->
?badarg_exit({error, Reason});
{error, _} = Error ->
?badarg_exit(Error)
end.
The call to default_any/2
will not return undefined
if Domain
is inet*
:
default_any(Domain, undefined = Undefined) ->
if
Domain =:= inet;
Domain =:= inet6 ->
#{family => Domain,
addr => any,
port => 0};
true ->
Undefined
end;
default_any(_Domain, BindAddr) ->
BindAddr.
Which is different behaviour from "classic" inet
backend and against expected behaviour, as I am trying to use INET6 socket that is already bound by the supervisor.
A combination I did not get right...
Thank you for fixing this issue!