erlang/otp

OTP26 `erlang:port_command` breaking change?

mkuratczyk opened this issue · 4 comments

Describe the bug
OTP26.0-rc2 includes a change that optimises erlang:port_command. This change, while certainly appreciated!, breaks applications that are using erlang:port_command as a workaround to the issue of selective receive in gen_tcp as explained in https://github.com/rabbitmq/rabbitmq-server/blob/main/deps/rabbit_common/src/rabbit_writer.erl#L388-L405. RabbitMQ can't handle any connections if started on OTP26 (of course at this point we are not declaring it as OTP26-compatible, so users shouldn't do that, but we are looking for a way to ship releases that would work on both 25 and 26 and a compile-time check is insufficient here).

Using erlang:port_command like that might be considered unsupported, but we know it's at least:
RabbitMQ
emqtt
Erlang PostgreSQL Database Client

Therefore, this is a breaking change for at least some applications out there.

We'd appreciate some help or guidance how to resolve this.

  1. Is the whole erlang:port_command instead of gen_tcp:send a performance myth that should be retired?
  2. Can erlang:port_command in OTP26 be made backwards compatible, so that apps like the above still work?
  3. If not, I guess at least it should be added to the Potential Incompatibilities list

To Reproduce
side note: the following test case was written by AI :)

-module(send_tcp).

-export([start/0]).

start() ->
    {ok, Socket} = gen_tcp:connect("localhost", 3456, [binary, {packet, 0}]),
    erlang:port_command(Socket, "Hello from erlang:port_command"),
    gen_tcp:close(Socket).

This exits with einval (I have nc -l 3456 running in another terminal):

1> send_tcp:start().
** exception exit: einval

Expected behavior
Ideally: erlang:port_command would be backwards compatible, while still optimised, but I guess that's not an option

Alternatively:

  1. It'd behave the old way in some cases (backwards compatible, but not optimised in some cases?)
  2. Backport the optimisation to OTP25 so that the updated code works well with older OTP versions
  3. Mention this change in the potential incompatibilities list

Affected versions
OTP26.0-rc2 and newer
the commit that changed the behaviour: f17f802#diff-86e7e31a3a23b92ea754ad259d8ffd6eb347deb522218fae1803878bf9d15e8b

Additional context
slack thread:
https://erlanger.slack.com/archives/C055DJA49/p1681483290185709

This change does not change port_command() at all. This is a change in the internal protocol between prim_inet and the inet-driver which also is completely backwards compatible. I don't see why it should be mentioned as being incompatible when it is not. Misusing the functionality by issuing port_command() operations like that on your own is not supported.

We'd appreciate some help or guidance how to resolve this.

How about?

Rel = list_to_integer(erlang:system_info(otp_release)),
F = if
        Rel >= 26 -> fun gen_tcp:send/2;
        Rel < 25 -> fun workaround_module:send/2
    end,
persistent_term:put(gen_tcp_send_workaround, F).

In epgsql direct port call was implemented (the idea was borrowed from RabbitMQ) because epgsql tends to accumulate a large message queue (due to active=true by default). And pre-OTP26 gen_tcp:send had a performance issues when the process calling it has a long message queue.

I think in epgsql we will use -if(?OTP_RELEASE >= 26). ... use normal gen_tcp:send -else. ... use port_command ... -endif..

However even on pre-OTP26 gen_tcp:send should become less of a problem, since we now support active=N mode

Thanks for the suggestion with persistent_term. We've been testing it since yesterday and it does the trick and we can't find any measurable cost of this lookup.

I'm happy to close the issue. I hope not too many apps will be in a similar position when upgrading to OTP26. We certainly need to update the network-related parts of RabbitMQ as they are 10+ years old for the most part.