erlang/otp

ELF binaries in the `erts-15.0/bin` of Hexpm Docker images are too big (and full of zeros)

cr0t opened this issue · 6 comments

cr0t commented

Describe the bug

I originally posted this on the Elixir Forum.

With Erlang/OTP 27.0, the ELF binaries in the ERTS directory of the hexpm/elixir Docker images became much bigger and are padded with zeros.

To Reproduce

Compare sizes of files:

docker run --rm -it hexpm/elixir:1.16.2-erlang-26.2.5-debian-bookworm-20240513 ls -lah /usr/local/lib/erlang/erts-14.2.5/bin/
# ...
total 5.7M
drwxr-xr-x 2 root root 4.0K May 14 02:23 .
drwxr-xr-x 6 root root 4.0K May 14 02:23 ..
-rwxr-xr-x 1 root root 4.4M May 14 02:23 beam.smp
-rwxr-xr-x 1 root root  67K May 14 02:23 ct_run
-rwxr-xr-x 1 root root  67K May 14 02:23 dialyzer
-rwxr-xr-x 1 root root  67K May 14 02:23 dyn_erl
-rwxr-xr-x 1 root root  67K May 14 02:23 epmd
-rwxr-xr-x 1 root root 1.5K May 14 02:16 erl
-rwxr-xr-x 1 root root 1.5K May 14 02:16 erl.src
-rwxr-xr-x 1 root root 131K May 14 02:23 erl_call
-rwxr-xr-x 1 root root  67K May 14 02:23 erl_child_setup
-rwxr-xr-x 1 root root 132K May 14 02:23 erlc
-rwxr-xr-x 1 root root  68K May 14 02:23 erlexec
-rwxr-xr-x 1 root root  67K May 14 02:23 escript
-rwxr-xr-x 1 root root  67K May 14 02:23 heart
-rwxr-xr-x 1 root root  67K May 14 02:23 inet_gethost
-rwxr-xr-x 1 root root  67K May 14 02:23 run_erl
-rwxr-xr-x 1 root root 1.8K May 14 02:16 start
-rwxr-xr-x 1 root root 1.8K May 14 02:16 start.src
-rwxr-xr-x 1 root root 1.3K May 14 02:16 start_erl.src
-rwxr-xr-x 1 root root  67K May 14 02:23 to_erl
-rwxr-xr-x 1 root root  67K May 14 02:23 typer
-rwxr-xr-x 1 root root 259K May 14 02:23 yielding_c_fun

# vs.

docker run --rm -it hexpm/elixir:1.17.0-erlang-27.0-debian-bookworm-20240513 ls -lah /usr/local/lib/erlang/erts-15.0/bin/
# ...
total 35M
drwxr-xr-x 2 root root 4.0K May 25 21:11 .
drwxr-xr-x 6 root root 4.0K May 25 21:11 ..
-rwxr-xr-x 1 root root 6.1M May 25 21:11 beam.smp
-rwxr-xr-x 1 root root 2.1M May 25 21:11 ct_run
-rwxr-xr-x 1 root root 2.1M May 25 21:11 dialyzer
-rwxr-xr-x 1 root root 2.1M May 25 21:11 dyn_erl
-rwxr-xr-x 1 root root 2.1M May 25 21:11 epmd
-rwxr-xr-x 1 root root 1.5K May 25 21:11 erl
-rwxr-xr-x 1 root root 1.5K May 25 21:11 erl.src
-rwxr-xr-x 1 root root 131K May 25 21:11 erl_call
-rwxr-xr-x 1 root root 2.1M May 25 21:11 erl_child_setup
-rwxr-xr-x 1 root root 2.1M May 25 21:11 erlc
-rwxr-xr-x 1 root root 2.1M May 25 21:11 erlexec
-rwxr-xr-x 1 root root 2.1M May 25 21:11 escript
-rwxr-xr-x 1 root root 2.1M May 25 21:11 heart
-rwxr-xr-x 1 root root 2.1M May 25 21:11 inet_gethost
-rwxr-xr-x 1 root root 2.1M May 25 21:11 run_erl
-rwxr-xr-x 1 root root 1.8K May 25 21:11 start
-rwxr-xr-x 1 root root 1.8K May 25 21:11 start.src
-rwxr-xr-x 1 root root 1.3K May 25 21:11 start_erl.src
-rwxr-xr-x 1 root root 2.1M May 25 21:11 to_erl
-rwxr-xr-x 1 root root 2.1M May 25 21:11 typer
-rwxr-xr-x 1 root root 2.1M May 25 21:11 yielding_c_fun

Expected behavior

In ERTS 14, most of the ELF binaries are tens of KB (about 67 KB each in the listing above), while in ERTS 15 they grew to a few MB (~2 MB for Debian-based images and ~6 MB for Alpine-based ones).

Affected versions

Probably, Erlang/OTP 27.0.

Additional context

Here is a screenshot from my original post on the Elixir Forum showing that one of the ERTS 15 binaries (erlc in particular) contains long runs of zero bytes in the middle:

image
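
For anyone who cannot view the screenshot, the padding can also be measured directly. Below is a minimal Python sketch that reports how much of a file is zero bytes and the longest run of zeros. The function name `zero_stats` and the script name in the usage note are made up for illustration; they are not part of any Erlang/OTP tooling.

```python
import re
import sys


def zero_stats(data: bytes):
    """Return (size, number of zero bytes, longest run of zero bytes)."""
    zeros = data.count(0)
    # Each regex match is one maximal run of consecutive zero bytes.
    longest = max((len(m.group()) for m in re.finditer(rb"\x00+", data)), default=0)
    return len(data), zeros, longest


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            size, zeros, longest = zero_stats(f.read())
        share = zeros / size if size else 0.0
        print(f"{path}: {size} bytes, {share:.1%} zero, longest zero run {longest} bytes")
```

It could be run as, for example, `python3 zero_stats.py /usr/local/lib/erlang/erts-15.0/bin/erlc`. The script reads each file fully into memory, which is fine for binaries of a few MB.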

P.S. Please let me know if this is the wrong place to report this kind of issue.

Same here, compiled with kerl on Ubuntu 24.04. The resulting Docker image is about 100 MB larger than with ERTS 14.

It is #7977 that causes this problem. Seems like the LDFLAGS are passed to all C programs and not just beam.jit.
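
One way to see whether a given binary was linked with such flags is to look at the alignment of its loadable segments. The sketch below assumes the flags show up as an unusually large `p_align` on `PT_LOAD` program headers, which this thread does not state explicitly. It handles 64-bit ELF files only, and the function name `load_segments` is hypothetical.

```python
import struct
import sys

PT_LOAD = 1  # program header type for loadable segments


def load_segments(data: bytes):
    """Return (offset, file size, alignment) for each PT_LOAD segment of a 64-bit ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:
        raise ValueError("only 64-bit ELF is handled in this sketch")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little endian, 2 = big endian
    (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    segments = []
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
        p_type, _, p_offset, _, _, p_filesz, _, p_align = struct.unpack_from(
            endian + "IIQQQQQQ", data, base
        )
        if p_type == PT_LOAD:
            segments.append((p_offset, p_filesz, p_align))
    return segments


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for offset, filesz, align in load_segments(f.read()):
                print(f"{path}: LOAD offset={offset:#x} filesz={filesz:#x} align={align:#x}")
```

Comparing the output for `beam.smp` with the output for one of the small helper programs such as `erlc` should show whether both were linked with the same alignment.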

@lexprfuncall do you have time to take a look?

A proposed fix is available in #8593. If you can, please test it and make sure that it works in your environment.

cr0t commented

Hey, @garazdawi,

Thanks for such a quick reaction (and fix)!

I ran some tests on my side, and your fix seems to work. Below is a summary of what I did to check the changes:

cr0t@sprawl:~$ uname -a
Linux sprawl 5.15.0-112-generic #122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

cr0t@sprawl:~$ git clone git@github.com:erlang/otp.git official_otp && cd official_otp
cr0t@sprawl:~/official_otp$ git checkout -b maint-27 origin/maint-27
cr0t@sprawl:~/official_otp$ ./configure && make
# ...
cr0t@sprawl:~/official_otp$ ls -lah bin/
total 1.7M
drwxrwxr-x  3 cr0t cr0t 4.0K Jun 18 20:35 .
drwxrwxr-x 15 cr0t cr0t 4.0K Jun 18 20:23 ..
-rwxr-xr-x  1 cr0t cr0t  15K Jun 18 20:35 cerl
-rwxrwxr-x  1 cr0t cr0t 6.1M Jun 18 20:35 ct_run
-rwxrwxr-x  1 cr0t cr0t 6.1M Jun 18 20:35 dialyzer
-rwxr-xr-x  1 cr0t cr0t 1.5K Jun 18 20:35 erl
-rwxr-xr-x  1 cr0t cr0t 6.5M Jun 18 20:35 erlc
-rwxrwxr-x  1 cr0t cr0t 529K Jun 18 20:35 erl_call
-rwxrwxr-x  1 cr0t cr0t 6.1M Jun 18 20:35 escript
-rw-rw-r--  1 cr0t cr0t    0 Jun 18 20:16 .gitignore
-rw-rw-r--  1 cr0t cr0t 6.7K Jun 18 20:35 no_dot_erlang.boot
-rw-rw-r--  1 cr0t cr0t 8.6K Jun 18 20:35 no_dot_erlang.script
-rw-rw-r--  1 cr0t cr0t 6.7K Jun 18 20:35 start.boot
-rw-rw-r--  1 cr0t cr0t 6.7K Jun 18 20:35 start_clean.boot
-rw-rw-r--  1 cr0t cr0t 8.7K Jun 18 20:35 start_clean.script
-rw-rw-r--  1 cr0t cr0t 7.7K Jun 18 20:35 start_sasl.boot
-rw-rw-r--  1 cr0t cr0t  10K Jun 18 20:35 start_sasl.script
-rw-rw-r--  1 cr0t cr0t 8.7K Jun 18 20:35 start.script
-rwxrwxr-x  1 cr0t cr0t 6.1M Jun 18 20:35 typer
drwxrwxr-x  2 cr0t cr0t 4.0K Jun 18 20:26 x86_64-pc-linux-gnu

cr0t@sprawl:~$ git clone git@github.com:garazdawi/otp.git garazdawi_otp && cd garazdawi_otp
cr0t@sprawl:~/garazdawi_otp$ git checkout -b OTP-19137 origin/lukas/erts/fix-thp-options/OTP-19137
cr0t@sprawl:~/garazdawi_otp$ ./configure && make
# ...
cr0t@sprawl:~/garazdawi_otp$ ls -lah bin/
total 1.7M
drwxrwxr-x  3 cr0t cr0t 4.0K Jun 18 20:39 .
drwxrwxr-x 15 cr0t cr0t 4.0K Jun 18 20:27 ..
-rwxr-xr-x  1 cr0t cr0t  15K Jun 18 20:39 cerl
-rwxrwxr-x  1 cr0t cr0t 115K Jun 18 20:39 ct_run
-rwxrwxr-x  1 cr0t cr0t 111K Jun 18 20:39 dialyzer
-rwxr-xr-x  1 cr0t cr0t 1.5K Jun 18 20:39 erl
-rwxr-xr-x  1 cr0t cr0t 569K Jun 18 20:39 erlc
-rwxrwxr-x  1 cr0t cr0t 529K Jun 18 20:39 erl_call
-rwxrwxr-x  1 cr0t cr0t 120K Jun 18 20:39 escript
-rw-rw-r--  1 cr0t cr0t    0 Jun 18 20:17 .gitignore
-rw-rw-r--  1 cr0t cr0t 6.7K Jun 18 20:39 no_dot_erlang.boot
-rw-rw-r--  1 cr0t cr0t 8.6K Jun 18 20:39 no_dot_erlang.script
-rw-rw-r--  1 cr0t cr0t 6.7K Jun 18 20:39 start.boot
-rw-rw-r--  1 cr0t cr0t 6.7K Jun 18 20:39 start_clean.boot
-rw-rw-r--  1 cr0t cr0t 8.7K Jun 18 20:39 start_clean.script
-rw-rw-r--  1 cr0t cr0t 7.7K Jun 18 20:39 start_sasl.boot
-rw-rw-r--  1 cr0t cr0t  10K Jun 18 20:39 start_sasl.script
-rw-rw-r--  1 cr0t cr0t 8.7K Jun 18 20:39 start.script
-rwxrwxr-x  1 cr0t cr0t 110K Jun 18 20:39 typer
drwxrwxr-x  2 cr0t cr0t 4.0K Jun 18 20:31 x86_64-pc-linux-gnu

Furthermore, I've checked the erlc binaries from both builds. With the fix, the binary no longer seems to be padded with zeros:

image
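
The two `bin/` listings above can also be compared mechanically. This is a small Python sketch, not something I ran as part of the test above, and the name `compare_sizes` is invented for the example. It lists the files present in both directories together with their sizes before and after.

```python
import os
import sys


def compare_sizes(before_dir, after_dir):
    """Return (name, size_before, size_after) for regular files present in both directories."""
    rows = []
    for name in sorted(os.listdir(before_dir)):
        before = os.path.join(before_dir, name)
        after = os.path.join(after_dir, name)
        if os.path.isfile(before) and os.path.isfile(after):
            rows.append((name, os.path.getsize(before), os.path.getsize(after)))
    return rows


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for name, before, after in compare_sizes(sys.argv[1], sys.argv[2]):
            if before != after:
                print(f"{name}: {before} -> {after} bytes")
```

With the checkouts from the session above, it would be called with `official_otp/bin` and `garazdawi_otp/bin` as the two arguments.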

Sorry for not responding sooner. I think passing the flags that pad out the text segment only to the emulator certainly makes sense. Is there any way I can help?

@lexprfuncall No worries, I think we're good for now.