ECT built from source for Apple Silicon is slower than release x86-64 build
Adreitz opened this issue · 6 comments
I have an M2 Max MacBook Pro running macOS 13.2.1. I followed your directions to pull the source tree with git and build it. The resulting executable appears to be Apple Silicon native:
build % file /Users/XXXX/Downloads/ECT/Efficient-Compression-Tool/build/ect
/Users/XXXX/Downloads/ECT/Efficient-Compression-Tool/build/ect: Mach-O 64-bit executable arm64
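The file check above is the simplest way to confirm the build target. As a hedged sketch, the same information can be read directly from the Mach-O header, which also handles universal ("fat") binaries. The constants are the standard values from Apple's mach-o/loader.h and mach-o/fat.h; the helper names macho_archs and file_archs are hypothetical.

```python
import struct

# Standard constants from Apple's <mach-o/loader.h> and <mach-o/fat.h>.
MH_MAGIC_64 = 0xFEEDFACF   # thin 64-bit Mach-O (bytes CF FA ED FE, read little-endian)
FAT_MAGIC = 0xCAFEBABE     # universal binary; its header is always big-endian
CPU_ARCH_ABI64 = 0x01000000
CPU_TYPE_X86_64 = 7 | CPU_ARCH_ABI64
CPU_TYPE_ARM64 = 12 | CPU_ARCH_ABI64

ARCH_NAMES = {CPU_TYPE_X86_64: "x86_64", CPU_TYPE_ARM64: "arm64"}


def macho_archs(header: bytes) -> list[str]:
    """Return the CPU architectures named in a thin or fat Mach-O header."""
    (magic_be,) = struct.unpack_from(">I", header, 0)
    if magic_be == FAT_MAGIC:
        # Fat header: magic, nfat_arch, then 20-byte fat_arch records
        # (cputype, cpusubtype, offset, size, align), all big-endian.
        (nfat,) = struct.unpack_from(">I", header, 4)
        archs = []
        for i in range(nfat):
            (cputype,) = struct.unpack_from(">i", header, 8 + 20 * i)
            archs.append(ARCH_NAMES.get(cputype, hex(cputype)))
        return archs
    # Thin binary: mach_header_64 starts with magic and cputype (little-endian).
    magic_le, cputype = struct.unpack_from("<Ii", header, 0)
    if magic_le == MH_MAGIC_64:
        return [ARCH_NAMES.get(cputype, hex(cputype))]
    raise ValueError("not a 64-bit Mach-O or fat binary")


def file_archs(path: str) -> list[str]:
    """Read the first 4 KiB of an executable and report its architectures."""
    with open(path, "rb") as f:
        return macho_archs(f.read(4096))
```

For example, file_archs("/Users/XXXX/Downloads/ECT/Efficient-Compression-Tool/build/ect") should agree with the file output above and report arm64, while an Intel-only release binary would report x86_64.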
However, for the few PNG files I have spot-checked, it runs slightly slower than the release 0.9.4 Mac executable I downloaded directly from GitHub. I measured the two with time, the second being the x86-64 version, which is in my PATH:
/Users/XXXX/Downloads/ECT/Efficient-Compression-Tool/build/ect -9 --reuse 10.01s user 0.03s system 99% cpu 10.033 total
ect -9 --reuse 9.40s user 0.04s system 99% cpu 9.440 total
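To put a number on the gap, here is a minimal sketch that parses the zsh time summary lines shown above and computes the relative change in wall-clock time. It assumes the plain-seconds form of the output seen in these two lines; the function names are hypothetical. These are single runs, so repeating each measurement a few times would make a 6% difference more trustworthy.

```python
import re

# Matches the summary zsh's built-in time printed above, e.g.
# "10.01s user 0.03s system 99% cpu 10.033 total" (plain-seconds form only).
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system "
    r"(?P<cpu>\d+)% cpu (?P<total>[\d.]+) total"
)


def parse_zsh_time(line: str) -> dict[str, float]:
    """Extract user, system, cpu (percent) and total (wall-clock seconds)."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"no zsh time summary in: {line!r}")
    return {key: float(value) for key, value in m.groupdict().items()}


def relative_change(candidate: float, baseline: float) -> float:
    """Fractional change of candidate vs. baseline (positive means slower)."""
    return candidate / baseline - 1.0


native = parse_zsh_time(
    "/Users/XXXX/Downloads/ECT/Efficient-Compression-Tool/build/ect -9 --reuse "
    "10.01s user 0.03s system 99% cpu 10.033 total"
)
emulated = parse_zsh_time("ect -9 --reuse 9.40s user 0.04s system 99% cpu 9.440 total")
print(f"native arm64 vs. x86-64: {relative_change(native['total'], emulated['total']):+.1%}")
# prints: native arm64 vs. x86-64: +6.3%
```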
Is ECT currently not optimized for Apple Silicon, or did I do something wrong in my build? I've attached the executable and the example file I tested with.
ect.zip
That's an interesting thing to look into. While ECT was developed and tuned on x86, it does include ARM SIMD code in most places where there is x86 SIMD code, so the results are a bit surprising. I'd imagine that part of this is due to M1/M2's x86 emulation (Rosetta 2) being very powerful. I will investigate what might be causing this and whether there are any areas where ARM is not performing well, but it will be some time before I get to it.
Thanks in advance. I initially tried a small image (just because I had it at hand), and to give the timing more to work with I set the iterations high at -9999. It still took less than a second, but it was slightly faster with the AS-native binary than with the Intel binary, maybe by about 10%. It's interesting that this situation was reversed. Perhaps it might give a clue to which section of code is least AS-optimized.
I ran x86_64 and arm64 binaries on an M1 Pro-based Mac, and the native binary was between 3% and 10% faster, with a larger speedup on image data than on text.
So far, I can't see any parts of the code where the native binary would be slower. Given how small the speedup is, it is likely that the emulation is working very well for these workloads.
I'm happy to do some more digging if you can point me to a file where the native binary is consistently slower. For reference, I have added an arm64 binary to the release page, which should eliminate differences caused by, e.g., compiler versions.
Thanks for coming back to this and providing your own build. I don't know why, but your build runs faster for me. I tested with a fairly large palettized image:
| Options | Official Intel | My ARM | Official ARM |
|---|---|---|---|
| -9 --allfilters --pal_sort=120 | 1081.34s | 1183.41s | 941.63s |
| -9 --allfilters | 23.68s | 26.05s | 20.78s |
| -9 --pal_sort=120 | 99.86s | 109.34s | 86.25s |
| -9 | 2.22s | 2.37s | 1.91s |
| --reuse -9 | 2.01s | 2.18s | 1.71s |
| --reuse -999 | 41.85s | 43.63s | 35.96s |
I wanted to see whether any particular part of the code was more or less optimized than others. The tests with --allfilters appear to scale slightly worse between Intel and ARM (the official ARM build takes about 87-88% of the Intel time, vs. 85-86% for the other tests), but I don't know how statistically significant that is. And, of course, it might be different for other images.
I did comparisons with a couple of other images. For a large full-color image, ARM took about 90% as long as Intel, and my build was about 5% slower than the Intel build, while for a smaller palettized image my build was about 5% faster than the Intel build. Your ARM build was still the fastest.
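The scaling percentages can be reproduced from the palettized-image table with a few lines. This is a minimal sketch; the RESULTS dictionary simply re-types the table's numbers and ratios is a hypothetical helper.

```python
# Timings in seconds from the palettized-image table:
# (official Intel, my ARM, official ARM)
RESULTS = {
    "-9 --allfilters --pal_sort=120": (1081.34, 1183.41, 941.63),
    "-9 --allfilters": (23.68, 26.05, 20.78),
    "-9 --pal_sort=120": (99.86, 109.34, 86.25),
    "-9": (2.22, 2.37, 1.91),
    "--reuse -9": (2.01, 2.18, 1.71),
    "--reuse -999": (41.85, 43.63, 35.96),
}


def ratios(intel: float, my_arm: float, official_arm: float) -> tuple[float, float]:
    """Each ARM build's runtime as a fraction of the official Intel runtime."""
    return my_arm / intel, official_arm / intel


for opts, times in RESULTS.items():
    mine, official = ratios(*times)
    print(f"{opts:32} my ARM {mine:6.1%}   official ARM {official:6.1%}")
```

On these numbers the official ARM build lands at roughly 87-88% of the Intel time for the two --allfilters rows and 85-86% for the rest, while the locally built binary is slower than the Intel one in every row. With one image and one run per cell, a gap of one or two points could still be noise.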
[Edit] Fixed my brainfart with AMD/ARM. [/Edit]
You're welcome! It's interesting that there's a noticeable difference between the two ARM builds. Did you do anything special for the build, or have you modified your compiler/Xcode setup? Otherwise it could just be different compiler versions, although the size of the difference is still surprising.
I haven't done anything with Xcode, and while I know how to follow directions to build code, I am not a programmer and don't have a great understanding of Xcode. I have installed a number of packages with MacPorts, mostly to support running Stable Diffusion. I'm not sure which of them would be used automatically in place of the OS-provided ones, though:
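One way to answer the compiler question is to look at what the build actually picked up. Assuming the local build was configured with CMake in the build directory from the first post (the build/ path suggests this, but it is an assumption), CMakeCache.txt records the compiler path, build type and flags. A minimal sketch; read_cmake_cache and build_summary are hypothetical helpers, while the keys are standard CMake cache variables.

```python
from pathlib import Path

# Standard CMake cache variables that most directly affect generated code.
# Whether ECT's own CMake files override any of them is not checked here.
KEYS = (
    "CMAKE_C_COMPILER",
    "CMAKE_CXX_COMPILER",
    "CMAKE_BUILD_TYPE",
    "CMAKE_OSX_ARCHITECTURES",
    "CMAKE_C_FLAGS",
    "CMAKE_CXX_FLAGS",
)


def read_cmake_cache(text: str) -> dict[str, str]:
    """Parse CMakeCache.txt entries of the form NAME:TYPE=VALUE."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "//")):
            continue  # blank lines and comments
        name_type, sep, value = line.partition("=")
        if not sep or ":" not in name_type:
            continue
        entries[name_type.split(":", 1)[0]] = value
    return entries


def build_summary(build_dir: str) -> dict[str, str]:
    """Report the selected cache entries for a CMake build directory."""
    cache = read_cmake_cache((Path(build_dir) / "CMakeCache.txt").read_text())
    return {key: cache.get(key, "<not set>") for key in KEYS}
```

Running build_summary on the build directory and then invoking the reported compiler with --version would show which toolchain and optimization settings produced the slower binary.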
ade @0.1.2a_0 (active)
advancecomp @2.3_0 (active)
aom @3.6.0_0 (active)
asciidoc @10.2.0_2 (active)
autoconf @2.71_1 (active)
autoconf-archive @2023.02.20_0 (active)
automake @1.16.5_0 (active)
bison @3.8.2_2 (active)
bison-runtime @3.8.2_0 (active)
brotli @1.0.9_2 (active)
bzip2 @1.0.8_0 (active)
cairo @1.17.4_0+quartz+x11
cairo @1.17.6_0+quartz+x11 (active)
cargo @0.69.1_0 (active)
cargo-c @0.9.15_0
cargo-c @0.9.16_0 (active)
cctools @949.0.1_2+xcode (active)
clang-15 @15.0.7_1+analyzer+libstdcxx (active)
clang_select @2.2_1 (active)
cmake @3.24.4_0 (active)
cmake-bootstrap @3.9.6_0 (active)
coreutils @9.2_0 (active)
curl @8.0.1_0+http2+ssl (active)
curl-ca-bundle @8.0.1_0 (active)
cython_select @0.1_2 (active)
dav1d @1.1.0_0 (active)
db48 @4.8.30_5 (active)
docbook-xml @5.0_3 (active)
docbook-xml-4.1.2 @5.0_1 (active)
docbook-xml-4.2 @5.0_1 (active)
docbook-xml-4.3 @5.0_1 (active)
docbook-xml-4.4 @5.0_1 (active)
docbook-xml-4.5 @5.0_1 (active)
docbook-xml-5.0 @5.0_1 (active)
docbook-xsl-nons @1.79.2_0 (active)
docutils_select @0.1_1 (active)
expat @2.5.0_0 (active)
ffmpeg @4.4.2_6+gpl2 (active)
findutils @4.9.0_0 (active)
flex @2.6.4_0 (active)
fontconfig @2.14.1_0
fontconfig @2.14.2_0 (active)
fop @1.1_1 (active)
freetype @2.12.1_0 (active)
fribidi @1.0.12_0 (active)
gd2 @2.3.3_3+x11 (active)
gdbm @1.23_0 (active)
gdk-pixbuf2 @2.42.10_0 (active)
gettext @0.21.1_0 (active)
gettext-runtime @0.21.1_0 (active)
gettext-tools-libs @0.21.1_0 (active)
ghostscript @9.56.1_1+x11 (active)
giflib @4.2.3_0 (active)
giflib5 @5.2.1_3 (active)
gifsicle @1.93_0 (active)
git @2.40.0_0+credential_osxkeychain+diff_highlight+doc+pcre+perl5_34 (active)
glib2 @2.70.5_1+x11 (active)
gmake @4.4.1_0 (active)
gmp @6.2.1_1 (active)
gnutls @3.7.9_2 (active)
gobject-introspection @1.72.0_1 (active)
gperf @3.1_0 (active)
graphite2 @1.3.14_0 (active)
graphviz @8.0.1_0+pangocairo+x11 (active)
grep @3.10_0 (active)
groff @1.22.4_6 (active)
gsed @4.9_1 (active)
gtk-doc @1.32_2+python310 (active)
gts @0.7.6-20121130_1 (active)
harfbuzz @6.0.0_0 (active)
help2man @1.49.3_0 (active)
highway @1.0.4_0 (active)
icu @72.1_0 (active)
ilmbase @2.3.0_1 (active)
itstool @2.0.7_2+python310 (active)
jasper @4.0.0_0 (active)
jbig2dec @0.19_0 (active)
jbigkit @2.1_0 (active)
kerberos5 @1.20.1_0 (active)
lame @3.100_2 (active)
lcms2 @2.14_0 (active)
ld64 @3_4+ld64_xcode (active)
ld64-xcode @2_4 (active)
lerc @4.0.0_1 (active)
libarchive @3.6.2_1 (active)
libass @0.17.1_0 (active)
libavif @0.10.1_4 (active)
libb2 @0.98.1_1 (active)
libbluray @1.3.4_0 (active)
libcomerr @1.47.0_0 (active)
libcxx @5.0.1_5 (active)
libde265 @1.0.11_0 (active)
libedit @20221030-3.1_0 (active)
libevent @2.1.12_2 (active)
libffi @3.4.4_0 (active)
libgit2 @1.5.2_0+threadsafe (active)
libheif @1.15.2_0 (active)
libiconv @1.17_0 (active)
libidn @1.41_0 (active)
libidn2 @2.3.4_1 (active)
libjpeg-turbo @2.1.5.1_0 (active)
libjxl @0.8.1_0 (active)
libLASi @1.1.3_1 (active)
libmodplug @0.8.9.0_0 (active)
libnetpbm @11.01.00_0 (active)
libogg @1.3.5_1 (active)
libomp @16.0.0_0 (active)
libopus @1.3.1_0 (active)
libpaper @1.1.28_0 (active)
libpixman @0.38.4_0 (active)
libpng @1.6.39_0 (active)
libpsl @0.21.2-20230117_0 (active)
libquirc @1.1_0 (active)
librsvg @2.54.5_0 (active)
libsdl2 @2.26.4_0 (active)
libssh2 @1.10.0_0 (active)
libtasn1 @4.19.0_0 (active)
libtextstyle @0.21.1_0 (active)
libtheora @1.1.1_3 (active)
libtool @2.4.7_0 (active)
libunistring @1.1_0 (active)
libuv @1.44.2_0 (active)
libvidstab @1.1.1_0 (active)
libvorbis @1.3.7_0 (active)
libvpx @1.13.0_0 (active)
libxml2 @2.10.3_1 (active)
libxslt @1.1.37_1 (active)
libyaml @0.2.5_0 (active)
libyuv @20220812_0 (active)
links @2.28_0 (active)
llvm-15 @15.0.7_0 (active)
llvm_select @2_1 (active)
lmdb @0.9.29_0 (active)
lz4 @1.9.4_0 (active)
lzip @1.23_0 (active)
lzo2 @2.10_0 (active)
m4 @1.4.19_1 (active)
nasm @2.16.01_0 (active)
ncurses @6.4_0 (active)
netpbm @11.01.00_1+x11 (active)
nettle @3.8.1_0 (active)
nghttp2 @1.52.0_0 (active)
opencv4 @4.6.0_2 (active)
openexr @2.3.0_2 (active)
openjpeg @2.5.0_1 (active)
openssl @3_10 (active)
openssl3 @3.1.0_2 (active)
ossp-uuid @1.6.2_13+perl5_34 (active)
p5.34-authen-sasl @2.160.0_0 (active)
p5.34-b-cow @0.7.0_0 (active)
p5.34-canary-stability @2013_0 (active)
p5.34-cgi @4.560.0_0 (active)
p5.34-clone @0.460.0_0 (active)
p5.34-common-sense @3.750.0_0 (active)
p5.34-compress-raw-bzip2 @2.204.0_0 (active)
p5.34-compress-raw-zlib @2.204.0_0 (active)
p5.34-digest-hmac @1.40.0_0 (active)
p5.34-digest-sha1 @2.130.0_4 (active)
p5.34-encode @3.190.0_0 (active)
p5.34-encode-locale @1.50.0_0 (active)
p5.34-error @0.170.290_0 (active)
p5.34-file-slurper @0.14.0_0 (active)
p5.34-getopt-long @2.540.0_0 (active)
p5.34-gssapi @0.280.0_3 (active)
p5.34-html-parser @3.810.0_0 (active)
p5.34-html-tagset @3.200.0_4 (active)
p5.34-http-date @6.50.0_0 (active)
p5.34-http-message @6.440.0_0 (active)
p5.34-io-compress @2.204.0_0 (active)
p5.34-io-compress-brotli @0.4.1_1 (active)
p5.34-io-html @1.4.0_0 (active)
p5.34-io-socket-ssl @2.81.0_0 (active)
p5.34-json @4.100.0_0 (active)
p5.34-json-xs @4.30.0_0 (active)
p5.34-locale-gettext @1.70.0_1 (active)
p5.34-lwp-mediatypes @6.40.0_0 (active)
p5.34-mozilla-ca @20221114_0 (active)
p5.34-net-libidn @0.120.0_5 (active)
p5.34-net-smtp-ssl @1.40.0_0 (active)
p5.34-net-ssleay @1.920.0_0 (active)
p5.34-pod-escapes @1.70.0_0 (active)
p5.34-pod-simple @3.430.0_0 (active)
p5.34-sub-uplevel @0.280.0_0 (active)
p5.34-term-readkey @2.380.0_0 (active)
p5.34-test-cpan-meta @0.250.0_0 (active)
p5.34-test-cpan-meta-json @0.160.0_0 (active)
p5.34-test-exception @0.430.0_0 (active)
p5.34-test-nowarnings @1.60.0_0 (active)
p5.34-test-pod @1.520.0_0 (active)
p5.34-test-simple @1.302.194_0 (active)
p5.34-test-warn @0.370.0_0 (active)
p5.34-time-hires @1.976.400_0 (active)
p5.34-time-local @1.300.0_0 (active)
p5.34-timedate @2.330.0_0 (active)
p5.34-types-serialiser @1.10.0_0 (active)
p5.34-uri @5.170.0_0 (active)
p5.34-xsloader @0.240.0_0 (active)
p11-kit @0.24.1_0 (active)
pango @1.50.7_0+quartz+x11 (active)
pcre @8.45_0 (active)
pcre2 @10.42_0 (active)
perl5 @5.34.1_0+perl5_34 (active)
perl5.34 @5.34.1_0 (active)
pip_select @0.1_3 (active)
pkgconfig @0.29.2_0 (active)
popt @1.18_1 (active)
protobuf3-cpp @3.19.3_0 (active)
psutils @p17_1 (active)
py310-anytree @2.8.0_1 (active)
py310-build @0.10.0_0 (active)
py310-cython @0.29.34_0 (active)
py310-docutils @0.19_0 (active)
py310-flatbuffers @23.3.3_0 (active)
py310-installer @0.7.0_0 (active)
py310-jinja2 @3.1.2_0 (active)
py310-libxml2 @2.10.2_0 (active)
py310-lxml @4.9.1_0 (active)
py310-mako @1.2.4_0 (active)
py310-markdown @3.4.1_0 (active)
py310-markupsafe @2.1.1_0 (active)
py310-packaging @23.0_0 (active)
py310-pep517 @0.13.0_0
py310-pip @23.0.1_0 (active)
py310-protobuf3 @3.19.3_0 (active)
py310-pygments @2.14.0_0 (active)
py310-pyproject_hooks @1.0.0_0 (active)
py310-roman @3.3_0 (active)
py310-setuptools @67.6.1_0 (active)
py310-six @1.16.0_0 (active)
py310-smartypants @2.0.1_0 (active)
py310-toml @0.10.2_0 (active)
py310-tomli @2.0.1_0 (active)
py310-typogrify @2.0.7_0 (active)
py310-wheel @0.40.0_0 (active)
py310-yaml @6.0_0 (active)
py311-mako @1.2.4_0 (active)
py311-markdown @3.4.1_0 (active)
py311-markupsafe @2.1.1_0 (active)
py311-setuptools @67.6.1_0 (active)
pygments_select @0.1_1 (active)
python3_select @0.0_3 (active)
python310 @3.10.10_0+lto+optimizations (active)
python311 @3.11.2_0+lto+optimizations (active)
python_select @0.3_10 (active)
rav1e @0.6.3_0 (active)
re2c @3.0_0 (active)
readline @8.2.001_0 (active)
rsync @3.2.7_0 (active)
rust @1.68.2_0 (active)
shared-mime-info @2.2_0 (active)
soxr @0.1.3_0 (active)
speex @1.2.1_0 (active)
speexdsp @1.2.1_0 (active)
sqlite3 @3.41.2_0 (active)
svt-av1 @1.4.1_0 (active)
texinfo @7.0.3_0 (active)
tiff @4.5.0_0 (active)
uchardet @0.0.8_0 (active)
urw-fonts @1.0.7pre44_0 (active)
util-linux @2.38.1_0 (active)
vala @0.56.5_0 (active)
webp @1.3.0_0 (active)
wget @1.21.3_1+gnutls (active)
x264 @20191217_0 (active)
x265 @3.4_2 (active)
xar @1.8.0.494.81.1_0 (active)
Xft2 @2.3.7_0 (active)
xmlcatmgr @2.2_1 (active)
xmlto @0.0.28_5 (active)
xorg-libice @1.1.1_0 (active)
xorg-libpthread-stubs @0.4_0 (active)
xorg-libsm @1.2.4_0 (active)
xorg-libX11 @1.8.4_0 (active)
xorg-libXau @1.0.11_0 (active)
xorg-libXaw @1.0.15_0 (active)
xorg-libxcb @1.15_0+python311 (active)
xorg-libXdmcp @1.1.4_0 (active)
xorg-libXext @1.3.5_0 (active)
xorg-libXmu @1.1.4_0 (active)
xorg-libXt @1.2.1_0 (active)
xorg-util-macros @1.20.0_0 (active)
xorg-xcb-proto @1.15.2_0+python311 (active)
xorg-xcb-util @0.4.1_0 (active)
xorg-xorgproto @2022.2_0 (active)
xorg-xtrans @1.4.0_0 (active)
xpm @3.5.15_0 (active)
xrender @0.9.11_0 (active)
XviD @1.3.7_0 (active)
xxhashlib @0.8.1_2 (active)
xz @5.4.2_0 (active)
yasm @1.3.0_0 (active)
zimg @3.0.4_0 (active)
zlib @1.2.13_0 (active)
zstd @1.5.5_0 (active)
zvbi @0.2.35_3 (active)
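To pick out the entries in a listing like this that could plausibly touch a C/C++ build, here is a minimal sketch that parses port installed lines of the form name @version_revision+variants (active). The prefix list is a hand-picked assumption, and being installed does not mean CMake used a given tool; that depends on PATH and any port select settings.

```python
import re
from typing import Optional

# One line of "port installed" output: name @version_revision+variants [(active)]
PORT_RE = re.compile(
    r"^\s*(?P<name>\S+) @(?P<version>.+?)_(?P<revision>\d+)"
    r"(?P<variants>(?:\+[^\s+]+)*)(?P<active> \(active\))?\s*$"
)

# Hand-picked (assumed) name prefixes for compilers, linkers and build tools.
TOOLCHAIN_PREFIXES = ("clang", "llvm", "cctools", "ld64", "cmake", "libcxx")


def parse_port_line(line: str) -> Optional[dict]:
    """Split one listing line into name, version, revision, variants, active."""
    m = PORT_RE.match(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "version": m["version"],
        "revision": int(m["revision"]),
        "variants": [v for v in m["variants"].split("+") if v],
        "active": m["active"] is not None,
    }


def toolchain_ports(listing: str) -> list[str]:
    """Active ports whose names match one of the toolchain prefixes."""
    found = []
    for line in listing.splitlines():
        port = parse_port_line(line)
        if port and port["active"] and port["name"].startswith(TOOLCHAIN_PREFIXES):
            found.append(f"{port['name']} {port['version']}")
    return found
```

On the listing above this flags cctools, clang-15, clang_select, cmake, cmake-bootstrap, ld64, ld64-xcode, libcxx, llvm-15 and llvm_select, which narrows the question to a handful of candidates.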