marcelotduarte/cx_Freeze

PySide6.7.2 + cx_Freeze=7.1.1 + Ubuntu 20.04 very big size of build directory - not used qt plugins and *.so files duplicates many times

break11 opened this issue · 11 comments

Describe the bug

I am attaching a simple Qt app like this:

import sys

from PySide6.QtCore import Qt
from PySide6.QtWidgets import QApplication, QLabel, QWidget


def main() -> int:
    app = QApplication(sys.argv)
    window = QWidget()
    window.setWindowTitle("Simple")
    window.setGeometry(300, 300, 300, 300)
    label = QLabel(window)
    label.setText("Hello World!")
    label.setGeometry(0, 0, 300, 300)
    label.setAlignment(Qt.AlignCenter)
    window.show()
    return app.exec()


if __name__ == "__main__":
    sys.exit(main())

After building with cx_Freeze, the build folder is 1 GB! That is too big.

  1. Inside the build folder I found many folders that this app does not need, mostly Qt plugins. After I deleted them, the build folder shrank to about 230 MB, which is quite acceptable, and the app in the build folder still works. I wrote a small script for this deletion; it is in the attachment.

  2. Many *.so files in the build folder are duplicated. For example, libicudata.so.73 (about 30 MB) is present in three locations:
    first location --- build/exe.linux-x86_64-3.12/lib/PySide6/Qt/lib/
    second location --- build/exe.linux-x86_64-3.12/lib/PySide6/Qt/plugins/platforms/
    third location --- build/exe.linux-x86_64-3.12/lib/PySide6/Qt/plugins/xcbglintegrations/

That is 3 × 30 MB instead of 30 MB, and there are many such files, for example libQt6Core.so.6 (7 MB), libQt6Gui.so.6 (10 MB) and others.

I replaced some of these duplicates with symlinks, and everything still works.
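The attached cleanup script is not reproduced in this thread. A minimal sketch of the plugin cleanup described in point 1 might look like the following. The function name `prune_qt_plugins` and the `KEEP_PLUGINS` set are hypothetical; only "platforms" and "xcbglintegrations" are named in the report, and which plugin folders an application really needs depends on the application.

```python
import shutil
from pathlib import Path

# Plugin folders to keep. "platforms" and "xcbglintegrations" appear in the
# report above; "imageformats" is an assumption. Adjust this set per app.
KEEP_PLUGINS = frozenset({"platforms", "xcbglintegrations", "imageformats"})


def prune_qt_plugins(plugins_dir, keep=KEEP_PLUGINS, dry_run=True):
    """Remove plugin subfolders not listed in `keep`; return the removed names."""
    removed = []
    for entry in sorted(Path(plugins_dir).iterdir()):
        # Leave plain files, symlinks and whitelisted folders untouched.
        if entry.is_symlink() or not entry.is_dir() or entry.name in keep:
            continue
        removed.append(entry.name)
        if not dry_run:
            shutil.rmtree(entry)
    return removed
```

Calling `prune_qt_plugins("build/exe.linux-x86_64-3.12/lib/PySide6/Qt/plugins")` with the default `dry_run=True` only lists what would be deleted, which makes it easier to check that the frozen app still starts before removing anything for real.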

With a simple app that uses QWebEngine it is even worse: the build folder is about 1.6 GB before cleaning.

I tried this simple project on Windows 10 and the bug is not present there. The size is approximately the same as after cleaning on Ubuntu.

To Reproduce
cx_Freeze.zip.

Expected behavior
Remove unused Qt plugins and create symlinks for duplicated files during the build process?

Desktop:

  • Platform: Ubuntu Linux 20.04
  • Architecture: amd64
  • cx_Freeze version: 7.1.1
  • Python version: 3.9 - 3.12

We're using cx_Freeze to build Embeetle IDE (see https://embeetle.com). Recently, we also noticed a blow-up in the size of PyQt6 in our Linux builds.

Hi @marcelotduarte ,
Do you have any news on this issue (the size blow-up on Linux when building with PyQt6)?

Hi @kristofmulier
This issue is in the top 5 of my list. I want to resolve the issues with Qt before I can release a version with Python 3.13.

Great to hear that @marcelotduarte .
In the meantime, I use the following workaround to replace the duplicates with symlinks on Linux:

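import os
from typing import Dict, List, Optional, Tuple

# NOTE: compute_file_hash() is not shown in this comment. As used below, it is
# expected to return a hash string for a file path, or None on failure.

def find_duplicated_files(root_dir: str = '.',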
                          ignore_filenames: Optional[List[str]] = None,
                          ) -> Dict[str, List[Tuple[str, int, str]]]:
    """
    Scan the directory structure starting from root_dir and identify duplicate files based on
    content.

    Args:
        root_dir (str): The root directory to start scanning from.
        ignore_filenames (Optional[List[str]]): A list of filenames to ignore during scanning.

    Returns:
        Dict[str, List[Tuple[str, int, str]]]: A dictionary where each key is a file hash and the
        value is a list of tuples containing (filename, size, full_path) for each duplicate file.
    """
    if ignore_filenames is None:
        ignore_filenames = ["py.typed", "__init__.pyc", "generator.pyc"]
    file_map: Dict[str, List[Tuple[str, int, str]]] = {}

    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename in ignore_filenames:
                continue  # Skip files that should be ignored
            full_path: str = os.path.join(dirpath, filename)
            try:
                size: int = os.path.getsize(full_path)
            except OSError as e:
                print(f"ERROR: Cannot get size of '{full_path}': {e}")
                continue

            hash_value: Optional[str] = compute_file_hash(full_path)
            if hash_value is None:
                print(f"ERROR: Cannot hash file '{full_path}'")
                continue

            if hash_value in file_map:
                file_map[hash_value].append((filename, size, full_path))
            else:
                file_map[hash_value] = [(filename, size, full_path)]

    # Identify duplicated files (same content)
    duplicated_files: Dict[str, List[Tuple[str, int, str]]] = {
        hash_value: file_tuples for hash_value, file_tuples in file_map.items() if len(file_tuples) > 1
    }
    return duplicated_files

def replace_with_symlinks(duplicated_files: Dict[str, List[Tuple[str, int, str]]],
                          dry_run: bool = True,
                          ) -> None:
    """
    Replace duplicate files with symbolic links pointing to the original file.

    Args:
        duplicated_files (Dict[str, List[Tuple[str, int, str]]]): A dictionary of duplicates
                                                                  identified by their hash.
        dry_run (bool): If True, the function will only simulate the replacement without making
                        changes.

    Returns:
        None
    """
    total_size_saved: int = 0
    for hash_value, file_tuples in duplicated_files.items():
        # All files have the same content
        size: int = file_tuples[0][1]  # All files have same size
        size_mb: float = size / (1024 * 1024)
        # print(f"Processing duplicated files (Hash: {hash_value}, Size: {size_mb:.2f} MB)")
        # print("Files:")
        # for filename, size, full_path in file_tuples:
        #     print(f"    {full_path}")
        # Keep the first file, replace others with symlinks
        original_file: str = file_tuples[0][2]  # full_path of the first file
        duplicates: List[Tuple[str, int, str]] = file_tuples[1:]  # Rest of the files

        for filename, size, duplicate_path in duplicates:
            # Calculate relative path from duplicate to original
            relative_path: str = os.path.relpath(original_file, os.path.dirname(duplicate_path))

            # If not in dry-run mode, replace duplicate with symlink
            if not dry_run:
                try:
                    # Remove the duplicate file
                    os.remove(duplicate_path)
                    # Create a symlink pointing to the original file
                    os.symlink(relative_path, duplicate_path)
                    print(f"Replaced '{duplicate_path}' with symlink to '{relative_path}'")
                except OSError as e:
                    print(f"ERROR: Cannot process '{duplicate_path}': {e}")
            else:
                print(f"Would replace '{duplicate_path}' with symlink to '{relative_path}'")

            total_size_saved += size

        print()

    total_size_saved_mb: float = total_size_saved / (1024 * 1024)
    if dry_run:
        print(f"Total size that can be eliminated: {total_size_saved_mb:.2f} MB")
    else:
        print(f"Total size eliminated: {total_size_saved_mb:.2f} MB")

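# RUN BOTH FUNCTIONS
# (output_folder is not defined in this snippet; it is presumably the
# cx_Freeze build output directory set elsewhere in the build script)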
duplicated_files = find_duplicated_files(
    root_dir=f'{output_folder}/lib/PyQt6',
)
replace_with_symlinks(duplicated_files, dry_run=False)
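
The workaround above calls `compute_file_hash`, which is not included in the comment. A minimal sketch of such a helper is shown below, assuming SHA-256 and chunked reads; both choices are assumptions, and the original helper may differ. It returns `None` on failure, matching how the caller checks the result.

```python
import hashlib
from typing import Optional


def compute_file_hash(path: str, chunk_size: int = 1024 * 1024) -> Optional[str]:
    """Return the SHA-256 hex digest of the file at `path`, or None if unreadable."""
    digest = hashlib.sha256()
    try:
        with open(path, "rb") as f:
            # Read in chunks so large shared libraries are not loaded whole.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()
```

Hashing file content rather than comparing names and sizes avoids linking two files that merely share a name, at the cost of reading every file once.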

Can you test PR #2578?
pip install git+https://github.com/marcelotduarte/cx_Freeze.git@refs/pull/2578/head

Hi @marcelotduarte ,
Our build flow goes through Docker. So I added this line in the requirements.txt file that Docker executes to get the right python environment:

# Required for building
# ---------------------
--extra-index-url https://marcelotduarte.github.io/packages/
# cx_Freeze == 7.1.1
# cx_Freeze == 7.2.1
git+https://github.com/marcelotduarte/cx_Freeze.git@refs/pull/2578/head

As you can see, I commented out the lines cx_Freeze == 7.1.1 and cx_Freeze == 7.2.1 and instead forced the installation of your PR #2578 with the line git+https://github.com/marcelotduarte/cx_Freeze.git@refs/pull/2578/head. However, I doubt that this worked properly. Here is a snippet from the Docker output when it attempts to set up the Python environment:

Looking in indexes: https://pypi.org/simple, https://marcelotduarte.github.io/packages/
Collecting git+https://github.com/marcelotduarte/cx_Freeze.git@refs/pull/2578/head (from -r embeetle/requirements.txt (line 93))
  Cloning https://github.com/marcelotduarte/cx_Freeze.git (to revision refs/pull/2578/head) to /tmp/pip-req-build-lrxpfgr7
  Running command git clone --filter=blob:none --quiet https://github.com/marcelotduarte/cx_Freeze.git /tmp/pip-req-build-lrxpfgr7
  WARNING: Did not find branch or tag 'refs/pull/2578/head', assuming revision or ref.
  Running command git fetch -q https://github.com/marcelotduarte/cx_Freeze.git refs/pull/2578/head
  Running command git checkout -q 0c2ea0bac1d1e9891701b1288b6b2fe50b81bd39
  Resolved https://github.com/marcelotduarte/cx_Freeze.git to commit 0c2ea0bac1d1e9891701b1288b6b2fe50b81bd39
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

What worries me is this line:

WARNING: Did not find branch or tag 'refs/pull/2578/head', assuming revision or ref.

Then the output continues (I leave out the irrelevant stuff):

Collecting setuptools<76,>=65.6.3 (from cx_Freeze==7.2.1->-r embeetle/requirements.txt (line 93))
  Using cached setuptools-75.1.0-py3-none-any.whl.metadata (6.9 kB)
Collecting filelock>=3.12.3 (from cx_Freeze==7.2.1->-r embeetle/requirements.txt (line 93))
  Downloading filelock-3.16.1-py3-none-any.whl.metadata (2.9 kB)
Collecting patchelf>=0.14 (from cx_Freeze==7.2.1->-r embeetle/requirements.txt (line 93))
  Downloading patchelf-0.17.2.1-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.musllinux_1_1_x86_64.whl.metadata (3.3 kB)

and

Building wheels for collected packages: zmq, cx_Freeze, sgmllib3k
  Building wheel for zmq (pyproject.toml) ... done
  Created wheel for zmq: filename=zmq-0.0.0-py3-none-any.whl size=1265 sha256=6d3a94cec8e76139c380da773913110acd017efd5859919295991cdfd2580594
  Stored in directory: /root/.cache/pip/wheels/68/8e/d4/3ed4272b059c74f0b9ae3930e45f075f295805c87d69b850f0
  Building wheel for cx_Freeze (pyproject.toml) ... done
  Created wheel for cx_Freeze: filename=cx_Freeze-7.2.1-cp312-cp312-linux_x86_64.whl size=5396342 sha256=9dc6e08ac01ba38a43ebac084f60b4c995c5a91d9398d24c50e8ca7fecdc3098
  Stored in directory: /tmp/pip-ephem-wheel-cache-vet_f9i_/wheels/2e/60/42/2b33ff2d34d3d5d9e5ed55ab5b9ad97af8eddeefefc8e326bb
  Building wheel for sgmllib3k (pyproject.toml) ... done
  Created wheel for sgmllib3k: filename=sgmllib3k-1.0.0-py3-none-any.whl size=6047 sha256=1c9b2f65f8d0a487285df4fd3af40947c9be053b5d47a2a9f24bd628b87f1182
  Stored in directory: /root/.cache/pip/wheels/03/f5/1a/23761066dac1d0e8e683e5fdb27e12de53209d05a4a37e6246
Successfully built zmq cx_Freeze sgmllib3k
Installing collected packages: wcwidth, sortedcontainers, sgmllib3k, pyserial, PyQt6-Qt6, pyelftools, ptyprocess, patchelf, aenum, xmltodict, websocket-client, watchdog, wasmtime, urllib3, typing-extensions, six, setuptools, regex, pyzmq, PyYAML, pyte, PyQt6-sip, pygments, packaging, markdown, lowbar, filelock, feedparser, Cython, colorama, chardet, certifi, attrs, zmq, python-dateutil, PyQt6, pydantic, hypothesis, cx_Freeze, astyle-py, PyQt6-QScintilla, pieces_os_client
Successfully installed Cython-3.0.11 PyQt6-6.7.1 PyQt6-QScintilla-2.14.1 PyQt6-Qt6-6.7.2 PyQt6-sip-13.8.0 PyYAML-6.0.2 aenum-3.1.15 astyle-py-1.0.5 attrs-24.2.0 certifi-2024.8.30 chardet-5.2.0 colorama-0.4.6 cx_Freeze-7.2.1 feedparser-6.0.11 filelock-3.16.1 hypothesis-6.112.1 lowbar-1.5.3 markdown-3.7 packaging-24.1 patchelf-0.17.2.1 pieces_os_client-3.2.1 ptyprocess-0.7.0 pydantic-1.10.18 pyelftools-0.31 pygments-2.18.0 pyserial-3.5 pyte-0.8.2 python-dateutil-2.9.0.post0 pyzmq-26.2.0 regex-2024.9.11 setuptools-75.1.0 sgmllib3k-1.0.0 six-1.16.0 sortedcontainers-2.4.0 typing-extensions-4.12.2 urllib3-2.2.3 wasmtime-12.0.0 watchdog-5.0.2 wcwidth-0.2.13 websocket-client-1.8.0 xmltodict-0.13.0 zmq-0.0.0

That covers the setup of the Python environment in the Docker image. Next comes the actual build of the Python code with cx_Freeze, but that fails:

Traceback (most recent call last):
  File "/data/embeetle/beetle_updater/build.py", line 187, in <module>
    build_updater_tool(
  File "/data/embeetle/beetle_updater/build.py", line 62, in build_updater_tool
    cx_Freeze.Executable(
  File "/root/env/lib/python3.12/site-packages/cx_Freeze/executable.py", line 47, in __init__
    self.base = base
    ^^^^^^^^^
  File "/root/env/lib/python3.12/site-packages/cx_Freeze/executable.py", line 86, in base
    raise OptionError(msg)
distutils.errors.DistutilsOptionError: no base named 'console' ('console-cpython-312-x86_64-linux-gnu')
ERROR: Freezing updater tool failed. Executable not found.

Does this Docker image have gcc?
The compiled version is here: https://github.com/marcelotduarte/cx_Freeze/actions/runs/10970558992/artifacts/1961237363
Download the zip and extract the wheel for python 3.12
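
Picking the right wheel out of the downloaded artifact can be scripted. The sketch below assumes the artifact is a plain zip containing several `.whl` files whose names carry the usual Python tag (for example `cp312`); the function name `extract_wheels` and the file name `artifact.zip` are hypothetical, and the actual layout of the artifact is not shown in this thread.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_wheels(artifact_zip: str, dest_dir: str, python_tag: str = "cp312") -> List[str]:
    """Extract wheels whose filename carries `python_tag`; return their paths."""
    extracted = []
    with zipfile.ZipFile(artifact_zip) as archive:
        for name in archive.namelist():
            base = Path(name).name
            # Wheel names look like name-version-pytag-abitag-platform.whl
            if base.endswith(".whl") and f"-{python_tag}-" in base:
                extracted.append(archive.extract(name, dest_dir))
    return sorted(extracted)
```

For example, `extract_wheels("artifact.zip", "wheels")` would return the paths of the matching wheels, each of which can then be installed with `pip install <path>`.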

I tested with the wheel from https://github.com/marcelotduarte/cx_Freeze/actions/runs/10970558992/artifacts/1961237363

It looks like everything works, both in simple test projects and in my real work project. The build size decreased from 1.6 GB to 400 MB.

I could not install the git+https version. I got errors like the ones kristofmulier described, in my case without Docker. Maybe I am missing some required dependencies on my system...

Hi @marcelotduarte ,
I ran a build in our Linux-Docker with the python-wheel you prepared for me (see https://github.com/marcelotduarte/cx_Freeze/actions/runs/10970558992/artifacts/1961237363 ). Yes, that worked great! As far as I can tell, the size is really okay now.

Great job!

By the way, when do you plan to release the next version of cx_Freeze, with the fix for this issue as well as the svg-file issue? I suppose that would then be version 7.2.2, right? I'm asking because I'd like to plan our next public build internally.

Ran it on the latest Embeetle builds. Works great!
Thanks @marcelotduarte !