iterative/dvc

Failed to create a Pyinstaller app that includes dvc

jorgegarcia-ea opened this issue · 8 comments

Bug Report

Failed to create a Pyinstaller app that includes dvc

Description

We are trying to make a pyinstaller app that includes dvc and after generating the spec we get this error:

PyInstaller.exceptions.ImportErrorWhenRunningHook: Failed to import module __PyInstaller_hooks_0_dvc required by hook for module D:\test_venv\Lib\site-packages\dvc\__pyinstaller\hook-dvc.py. Please check whether module __PyInstaller_hooks_0_dvc actually exists and whether the hook is compatible with your version of D:\test_venv\Lib\site-packages\dvc\__pyinstaller\hook-dvc.py: You might want to read more about hooks in the manual and provide a pull-request to improve PyInstaller.

Reproduce

  • Create a python venv with pyinstaller and dvc in the requirements
  • Add some source files that use dvc
  • Generate a pyinstaller spec
  • Run pyinstaller on the spec

Expected

Can create a pyinstaller app without errors

Environment information

Windows 10, python 3.11.6, pyinstaller 6.4.0, dvc 3.47.0

Hey @jorgegarcia-ea, we are a small team and won't be able to allocate time to research such a specific setup (a custom PyInstaller build) unless someone already has good knowledge of what is happening there. Depending on the context you could reach out and engage with us at https://dvc.org/support or try to dig into this more and keep sharing the results here. The first step should probably be an easily reproducible bundle that anyone could run, plus more details on what you think is happening.

@jorgegarcia-ea Any particular reason you are trying to build it yourself? We provide a windows package that has a pyinstaller-built binary inside, e.g. https://dvc.org/download/win/dvc-3.47.0 . Here's the whole process of building that https://github.com/iterative/dvc-exe

Hi @shcheklein and @efiop, many thanks for your quick responses. Here are additional details of the issue we are experiencing from my colleague, hope they are helpful:

We want to create an executable of our application. The goal is to reduce the size by avoiding packaging some data and instead using DVC. The code will download the data stored in the remote storage the first time the app starts.
We have been able to do this locally using DVCFileSystem and get_file(). However, when we try to build the exe with pyinstaller we get the error above.

Please find a minimum reproduction example attached. Let me know if you have any questions or need additional details.

dvc_pyinstaller_test.zip

Oh, so you are building a standalone app where dvc is a dependency, I see now. Thanks for clarifying.

So which commands are you running? You've shared a part of the error and the source files, but not the full error log or the particular pyinstaller commands you were running to generate the spec and build the app.

Also, looking at main.py, I see that you make some very strong assumptions about dvc file availability by using cwd with dvcfs, and I don't think that will work because you need to tell pyinstaller what files you want to include in the resulting app. So you'll probably run into more issues, but they are not really dvc-specific; they are more about using pyinstaller itself, and we won't be able to help you there.

Hello, thanks for your help, I'm collaborating with @jorgegarcia-ea on this project.
We would like to know if it is even possible to run dvc commands inside a standalone app created with pyinstaller, since the error we get happens when building the executable.

We are planning on adding the necessary files to the bundle. For now, let's assume we will receive an error at runtime due to lack of files.

In the minimal example just running pyinstaller main.py will return the above error. The whole stacktrace can be found attached (notice I'm using conda instead of venv)
dvc_pyinstaller_test_stacktrace.txt

Let us know if you need more information to reproduce the error.
Thank you!

@MonicaVillanueva Thanks for the log! The error there is actually:

importlib.metadata.PackageNotFoundError: No package metadata was found for adlfs

which probably means that you don't have adlfs installed. I suppose you've installed dvc through pip, so you probably need to pip install 'dvc[all]' to install all optional dependencies. That's just the way we build pyinstaller apps ourselves, so we expect all extra dependencies to be installed. There are ways to avoid that, but that will require modifying the hook.
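To see up front which distributions have no metadata, you can query importlib.metadata directly. This is a minimal sketch and not part of dvc or its hook: the helper name missing_distributions is made up for illustration, and only adlfs in the example list comes from the log above.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the names for which no installed package metadata is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            # Same exception type as the one reported in the stacktrace.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "adlfs" is the package named in the log; extend the list as needed.
    print(missing_distributions(["dvc", "adlfs"]))
```

Any name it returns needs to be installed in the build environment before running pyinstaller.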

Hey @efiop, sorry for the delay
Thanks for the suggestion, it works!

I'm working now towards making the example above work (e.g. use dvc inside the executable).
I will update here when I get it working. It might be useful for other people and you could add it to the documentation if you find it valuable.

Thanks for your help

For future reference:

There are a couple of tricky things:

  1. As kindly suggested above, you need to install all dvc dependencies with pip install dvc[all]
  2. It is necessary to move the .dvc folder into the executable's bundled directory, _internal. If you don't have no_scm = true, you will need to add it to .dvc/config. Otherwise, when you run the exe it will complain that it is not a git repository
  3. Using fs.find, I realized that the dvc file path needs a leading slash, or fs.get_file won't work

The rest is standard pyinstaller:

  1. You need to add your dvc file using pyinstaller main.py --add-data "test_file.txt.dvc;."
  2. You need to set the DVCFileSystem using the absolute path to the bundle folder stored in sys._MEIPASS e.g. DVCFileSystem(sys._MEIPASS)
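Put together, the runtime side of the steps above might look roughly like this. It is a sketch under assumptions, not the code from the attached project: bundle_dir, tracked_path and fetch are illustrative helper names, and the fallback to the current working directory when running unfrozen is my own choice.

```python
import os
import sys


def bundle_dir():
    """Return the PyInstaller bundle folder, or the cwd when run unfrozen."""
    # sys._MEIPASS is only set inside a PyInstaller-built executable.
    return getattr(sys, "_MEIPASS", os.getcwd())


def tracked_path(name):
    """Give the tracked file name the leading slash mentioned above."""
    return "/" + name.lstrip("/")


def fetch(name, dest):
    """Download one DVC-tracked file from the remote into dest."""
    # Imported lazily so the helpers above work without dvc installed.
    from dvc.api import DVCFileSystem

    fs = DVCFileSystem(bundle_dir())
    fs.get_file(tracked_path(name), dest)


if __name__ == "__main__":
    # "test_file.txt" corresponds to the test_file.txt.dvc added via --add-data.
    fetch("test_file.txt", "test_file.txt")
```

This assumes the .dvc folder and the .dvc file both sit directly in the bundle folder, as described in the steps above.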

Find the updated project here: dvc_pyinstaller_test_updated.zip