python-poetry/poetry

Support for data_files

delphyne opened this issue · 38 comments

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Feature Request

Poetry does not currently support the setup(data_files=[]) argument, which allows you to include data files that live outside of the package files area. This functionality is generally used for shipping non-code files which might be necessary for your library to run, or for other libraries to build. Examples include protobuf .proto files, Avro schemas, Thrift IDL, etc.

I used data_files to ship a systemd unit file. This is a very important feature!

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@sdispater I guess this would wait for post 1.0, right ?

This is critical for a (basic) gui app

Do I understand correctly that data files that live next to the package modules are supported?

So, this layout should work and config_data.csv will be packaged?

pkgname/
  pyproject.toml
  src/
    pckname/
      __init__.py
      data_files/
        config_data.csv

Every time I jump head first into a new tool, I smash my face into the bottom of the pool.

abn commented

This should now be covered by https://python-poetry.org/docs/pyproject/#include-and-exclude. If this is not the case, please feel free to comment here or open a new issue with the specific scenario not covered.

That doesn't let you specify where the files should go. How are users supposed to install a .desktop file for desktop environment (DE) integration?

kalfa commented

There are at least two use cases:

  1. https://docs.python.org/2/distutils/setupscript.html#installing-package-data
  2. https://docs.python.org/2/distutils/setupscript.html#installing-additional-files

AIUI, the include/exclude mechanism does not match either use case; it just adds the files to the package.

In practice, if my package is going to be installed in /some/path/lib/python3.6/site-packages/, then those files are going to be installed directly into that directory.

Both use cases are required to be able to move away from setuptools, and neither is implemented in Poetry yet.

Note: package_data can be implemented with include, but does not provide a lot of flexibility. We have to hardcode lots of paths, whereas package_data is package-oriented (as makes sense when managing packaging). Also, the empty package '' use case is extremely important:
{'': ['assets/*']} is extremely expressive for anyone who has lots of files in lots of packages (which can be added and removed); with include, they would all need to be listed explicitly.

@kalfa, as you later realized, include/exclude matches exactly the first use case you linked, package data. Can you provide a concrete example of package data that is easy to do with setuptools but difficult with Poetry?

kalfa commented

Package data is less expressive with include/exclude and more difficult to read.
Overall it is possible to achieve most if not all use cases.

The setuptools approach is more compact and readable:

  • Install the directory assets in each package.
  • Install the directory foo in package X.
'': ['assets/*'],
'X': ['foo/*']

With Poetry I have to specify a list of more obscure patterns, but for simple enough projects it is good enough. As you said, I understood its potential later.

What is missing is the other use case, which is what this ticket is about. The ticket has been closed and IMHO should be reopened.

I'm porting setup.py files to pyproject.toml and trying to build the same wheel. Happy to find out I'm wrong and it's possible

"data_files" are delivered relative to sys.prefix, whereas "package_data" is delivered to site-packages. I don't think it's possible to deliver files relative to sys.prefix using Poetry's include/exclude options.
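
The two destinations can be inspected from the interpreter itself. This is a minimal sketch, assuming a standard CPython install (inside a virtual environment, sys.prefix is the environment root). The "data" scheme path is where relative data_files entries are rooted, and "purelib" is the site-packages directory that package_data ends up under.

```python
import sys
import sysconfig

# Installation scheme of the running interpreter.
paths = sysconfig.get_paths()

# Root for relative data_files entries (usually the same as sys.prefix,
# although some distributions patch the scheme).
data_root = paths["data"]
# site-packages, where package modules and package_data are installed.
site_packages = paths["purelib"]

print("sys.prefix:   ", sys.prefix)
print("data root:    ", data_root)
print("site-packages:", site_packages)
```

Files placed with include/exclude stay under the second path, which is why they cannot reach locations such as share/ under the first.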

Another use-case is the distribution of man pages with the package.

I tried to move a Unix console app that follows the FHS from setuptools to Poetry, but got stuck on this issue and it looks like I will have to roll back :(

@kalfa in the post above you wrote that

package_data can be implemented with include, but does not provide a lot of flexibility. We have to hardcode lots of paths

Could you please provide an example of how I can achieve this with poetry:

data_files=[('/etc/myapp/', ['myapp.conf'])]

kalfa commented

@Ezhvsalate

I don't think you can yet (unless it has been implemented since I wrote my comments and I'm unaware of it).

Only package_data has an equivalent in Poetry.

What you mentioned is the same use case as desktop files & co.

@kalfa thank you, got it.

@abn Is there any chance for the feature to be implemented? Maybe there is some way to reopen the issue? I also found pull request #901 with an implementation, but it's closed as well.

From my point of view (and many others) using data_files (the one from setuptools) is a bad practice. And I would venture that it is why it is not supported in poetry.

The idea that pip-installing a project could result in files being written to random locations on the local file system is discomforting. Things like: data_files=[('/etc/myapp/', ['myapp.conf'])] get a "no" from me. For one, it would mean pip-installing with sudo, which is also a "no" (there are way too many issues coming with that).

It is true that there is a need for such things, in particular for applications. But from what I understood, Python's packaging ecosystem was initially built with libraries in mind; applications were (and still very much are) some kind of second-class citizens in a sense. So packaging applications in order to distribute them on PyPI is still very awkward. There are many other issues showing this divide between libraries and applications all around the Python packaging ecosystem in general, Poetry included (but not Poetry's fault in any sense, as far as I have seen).

My usual recommendation when something like data_files is needed is to go beyond the standard/common Python packaging techniques and reach for the packaging techniques specific to the operating system. So for example, for Linux I would recommend looking into packaging your applications with apt/.deb, yum/.rpm, pacman, AppImage, snap, etc. Give PyInstaller, BeeWare's Briefcase, or other similar tools a look. Those would probably give you a much better experience for such things.

The setuptools package_data on the other hand, is perfectly fine and encouraged. It only results in files written in the venv/lib/site-packages/mylibrary directory of your own package for the environment. So for poetry, as it was already mentioned, use include and exclude. More often than not, those are sufficient, no need to write files to random places on the file system. Also remember to use importlib.resources to read those files, never rely on paths relative to __file__.
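
A minimal sketch of the importlib.resources approach, assuming Python 3.9 or later (for importlib.resources.files). The package name demo_pkg and the file config_data.csv are hypothetical; the snippet builds a throwaway package on disk only so that it can run on its own.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

# Build a throwaway package so the example is self-contained.
# "demo_pkg" and "config_data.csv" are hypothetical names.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
(pkg_dir / "data_files").mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "data_files" / "config_data.csv").write_text("key,value\nanswer,42\n")

# Make the package importable for this process.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Locate the data file through the import system rather than via __file__.
resource = importlib.resources.files("demo_pkg") / "data_files" / "config_data.csv"
text = resource.read_text(encoding="utf-8")
print(text)
```

In a real project only the last three lines are needed, with the name of your own installed package in place of demo_pkg.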

I will also add that for things such as configuration files, user data files, cache files, etc. you should have a look at platformdirs.

So, in short:
If you need data_files, think twice. If you really need data_files, I would recommend you to rely on something more than just the common Python packaging tools. Go for pyinstaller, or briefcase, etc. or for more heavy duty tools (apt, yum, pacman, etc.). Because you want OS-specific or Linux distro-specific things anyway. More generally if you want to distribute applications, you might want to look beyond distributing as sdist and wheels, those are not really made for applications.

To add onto @sinoroc's excellent explanation, remember that Python is a cross-platform language. People install Python software on Windows, and if you publish an application on PyPI, Windows users might expect that it will work on their system. If you install platform-specific files (e.g. /etc/whatever), then you might need a platform-specific installer.

@sinoroc From what I see in this issue, it looks like the reason data_files is not supported in Poetry currently is that the maintainers do not see the use case for it. It is certainly true that use of data_files should be minimized (libraries almost never need it), but applications in many cases have no other option to bundle assets properly.

The idea that pip-installing a project could result in files being written to random locations on the local file system is discomforting.

Even without data_files, this is the case. Arbitrary code is executed whenever you pip install a package; pip installing a package means you trust that package.

Additionally, data_files does not require that you specify absolute paths for your files to be installed into (in fact, it's discouraged). Relative paths work (e.g. ('share/applications', ['xyz.desktop'])), and the files will be installed relative to either sys.prefix or site.USER_BASE.
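
To illustrate the relative form, here is a rough sketch of where such an entry would end up. It only computes paths and installs nothing. The resolve helper is hypothetical (it is not a setuptools or pip function), and the sketch assumes a platform where site.getuserbase() returns a path.

```python
import os
import site
import sys

# A relative data_files entry in the (directory, [files]) form.
# The file name is the hypothetical one from the comment above.
data_files = [("share/applications", ["xyz.desktop"])]


def resolve(base, entries):
    """Return the paths the entries would occupy under a given base directory."""
    return [
        os.path.join(base, directory, os.path.basename(name))
        for directory, names in entries
        for name in names
    ]


# Regular install: rooted at sys.prefix (the venv root inside a virtual environment).
system_targets = resolve(sys.prefix, data_files)
# User install (pip install --user): rooted at the user base directory.
user_targets = resolve(site.getuserbase(), data_files)

print(system_targets)
print(user_targets)
```

Neither target is outside the Python installation or the user's home directory, which is the point being made about relative paths.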

Your recommendation for using tools like pyinstaller, briefcase, Debian packages, etc. isn't really possible for application developers in a lot of cases. If, for example, an application wanted to support only Linux, there are still a lot of different kinds of package formats that the application developer would have to support. For that reason, distribution-specific package formats are usually created and maintained specifically for those distributions by someone on behalf of the distribution, rather than the maintainer of the application. Also, many of these formats take advantage of using the application's setuptools setup to install data files (see for example pybuild).

include is not a replacement for data_files in many cases, as other users have mentioned here (application desktop files, systemd unit files, man pages, etc).

@thejohnfreeman To address your concern of Linux-only applications on PyPI, it is not a requirement that Poetry packages are published to PyPI. A lot of applications won't be. Also, PyPI has classifiers to mark applications as supporting only Linux. Users should not be blindly installing applications from PyPI; that is a recipe for disaster.

@thomassross I totally understand your point of view. But my point still stands: I do not believe data_files is a good practice for the common use cases. And as far as I understood, one of the big drivers for the development of poetry is to enforce good practices.

There are obviously very legitimate use cases where data_files are helpful and a good solution. For example, if the project is only used in a controlled environment for private usage, then I have nothing against using data_files.

So I would side on not adding support for data_files in poetry, and I would absolutely encourage a plugin that adds this feature (plugin system is scheduled for v1.2).

While it may not be terribly common, it is still a necessary piece of functionality for many applications if they want to fully take advantage of Poetry. I personally would like to see it in Poetry core (with a warning in the documentation recommending include where it's possible to use it, if required).

In any case, it would be great if we could get a response from the project maintainers on how they feel about implementing this functionality (@abn?).

This is a feature that is preventing me from adopting Poetry in some of my own projects.

@sinoroc
(I moved the discussion to here.)

Could you add or update an example in the include and exclude section showing what the relative paths are based on? Are they relative to where pyproject.toml is located, or to the package folder?

For example,

dummy_folder/
    pyproject.toml
    CHANGE.log
    my_package/
        __init__.py
        my_data.csv

The pyproject.toml is for my_package/, and I am not sure what I should specify in include = [] in pyproject.toml: is it my_data.csv or my_package/my_data.csv?

If it is the latter, would it fail for the user to simply specify CHANGE.log because only things in my_package/ will be installed to site-packages?

@hyliu1989 I am probably not the best placed to answer this, but I will try to give it a shot in the other thread.

Another piece of information on the topic:

data_files

Warning: data_files is deprecated. It does not work with wheels, so it should be avoided.

A list of strings specifying the data files to install.

-- https://setuptools.readthedocs.io/en/latest/references/keywords.html?highlight=data_files

Also maybe related: pypa/wheel#92

I am interested in data_files support for exactly the same reason as @bersace gave above (#890 (comment)):

I used data_files to ship a systemd unit file. This is a very important feature!

Neither package_data nor include/exclude work for this case.

Was this closed because there has been no work on it? Or was it closed because a PR to add this feature would be rejected?

It seems that data_files support is pretty much needed for packaging anything that works with Jupyter, see here and here.
flit has added support for a simplified and constrained version of data_files, which might also end up replacing the deprecated data_files functionality in setuptools. Thanks to this new feature, there is also some work on enabling flit for packaging Jupyter extensions. Having the same possibility for Poetry would be very nice.

kalfa commented

@N-Coder, can you please open another bug explicitly about Jupyter that mentions this bug?

This bug is now closed, but I think it is still worth underlining that a data_files-equivalent feature is still missing (unless it has been added in the meantime, which would be great; a new ticket would still be a win if that is how we learned about it).

I guess #4013 describes the issue from the Jupyter side. Or would you want a feature request for replicating the external-data functionality from flit?

ofek commented

Hello! I'm trying to assess how this feature would be used generally.

Hatchling supports a shared-data option for wheels. Would that satisfy everyone's use case here?

In my case, I need to install a manpage and a zsh completion file, so for me, yes.

Hatchling supports a shared-data option for wheels. Would that satisfy everyone's use case here?

@ofek the link you provided results in a 404 error. I think the following links to what you intended: https://hatch.pypa.io/latest/plugins/builder/#options

ofek commented

Thanks! Hatch was adopted by the PyPA so the docs site was moved.

I created a poetry plugin that adds support for data_files in pyproject.toml: https://github.com/spoorn/poeblix, https://pypi.org/project/poeblix/

@spoorn did you get a Jupyter plugin packaged with Poetry working with your plugin? Do you have an example?

@spoorn, it's an incredible amount of work you've done! I think you should ask maintainers to mention this plugin in docs somehow.

@N-Coder I got this working with nbconvert template files. Example: https://github.com/spoorn/poeblix/blob/main/test/positive_cases/happy_case_example/pyproject.toml

@droserasprout Thanks! Up to the poetry maintainers

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.