Support for data_files
delphyne opened this issue · 38 comments
- I have searched the issues of this repo and believe that this is not a duplicate.
- I have searched the documentation and believe that my question is not covered.
Feature Request
Poetry does not currently support the setup(data_files=[]) argument, which allows you to include data files that live outside of the package files area. This functionality is generally used for shipping non-code files which might be necessary for your library to run, or for other libraries to build. Examples include protobuf .proto files, Avro schemas, Thrift IDL, etc.
I used data_files to ship systemd unit file. This is a very important feature !
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@sdispater I guess this would wait for post 1.0, right ?
This is critical for a (basic) gui app
Do I understand correctly, that data_files that live next to the package modules are supported?
So, this layout should work and config_data.csv will be packaged?
pkgname/
    pyproject.toml
    src/
        pckname/
            __init__.py
            data_files/
                config_data.csv
Every time I jump head first into a new tool, I smash my face into the bottom of the pool.
This should now be covered by https://python-poetry.org/docs/pyproject/#include-and-exclude. If this is not the case, please feel free to comment here or open a new issue with the specific scenario not covered.
that doesn't let you specify where they should go? how are users supposed to install a .desktop for DE integration?
There are at least two use cases:
- https://docs.python.org/2/distutils/setupscript.html#installing-package-data
- https://docs.python.org/2/distutils/setupscript.html#installing-additional-files
AIUI the include/exclude mechanism does not match either of them; it just adds the files to the package.
Now, substantially, if my package is going to be installed in
/some/path/lib/python3.6/site-packages/
then those files are going to be installed directly into that directory.
Both use cases are required to be able to move away from setuptools, and neither is implemented in poetry yet.
Note: package_data can be implemented with include, but that does not provide a lot of flexibility. We have to hardcode lots of paths, whereas package_data is package oriented (as makes sense for something that manages packaging). Also, the empty package '' use case is extremely important:
{'': ['assets/*']}
is extremely expressive for anyone who has lots of files in lots of packages (which can be added and removed) and who, with include, would need to list them all explicitly.
@kalfa, as you later realized, include/exclude matches exactly the first use case you linked, package data. Can you provide a concrete example of package data that is easy to do with setuptools but difficult with Poetry?
Package data is less expressive with include/exclude and more difficult to read.
Overall it is possible to achieve most if not all use cases.
The setuptools approach is more compact and readable:
- Install the directory asset in each package.
- install the directory foo in package X
'':'assets/*',
'X':'foo/*'
With poetry I have to specify a list of more obscure patterns. But for simple enough projects, it is good enough. As you said, I only understood its potential later.
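To make the semantics of the setuptools mapping quoted above concrete, here is a minimal sketch of how patterns under the '' key apply to every package while other keys apply to one package only. It is a simplified model, not setuptools' actual implementation; the helper `expand_package_data` and the file names `a.txt`, `b.txt`, `c.txt`, `ignored.txt` are hypothetical and exist only for illustration.

```python
import glob
import os
import tempfile


def expand_package_data(root, packages, package_data):
    """Return {package: [matched files]} for a setuptools-style package_data mapping.

    Simplified model (an assumption, not setuptools' code): patterns under the
    '' key apply to every package, other keys apply only to the named package,
    and patterns are resolved relative to each package directory.
    """
    result = {}
    for package in packages:
        package_dir = os.path.join(root, *package.split("."))
        patterns = package_data.get("", []) + package_data.get(package, [])
        matches = []
        for pattern in patterns:
            for path in glob.glob(os.path.join(package_dir, pattern)):
                if os.path.isfile(path):
                    # Report paths relative to the project root, with "/" separators.
                    matches.append(os.path.relpath(path, root).replace(os.sep, "/"))
        result[package] = sorted(matches)
    return result


# Hypothetical layout: packages X and Y each have assets/; both also have foo/.
root = tempfile.mkdtemp()
for rel in ("X/assets/a.txt", "X/foo/b.txt", "Y/assets/c.txt", "Y/foo/ignored.txt"):
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("data\n")

package_data = {"": ["assets/*"], "X": ["foo/*"]}
expanded = expand_package_data(root, ["X", "Y"], package_data)
print(expanded)
```

In this model, `Y/foo/ignored.txt` is left out because the `foo/*` pattern is registered only for package X, while `assets/*` is picked up for both packages without naming either of them.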
What is missing is the other use case, which is what this ticket is about. The ticket has been closed and IMHO should be reopened.
I'm porting setup.py files to pyproject.toml and trying to build the same wheel. Happy to find out I'm wrong and it's possible.
"data_files" are delivered relative to sys.prefix, whereas "package_data" is delivered to site-packages. I don't think it's possible to deliver files relative to sys.prefix using Poetry's include/exclude options.
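As a rough illustration of the two install roots contrasted above, the standard library's sysconfig module can report both. This is only a sketch of where each kind of file would typically land in the current environment; the exact directories depend on the platform and install scheme, so the output will differ between machines.

```python
import sys
import sysconfig

# Root of the "data" install scheme. On a default install this is normally
# the same directory as sys.prefix, but distributions may configure it differently.
data_root = sysconfig.get_path("data")

# Directory where pure-Python packages (and their package data) are installed.
site_packages = sysconfig.get_path("purelib")

print("sys.prefix:   ", sys.prefix)
print("data root:    ", data_root)
print("site-packages:", site_packages)
```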
Another use-case is the distribution of man pages with the package.
I tried to move a Unix console app that follows the FHS from setuptools to poetry, but got stuck on this issue, and it looks like I will have to roll back :(
@kalfa in the post above you wrote that
package_data can be implemented with include, but does not provide a lot of flexibility. we have to hardcode lots of path
Could you please provide an example of how I can achieve this with poetry:
data_files=[('/etc/myapp/', ['myapp.conf'])]
@kalfa in the post above you wrote that
package_data can be implemented with include
Could you please provide an example of how I can achieve this with poetry:
data_files=[('/etc/myapp/', ['myapp.conf'])]
I don't think you can yet (unless it has been implemented since I wrote my comments and I'm unaware of it).
Only package_data has an equivalent in poetry.
What you mentioned is the same use case as desktop files & co.
From my point of view (and many others) using data_files (the one from setuptools) is a bad practice. And I would venture that it is why it is not supported in poetry.
The idea that pip-installing a project could result in files being written to random locations on the local file system is discomforting. Things like: data_files=[('/etc/myapp/', ['myapp.conf'])] get a "no" from me. For one, it would mean pip-installing with sudo, which is also a "no" (there are way too many issues coming with that).
It is true, that there is a need for such things, in particular for applications. But from what I understood Python's packaging ecosystem was initially built with libraries in mind, applications were (and still very much are) some kind of second class citizens in a sense. So packaging applications in order to distribute them on PyPI is still very awkward. There are many other issues showing this divide between libraries and applications all around the Python packaging ecosystem in general, poetry included (but not poetry's fault in any sense, as far as I have seen).
My usual recommendation when something like data_files is needed is to go beyond the standard/common Python packaging techniques and reach for the packaging techniques specific to the operating systems. So for example, for Linux I would recommend looking into packaging your applications with apt/.deb, yum/.rpm, pacman, appimage, snap, etc. Give pyinstaller, or BeeWare's briefcase or other similar tools a look. Those would probably give you a much better experience for such things.
The setuptools package_data on the other hand, is perfectly fine and encouraged. It only results in files written in the venv/lib/site-packages/mylibrary directory of your own package for the environment. So for poetry, as it was already mentioned, use include and exclude. More often than not, those are sufficient, no need to write files to random places on the file system. Also remember to use importlib.resources to read those files, never rely on paths relative to __file__.
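The importlib.resources advice above can be sketched as follows. The example assumes Python 3.9 or later (for `importlib.resources.files`) and builds a throwaway package on disk so that it is self-contained; the package name `demo_pkg`, the file `config_data.csv`, and the helper `read_package_text` are hypothetical names used only for illustration.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_package_text(package: str, resource: str) -> str:
    """Read a text resource shipped inside an importable package.

    The lookup goes through the import system instead of building a path
    from __file__, so it does not assume the package is a plain directory.
    """
    return (
        importlib.resources.files(package)
        .joinpath(resource)
        .read_text(encoding="utf-8")
    )


# Build a throwaway package (hypothetical names) so the example runs anywhere.
tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("", encoding="utf-8")
(pkg_dir / "config_data.csv").write_text("key,value\nanswer,42\n", encoding="utf-8")

# Make the temporary package importable for this process only.
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

content = read_package_text("demo_pkg", "config_data.csv")
print(content)
```

In a real project the package would already be installed, so only the `read_package_text` call (or the equivalent one-liner) would be needed.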
I will also add that for things such as configuration files, user data files, cache files, etc. you should have a look at platformdirs.
So, in short:
If you need data_files, think twice. If you really need data_files, I would recommend you to rely on something more than just the common Python packaging tools. Go for pyinstaller, or briefcase, etc. or for more heavy duty tools (apt, yum, pacman, etc.). Because you want OS-specific or Linux distro-specific things anyway. More generally if you want to distribute applications, you might want to look beyond distributing as sdist and wheels, those are not really made for applications.
To add onto @sinoroc's excellent explanation, remember that Python is a cross-platform language. People install Python software on Windows, and if you publish an application on PyPI, Windows users might expect that it will work on their system. If you install platform-specific files (e.g. /etc/whatever), then you might need a platform-specific installer.
@sinoroc From what I see in this issue, it looks like the reason data_files is not supported in Poetry currently is that the maintainers do not see the use case for it. It is certainly true that use of data_files should be minimized (libraries almost never need it), but applications in many cases have no other option to bundle assets properly.
The idea that pip-installing a project could result in files being written to random locations on the local file system is discomforting.
Even without data_files, this is the case. Arbitrary code is executed whenever you pip install a package; pip installing a package means you trust that package.
Additionally, data_files does not require that you specify absolute paths for your files to be installed into (in fact, it's discouraged). Relative paths work (e.g. ('share/applications', ['xyz.desktop'])), and the files will be installed relative to either sys.prefix or site.USER_BASE.
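The relative-path behaviour described above can be sketched like this. The helper `resolve_data_files` is hypothetical and is not part of setuptools or pip; it only models the idea of joining each (directory, [files]) entry onto an install base. Rejecting absolute directories is a design choice made in this sketch to mirror the "discouraged" advice, not something setuptools enforces.

```python
import os
import site
import sys


def resolve_data_files(data_files, base):
    """Map setuptools-style (directory, [files]) entries to target paths under base.

    Hypothetical helper for illustration only. Absolute target directories
    are rejected here by choice.
    """
    targets = []
    for directory, files in data_files:
        if os.path.isabs(directory):
            raise ValueError(f"absolute target directory not allowed: {directory!r}")
        for name in files:
            # Only the file name is kept; the source directory is dropped.
            targets.append(os.path.join(base, directory, os.path.basename(name)))
    return targets


data_files = [("share/applications", ["xyz.desktop"])]

# System-wide or virtualenv install: relative to sys.prefix.
system_targets = resolve_data_files(data_files, sys.prefix)

# Per-user install (pip install --user): relative to the user base directory.
user_targets = resolve_data_files(data_files, site.getuserbase())

print(system_targets)
print(user_targets)
```

Inside a virtual environment, `sys.prefix` points at the environment itself, so a relative entry stays within that environment rather than touching system directories.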
Your recommendation for using tools like pyinstaller, briefcase, Debian packages, etc. isn't really possible for application developers in a lot of cases. If, for example, an application wanted to support only Linux, there are still a lot of different kinds of package formats that the application developer would have to support. For that reason, distribution-specific package formats are usually created and maintained specifically for those distributions by someone on behalf of the distribution, rather than the maintainer of the application. Also, many of these formats take advantage of using the application's setuptools setup to install data files (see for example pybuild).
include is not a replacement for data_files in many cases, as other users have mentioned here (application desktop files, systemd unit files, man pages, etc).
@thejohnfreeman To address your concern of Linux-only applications on PyPI, it is not a requirement that Poetry packages are published to PyPI. A lot of applications won't be. Also, PyPI has classifiers to mark applications as supporting only Linux. Users should not be blindly installing applications from PyPI; that is a recipe for disaster.
@thomassross I totally understand your point of view. But my point still stands: I do not believe data_files is a good practice for the common use cases. And as far as I understood, one of the big drivers for the development of poetry is to enforce good practices.
There are obviously very legitimate use cases where data_files are helpful and a good solution. For example if the project is only used in controlled environment for private usage, then I have nothing against using data_files.
So I would side on not adding support for data_files in poetry, and I would absolutely encourage a plugin that adds this feature (plugin system is scheduled for v1.2).
While it may not be terribly common, it is still a necessary piece of functionality for many applications if they want to fully take advantage of Poetry. I personally would like to see it in Poetry core (with a warning in the documentation recommending include where it's possible to use it, if required).
In any case, it would be great if we could get a response from the project maintainers on how they feel about implementing this functionality (@abn?).
This is a feature that is preventing me from adopting Poetry in some of my own projects.
@sinoroc
(I moved the discussion to here.)
Could you add or update an example in the include and exclude section showing what the relative paths are based on? Are they relative to where pyproject.toml is located, or to the package folder?
For example,
dummy_folder/
    pyproject.toml
    CHANGE.log
    my_package/
        __init__.py
        my_data.csv
The pyproject.toml is for my_package/ and I am not sure what I should specify in include = [] in pyproject.toml: is it my_data.csv or my_package/my_data.csv?
If it is the latter, would it fail for the user to simply specify CHANGE.log because only things in my_package/ will be installed to site-packages?
@hyliu1989 I am probably not the best placed to answer this, but I will try to give it a shot in the other thread.
Another piece of information on the topic:
data_files
Warning: data_files is deprecated. It does not work with wheels, so it should be avoided.
A list of strings specifying the data files to install.
-- https://setuptools.readthedocs.io/en/latest/references/keywords.html?highlight=data_files
Also maybe related: pypa/wheel#92
I am interested in data_files support for exactly the same reason as @bersace gave above (#890 (comment)):
I used data_files to ship systemd unit file. This is a very important feature !
Neither package_data nor include/exclude work for this case.
Was this closed because there has been no work on it? Or was it closed because a PR to add this feature would be rejected?
It seems that data_files support is pretty much needed for packaging anything that works with Jupyter, see here and here.
flit has added support for a simplified and constrained version of data_files, which might also end up replacing the deprecated data_files functionality in setuptools. Thanks to this new feature, there is also some work on enabling flit for packaging with Jupyter extensions. Having the same possibility for poetry would be very nice.
@N-Coder, can you please open another bug explicitly about Jupyter that mentions this bug?
This bug is now closed, but I think it is still worth underlining that a data_files-equivalent feature is still missing (unless it has been added in the meantime, which would be great, and a new ticket would still be a win if we learned that).
I guess #4013 describes the issue from the Jupyter side or would you want a feature request for replicating the external-data functionality from flit?
Hello! I'm trying to assess how this feature would be used generally.
Hatchling supports a shared-data option for wheels. Would that satisfy everyone's use case here?
In my case, I need to install a manpage and a zsh completion file, so for me, yes.
Hatchling supports a shared-data option for wheels. Would that satisfy everyone's use case here?
@ofek the link you provided results in a 404 error. I think the following links to what you intended: https://hatch.pypa.io/latest/plugins/builder/#options
Thanks! Hatch was adopted by the PyPA so the docs site was moved.
I created a poetry plugin that adds support for data_files in pyproject.toml: https://github.com/spoorn/poeblix, https://pypi.org/project/poeblix/
@spoorn did you get a Jupyter plugin packaged with poetry working with your plugin? Do you have an example?
@spoorn, it's an incredible amount of work you've done! I think you should ask maintainers to mention this plugin in docs somehow.
@N-Coder I got this working with nbconvert template files. Example: https://github.com/spoorn/poeblix/blob/main/test/positive_cases/happy_case_example/pyproject.toml
@droserasprout Thanks! Up to the poetry maintainers
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.