sagemath/sage

Support package_data-like of non-Python resource files in Python packages

embray opened this issue · 22 comments

This enhances sage_setup.find.find_extra_files to better support non-Python resources found in Python package sources. Previously it could include extra files found under build/cythonized, but now it can also include arbitrary files (including support for fnmatch patterns) that should be installed alongside a Python package. For example,

find_extra_files('sage.foo', '.', special_filenames=['*.html'])

will find any .html files under the ./sage/foo/ package.

This will be useful for #21732 and possibly other tickets.

This also adds a package_data variable in module_list.py that can list all non-Python resources on a per-package basis (using the same format as the standard package_data option to setup.py. This includes handling of package data in a way that's compatible with sage_setup.clean.clean_install_dir.

CC: @mkoeppe

Component: build

Author: Erik Bray

Branch/Commit: u/embray/build/package-data @ 5d30652

Issue created by migration from https://trac.sagemath.org/ticket/22655

Description changed:

--- 
+++ 
@@ -1,6 +1,6 @@
 This enhances `sage_setup.find.find_extra_files` to better support non-Python resources found in Python package sources.  Previously it could include extra files found under `build/cythonized`, but now it can also include arbitrary files (including support for `fnmatch` patterns) that should be installed alongside a Python package.  For example,
 
-```
+```python
 find_extra_files('sage.foo', '.', special_filenames=['*.html'])
 ```
 
comment:1

Hopefully this makes sense. Let me know if there's any more I can clarify. Note that, unlike my original attempt at this in #21732, I avoided moving module_list.py for now.

comment:3

First of all, there is an easy conflict with #22662.

The function find_extra_files was obviously meant to find extra Cython files since it looks for .pxd and .pxi files.

It makes sense to generalise this, but then it should no longer handle .pxd and .pxi files by default: those should be passed as *.pxd and *.pxi arguments by sage_build_ext.copy_extra_files.

comment:4

I think it's a bit silly to duplicate copy_extra_files. There are two reasons for this:

  1. The build_py command already contains lots of code to deal with "data files". It might be better to hook into this, for example by overriding build_py.get_data_files(). The only reason that I didn't do this in #21600 is because build_ext is run after build_py.

  2. If you really want to copy the files manually, I don't see much reason to treat the Cython data files differently from the package_data data files. The current code makes a distinction between cmd_build_ext.cythonized_files and cmd_build_py.extra_files. There is no reason to do that.

See also #21682 comment:34

comment:5

There is also an easy conflict with #22106.

comment:6

build_ext is run after build_py

Only if you run ./setup.py build. The two commands can be run independently as well and serve different purposes. In particular, it makes sense to make a distinction here because one case is for installing files that are found in the Python source tree, while the other is for installing files found under build/cythonized (which normally would actually be the Python source tree but we build this separate hierarchy--I never liked that, but whatever).

comment:7

Replying to @embray:

build_ext is run after build_py

Only if you run ./setup.py build.

What do you mean? If I run pip install ... or ./setup.py install, it would still run build_py before build_ext, right? Or am I missing something?

comment:8

setup.py install calls setup.py build, yes. All of these commands can be run individually, and should work individually, in principle. It's just that some of them of them (namely "install" and "build") work primarily by running a number of simpler sub-commands in some order.

comment:9

Replying to @embray:

setup.py install calls setup.py build, yes. All of these commands can be run individually, and should work individually, in principle.

For what value of "work"? Do you mean "work" as in: doesn't raise an exception? Or "work" as in: actually does something meaningful?

It makes little sense to run just build_ext without anything else. In the real world, build scripts have a natural order and dependencies. You cannot cherry-pick pieces of a build system and expect those pieces to be meaningful individually. Especially if you want something like cython/cython#1514 (which I think is a good idea) you cannot run build_ext individually.

Splitting the installation of data files in two, just because it adheres to some theoretical principle that nobody cares about, looks like a bad idea to me.

comment:10

Together with #21682, I would like to move to the following build steps:

  1. Run build_cython to run Cython, which generates .c and .h files.

  2. Run build_py which "builds" Python files and which copies all package data, from Cython and non-Cython. Preferably, copying this package data can be done by using some existing machinery from build_py.

  3. Run build_ext as usual.

Erik, what's your opinion on this?

comment:11

It makes little sense to run just build_ext without anything else.

I do that almost every day on 'normal' projects. ./setup.py build_ext --inplace is the best thing to doing in-source development of a project that uses extension modules.

In any case, by overthinking it you're actually making it more complicated and less reliable. It works fine as is.

comment:12

Replying to @embray:

It works fine as is.

...at the cost of extra complexity. I am a strong believer in the https://en.wikipedia.org/wiki/KISS_principle

comment:13

You don't need to link me to a Wikipedia article on "KISS".

In fact complexity is sometimes subjective. You're upset because you see what looks like a little bit of code duplication (actually, I don't see it as duplication; it's doing two somewhat different things that nonetheless look similar due to using the same API), but without it in fact you're relying on a fragile, poorly documented/specified state machine to do things in the right order or else everything breaks. That to me is complexity.

comment:14

(I do agree about rebasing on / integrating with #21682 except that's been stalled for months for no apparent reason)

comment:15

Alright, I rebased #21682, so I'll at least see about integrating this ticket with it properly. If there's any opportunity I can find to reduce code duplication I'll take it, but I do think in this case files generated from Cython should be handled separately from static files.

comment:16

does not apply

comment:17

Erik: I'm confused how you see the bigger picture with this ticket and #21682. In your ideal world, how would the build process work? In other words: what is your alternative to [comment:10]?

comment:18

TODO: discuss next week if possible.

comment:19

Replying to @jdemeyer:

Together with #21682, I would like to move to the following build steps:

  1. Run build_cython to run Cython, which generates .c and .h files.

  2. Run build_py which "builds" Python files and which copies all package data, from Cython and non-Cython. Preferably, copying this package data can be done by using some existing machinery from build_py.

  3. Run build_ext as usual.

Erik, what's your opinion on this?

I think this makes sense, actually. The only thing is that build_cython needs to be responsible for providing a list of resource files it generates (.h files, etc.), but we could free it from responsibility for actually copying them. I don't think it makes a difference either way really but I don't care much at this point either.

comment:20

Let's get this ticket done to support #28752

comment:21

Yeah, I need to revisit this. I don't remember why I was so protective of my original approach, except that I do recall there being a good reason. I just don't know what it was anymore. I know one reason had to do with ensuring that _find_stale_files could work properly, but I don't know for sure that there isn't a simpler way...

comment:22

Ticket retargeted after milestone closed