Add a separate "build_cython" command to setup.py
embray opened this issue · 67 comments
This is a follow-on to #21600. It adds a new cythonize command to the setup.py, and moves most of the logic for running cythonize() (previously in the customized build_ext) command, into this new command.
The new cythonize command is run always from build_ext anyways, but this has a couple advantages:
- It is possible to run just
cythonize, without compiling extension modules. This maybe isn't that common, but can be useful for development, especially when testing issues with Cython itself. It will also be useful for generating source distributions, where we might want to ship Cythonized sources in the source tarball, and hence would need to generate them as part of the process of building the tarball (without necessarily compiling).
This also allows passing options directly from the command-line to cythonize. This isn't much taken advantage of currently but the possibility is there.
- Code is clearer and more modular. Specific details for compiling cython code are kept separate from details for compiling C/C++ code.
See also upstream discussion of a "generic" implementation: cython/cython#1514
Upstream: Reported upstream. Developers acknowledge bug.
CC: @mkoeppe
Component: build
Keywords: cython distutils
Author: Erik Bray
Branch/Commit: a049f2d
Reviewer: Jeroen Demeyer
Issue created by migration from https://trac.sagemath.org/ticket/21682
I have a few other to-do issues in this code, but they're not essential. Not sure if they should be worried about for now, or in a follow-up issue if this code is accepted.
I suggest that you first discuss this with upstream cython.
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
487d66a | Add a new 'cythonize' command to setup.py to logically separate out the cythonization step. |
Rebased--I agree this could also be worth bringing upstream in some form, but do you have any other comments?
Replying to @embray:
do you have any other comments?
It would be better to separate the Sage-specific parts from the general code which could be upstreamed.
Bikeshed: since all build commands start with build, wouldn't this command better be called build_cython or something like that?
Why did you change
os.environ.get('SAGE_DEBUG', None) != 'no'
to
os.environ.get('SAGE_DEBUG', 'no') != 'no'
Regarding this:
# TODO: This works for the C sources themselves, but this logic
# should also apply to Cython sources--shouldn't the be rebuilt
# if this file changes? Alternatively, we could add more details
# (e.g. cythonize() flags) to the .cython_version file
I implemented that dependency on setup.py for the C files only because Cythonization does not really depend on setup.py.
You are right that the cythonize() flags affect Cythonization, but that is why we have .cython_version. But it would indeed be better if .cython_version would be more auto-generated with the relevant arguments given to cythonize().
I would suggest to simplify
if self.parallel is None:
self.parallel = int(os.environ.get('SAGE_NUM_THREADS', 0))
if isinstance(self.parallel, str):
try:
self.parallel = int(self.parallel)
except ValueError:
raise DistutilsOptionError("parallel should be an integer")
to
if self.parallel is None:
self.parallel = os.environ.get('SAGE_NUM_THREADS', 0)
try:
self.parallel = int(self.parallel)
except ValueError:
raise DistutilsOptionError("parallel should be an integer")
In finalize_options, you are setting build_dir unconditionally. Does that mean that the --build-dir option is ignored?
Replying to @jdemeyer:
Bikeshed: since all
buildcommands start withbuild, wouldn't this command better be calledbuild_cythonor something like that?
I think that was my first inclination as well. I went with cythonize just to emphasize that this was mostly a wrapper around that, but I think you've convinced me (just by confirming my initial idea) over toward build_cython
Replying to @jdemeyer:
Why did you change
os.environ.get('SAGE_DEBUG', None) != 'no'to
os.environ.get('SAGE_DEBUG', 'no') != 'no'
I don't think I meant to. I think originally it was os.environ.get('SAGE_DEBUG', 'no') and the fact that it's changed now might be a symptom of rebasing. I'll fix that. I definitely like the current spelling better.
Replying to @jdemeyer:
I would suggest to simplify
if self.parallel is None: self.parallel = int(os.environ.get('SAGE_NUM_THREADS', 0)) if isinstance(self.parallel, str): try: self.parallel = int(self.parallel) except ValueError: raise DistutilsOptionError("parallel should be an integer")to
if self.parallel is None: self.parallel = os.environ.get('SAGE_NUM_THREADS', 0) try: self.parallel = int(self.parallel) except ValueError: raise DistutilsOptionError("parallel should be an integer")
Agreed
Replying to @jdemeyer:
In
finalize_options, you are settingbuild_dirunconditionally. Does that mean that the--build-diroption is ignored?
Yes, IIRC that was in anticipation of fixing #21535, but I left the actual value hard-coded for now so as to not change existing behavior (since currently setting it to anything else will break other things that hard-code SAGE_CYTHONIZED). For the purposes of this ticket as a stand-alone enhancement that is confusing though. I could leave the internal build_dir attribute for now and just get rid of the user option (temporarily).
Replying to @embray:
I definitely like the current spelling better.
It's not just a matter of spelling. It is a functional difference.
For history about this os.environ.get('SAGE_DEBUG', None), see #13881.
I agree, I wasn't being flippant :)
Replying to @jdemeyer:
Regarding this:
# TODO: This works for the C sources themselves, but this logic # should also apply to Cython sources--shouldn't the be rebuilt # if this file changes? Alternatively, we could add more details # (e.g. cythonize() flags) to the .cython_version fileI implemented that dependency on
setup.pyfor the C files only because Cythonization does not really depend onsetup.py.You are right that the
cythonize()flags affect Cythonization, but that is why we have.cython_version. But it would indeed be better if.cython_versionwould be more auto-generated with the relevant arguments given tocythonize().
What if we made a "version 2" .cython_version (with backwards compat for the old format of course), with a new format in JSON that can accurately reflect the options passed to cythonize()?
Replying to @embray:
What if we made a "version 2"
.cython_version(with backwards compat for the old format of course), with a new format in JSON that can accurately reflect the options passed tocythonize()?
I think that's a good idea. But that should be an independent ticket. I don't see the need for the "backwards compat for the old format" though.
Upstream: Reported upstream. No feedback yet.
Opened an upstream issue at cython/cython#1514
A generic build_cython command could take most any cythonize() arguments as options. A sage-specific wrapper around that would just be setting (or possibly enforcing) certain defaults.
Description changed:
---
+++
@@ -7,3 +7,5 @@
This also allows passing options directly from the command-line to `cythonize`. This isn't much taken advantage of currently but the possibility is there.
2) Code is clearer and more modular. Specific details for compiling cython code are kept separate from details for compiling C/C++ code.
+
+See also upstream discussion of a "generic" implementation: https://github.com/cython/cython/issues/1514Replying to @jdemeyer:
Replying to @embray:
What if we made a "version 2"
.cython_version(with backwards compat for the old format of course), with a new format in JSON that can accurately reflect the options passed tocythonize()?I think that's a good idea. But that should be an independent ticket. I don't see the need for the "backwards compat for the old format" though.
Agree it's a separate issue. As for backwards-compat, it at least needs to fail gracefully if the existing file is not in the new format, and rebuild + regenerate the file but that's fine.
Replying to @jdemeyer:
Why did you change
os.environ.get('SAGE_DEBUG', None) != 'no'to
os.environ.get('SAGE_DEBUG', 'no') != 'no'
Actually, just to make sure I'm clear about this, is the point that debug should be enabled by default if SAGE_DEBUG is not explicitly set to "no"?
Replying to @embray:
Actually, just to make sure I'm clear about this, is the point that debug should be enabled by default if SAGE_DEBUG is not explicitly set to "no"?
Exactly.
I'm not sure how I feel about that--it goes against the default for building extension modules which is that the --debug option is not on by default. I'd be happy to change that in Sage's build_* commands, but it should be changed across the board, and a --no-debug option to turn it off is added. But it would be simpler to not make it the default.
Instead, if we want to enable debug by default (which I agree is the best choice for development) we can either add it in the setup.cfg (but it would have to be removed when making a release) or in your local pydistutils.cfg.
Or am I missing something here?
(I realize that the "gdb_debug" option for Cythonize and the "debug" option for building C sources are not the same thing, but they also kind of go hand-in-hand).
Nevermind--I read the docs for SAGE_DEBUG and that gave me a better understanding. I'll play around a bit more with how the command-line option works, but I agree that the behavior you requested is consistent with how the SAGE_DEBUG environment variable itself should be interpreted.
I think for the build_cython command I'll remove the --debug option entirely, and make gdb_debug=True by default pending the value of SAGE_DEBUG (that is, identical to the previous behavior). There's no huge advantage to having an option to explicitly disable it.
Branch pushed to git repo; I updated commit sha1. New commits:
4ccd15c | Rename 'cythonize' command to 'build_cython' per [#21682 comment:13](https://github.com/sagemath/sage/issues/21682#comment:13) |
15c7661 | Slightly simpler logic for handling the parallel option |
56df3cd | Temporarily disable this user option since its value is hard-coded |
56695aa | Remove --debug option since there's no reason not to produce debugging output unless SAGE_DEBUG=no |
Replying to @embray:
I'm not sure how I feel about that--it goes against the default for building extension modules which is that the
--debugoption is not on by default.
In practice, --debug is the default because Python compiles by default with -g (even twice!). The default gcc command line starts with
gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC
(Sage adds the -Wno-unused flag to this and removes -Wstrict-prototypes)
Adding the --debug option to distutils adds an additional -g but that doesn't change anything really.
Replying to @embray:
There's no huge advantage to having an option to explicitly disable it.
The only potential advantage would be disk space (and maybe build time).
I recently did some work on the setup.py for cysignals and I realized that it makes the most sense to run Cython as first build command, before build_py. This is because build_py handles package_data and we may want to handle Cython-generated files as package_data some day (see #20108).
For this reason, in cysignals I decided to run Cython in the base build command.
I see--that does make sense. It simplifies things too--we can just insert build_cython into the front of the build sub-commands list.
Branch pushed to git repo; I updated commit sha1. New commits:
a0d57dc | Simply insert build_cython in the front of the build sub_commands list. |
Work Issues: rebasing
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
4fc2dd6 | Add a new 'cythonize' command to setup.py to logically separate out the cythonization step. |
c647256 | Rename 'cythonize' command to 'build_cython' per [#21682 comment:13](https://github.com/sagemath/sage/issues/21682#comment:13) |
dbb28d4 | Slightly simpler logic for handling the parallel option |
14b70cf | Temporarily disable this user option since its value is hard-coded |
eb28c9d | Remove --debug option since there's no reason not to produce debugging output unless SAGE_DEBUG=no |
aaa6d38 | Simply insert build_cython in the front of the build sub_commands list. |
Rebased
New commits:
4fc2dd6 | Add a new 'cythonize' command to setup.py to logically separate out the cythonization step. |
c647256 | Rename 'cythonize' command to 'build_cython' per [#21682 comment:13](https://github.com/sagemath/sage/issues/21682#comment:13) |
dbb28d4 | Slightly simpler logic for handling the parallel option |
14b70cf | Temporarily disable this user option since its value is hard-coded |
eb28c9d | Remove --debug option since there's no reason not to produce debugging output unless SAGE_DEBUG=no |
aaa6d38 | Simply insert build_cython in the front of the build sub_commands list. |
Changed work issues from rebasing to none
Ah, since #22106 is at least in "positive review" status I should rebase this on it.
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
3bb5e73 | Generate auto-generated sources as the earliest step in the 'build' command. |
1286549 | Cosmetic changes |
5775788 | Add a new 'cythonize' command to setup.py to logically separate out the cythonization step. |
f9f6d94 | Rename 'cythonize' command to 'build_cython' per [#21682 comment:13](https://github.com/sagemath/sage/issues/21682#comment:13) |
2dd6fe7 | Slightly simpler logic for handling the parallel option |
ac28d2d | Temporarily disable this user option since its value is hard-coded |
97fadcb | Remove --debug option since there's no reason not to produce debugging output unless SAGE_DEBUG=no |
a7e548b | Simply insert build_cython in the front of the build sub_commands list. |
Rebased this on top of the current branch for #22106, which should be good to merge it looks like.
Other than the rebasing I don't think this needed any additional work.
Changed branch from u/embray/cythonize-command to u/jdemeyer/cythonize-command
Merged with 8.0.beta0
New commits:
cdb2f89 | Merge tag '8.0.beta0' into t/21682/cythonize-command |
Branch pushed to git repo; I updated commit sha1. New commits:
327108a | Add missing "mkpath" |
Changed upstream from Reported upstream. No feedback yet. to Reported upstream. Developers acknowledge bug.
Branch pushed to git repo; I updated commit sha1. New commits:
e3465ea | No need to use build_ext to get the list of extensions |
Reviewer: Jeroen Demeyer
Erik, I'm happy with this. If you like my changes, you can set this to positive_review.
I'm happy with the last change except in the sage_build_cython.initialize_options it should also set self.extensions = None. I know, it's a bit silly, but I like to stick to the convention on this. (The reason for this convention has to do with how distutils commands can set their options from options on other commands, i.e. self.set_undefined_options. No, it's not likely to be used this way here, but I do also like that documents what attributes should be set on the finalized command.)
Branch pushed to git repo; I updated commit sha1. New commits:
a049f2d | Set self.extensions in initialize_options() |
Thanks!
Changed branch from u/jdemeyer/cythonize-command to a049f2d