sagemath/sage

Add a separate "build_cython" command to setup.py

embray opened this issue · 67 comments

This is a follow-on to #21600. It adds a new cythonize command to the setup.py, and moves most of the logic for running cythonize() (previously in the customized build_ext) command, into this new command.

The new cythonize command is run always from build_ext anyways, but this has a couple advantages:

  1. It is possible to run just cythonize, without compiling extension modules. This maybe isn't that common, but can be useful for development, especially when testing issues with Cython itself. It will also be useful for generating source distributions, where we might want to ship Cythonized sources in the source tarball, and hence would need to generate them as part of the process of building the tarball (without necessarily compiling).

This also allows passing options directly from the command-line to cythonize. This isn't much taken advantage of currently but the possibility is there.

  1. Code is clearer and more modular. Specific details for compiling cython code are kept separate from details for compiling C/C++ code.

See also upstream discussion of a "generic" implementation: cython/cython#1514

Upstream: Reported upstream. Developers acknowledge bug.

CC: @mkoeppe

Component: build

Keywords: cython distutils

Author: Erik Bray

Branch/Commit: a049f2d

Reviewer: Jeroen Demeyer

Issue created by migration from https://trac.sagemath.org/ticket/21682

comment:1

I have a few other to-do issues in this code, but they're not essential. Not sure if they should be worried about for now, or in a follow-up issue if this code is accepted.

comment:2

I suggest that you first discuss this with upstream cython.

Changed commit from 80955e5 to 487d66a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

487d66aAdd a new 'cythonize' command to setup.py to logically separate out the cythonization step.
comment:4

Rebased--I agree this could also be worth bringing upstream in some form, but do you have any other comments?

comment:6

Replying to @embray:

do you have any other comments?

It would be better to separate the Sage-specific parts from the general code which could be upstreamed.

Changed dependencies from #21600 to none

comment:8

Bikeshed: since all build commands start with build, wouldn't this command better be called build_cython or something like that?

comment:9

Why did you change

os.environ.get('SAGE_DEBUG', None) != 'no'

to

os.environ.get('SAGE_DEBUG', 'no') != 'no'
comment:10

Regarding this:

    # TODO: This works for the C sources themselves, but this logic
    # should also apply to Cython sources--shouldn't the be rebuilt
    # if this file changes?  Alternatively, we could add more details
    # (e.g. cythonize() flags) to the .cython_version file

I implemented that dependency on setup.py for the C files only because Cythonization does not really depend on setup.py.

You are right that the cythonize() flags affect Cythonization, but that is why we have .cython_version. But it would indeed be better if .cython_version would be more auto-generated with the relevant arguments given to cythonize().

comment:11

I would suggest to simplify

        if self.parallel is None:
            self.parallel = int(os.environ.get('SAGE_NUM_THREADS', 0))

        if isinstance(self.parallel, str):
            try:
                self.parallel = int(self.parallel)
            except ValueError:
                raise DistutilsOptionError("parallel should be an integer")

to

        if self.parallel is None:
            self.parallel = os.environ.get('SAGE_NUM_THREADS', 0)

        try:
            self.parallel = int(self.parallel)
        except ValueError:
            raise DistutilsOptionError("parallel should be an integer")
comment:12

In finalize_options, you are setting build_dir unconditionally. Does that mean that the --build-dir option is ignored?

comment:13

Replying to @jdemeyer:

Bikeshed: since all build commands start with build, wouldn't this command better be called build_cython or something like that?

I think that was my first inclination as well. I went with cythonize just to emphasize that this was mostly a wrapper around that, but I think you've convinced me (just by confirming my initial idea) over toward build_cython

comment:14

Replying to @jdemeyer:

Why did you change

os.environ.get('SAGE_DEBUG', None) != 'no'

to

os.environ.get('SAGE_DEBUG', 'no') != 'no'

I don't think I meant to. I think originally it was os.environ.get('SAGE_DEBUG', 'no') and the fact that it's changed now might be a symptom of rebasing. I'll fix that. I definitely like the current spelling better.

comment:15

Replying to @jdemeyer:

I would suggest to simplify

        if self.parallel is None:
            self.parallel = int(os.environ.get('SAGE_NUM_THREADS', 0))

        if isinstance(self.parallel, str):
            try:
                self.parallel = int(self.parallel)
            except ValueError:
                raise DistutilsOptionError("parallel should be an integer")

to

        if self.parallel is None:
            self.parallel = os.environ.get('SAGE_NUM_THREADS', 0)

        try:
            self.parallel = int(self.parallel)
        except ValueError:
            raise DistutilsOptionError("parallel should be an integer")

Agreed

comment:16

Replying to @jdemeyer:

In finalize_options, you are setting build_dir unconditionally. Does that mean that the --build-dir option is ignored?

Yes, IIRC that was in anticipation of fixing #21535, but I left the actual value hard-coded for now so as to not change existing behavior (since currently setting it to anything else will break other things that hard-code SAGE_CYTHONIZED). For the purposes of this ticket as a stand-alone enhancement that is confusing though. I could leave the internal build_dir attribute for now and just get rid of the user option (temporarily).

comment:17

Replying to @embray:

I definitely like the current spelling better.

It's not just a matter of spelling. It is a functional difference.

comment:18

For history about this os.environ.get('SAGE_DEBUG', None), see #13881.

comment:19

I agree, I wasn't being flippant :)

comment:20

Replying to @jdemeyer:

Regarding this:

    # TODO: This works for the C sources themselves, but this logic
    # should also apply to Cython sources--shouldn't the be rebuilt
    # if this file changes?  Alternatively, we could add more details
    # (e.g. cythonize() flags) to the .cython_version file

I implemented that dependency on setup.py for the C files only because Cythonization does not really depend on setup.py.

You are right that the cythonize() flags affect Cythonization, but that is why we have .cython_version. But it would indeed be better if .cython_version would be more auto-generated with the relevant arguments given to cythonize().

What if we made a "version 2" .cython_version (with backwards compat for the old format of course), with a new format in JSON that can accurately reflect the options passed to cythonize()?

comment:21

Replying to @embray:

What if we made a "version 2" .cython_version (with backwards compat for the old format of course), with a new format in JSON that can accurately reflect the options passed to cythonize()?

I think that's a good idea. But that should be an independent ticket. I don't see the need for the "backwards compat for the old format" though.

Upstream: Reported upstream. No feedback yet.

comment:22

Opened an upstream issue at cython/cython#1514

A generic build_cython command could take most any cythonize() arguments as options. A sage-specific wrapper around that would just be setting (or possibly enforcing) certain defaults.

Description changed:

--- 
+++ 
@@ -7,3 +7,5 @@
 This also allows passing options directly from the command-line to `cythonize`.  This isn't much taken advantage of currently but the possibility is there.
 
 2) Code is clearer and more modular.  Specific details for compiling cython code are kept separate from details for compiling C/C++ code.
+
+See also upstream discussion of a "generic" implementation: https://github.com/cython/cython/issues/1514
comment:23

Replying to @jdemeyer:

Replying to @embray:

What if we made a "version 2" .cython_version (with backwards compat for the old format of course), with a new format in JSON that can accurately reflect the options passed to cythonize()?

I think that's a good idea. But that should be an independent ticket. I don't see the need for the "backwards compat for the old format" though.

Agree it's a separate issue. As for backwards-compat, it at least needs to fail gracefully if the existing file is not in the new format, and rebuild + regenerate the file but that's fine.

comment:25

Replying to @jdemeyer:

Why did you change

os.environ.get('SAGE_DEBUG', None) != 'no'

to

os.environ.get('SAGE_DEBUG', 'no') != 'no'

Actually, just to make sure I'm clear about this, is the point that debug should be enabled by default if SAGE_DEBUG is not explicitly set to "no"?

comment:26

Replying to @embray:

Actually, just to make sure I'm clear about this, is the point that debug should be enabled by default if SAGE_DEBUG is not explicitly set to "no"?

Exactly.

comment:27

I'm not sure how I feel about that--it goes against the default for building extension modules which is that the --debug option is not on by default. I'd be happy to change that in Sage's build_* commands, but it should be changed across the board, and a --no-debug option to turn it off is added. But it would be simpler to not make it the default.

Instead, if we want to enable debug by default (which I agree is the best choice for development) we can either add it in the setup.cfg (but it would have to be removed when making a release) or in your local pydistutils.cfg.

Or am I missing something here?

comment:28

(I realize that the "gdb_debug" option for Cythonize and the "debug" option for building C sources are not the same thing, but they also kind of go hand-in-hand).

comment:29

Nevermind--I read the docs for SAGE_DEBUG and that gave me a better understanding. I'll play around a bit more with how the command-line option works, but I agree that the behavior you requested is consistent with how the SAGE_DEBUG environment variable itself should be interpreted.

I think for the build_cython command I'll remove the --debug option entirely, and make gdb_debug=True by default pending the value of SAGE_DEBUG (that is, identical to the previous behavior). There's no huge advantage to having an option to explicitly disable it.

Branch pushed to git repo; I updated commit sha1. New commits:

4ccd15cRename 'cythonize' command to 'build_cython' per [#21682 comment:13](https://github.com/sagemath/sage/issues/21682#comment:13)
15c7661Slightly simpler logic for handling the parallel option
56df3cdTemporarily disable this user option since its value is hard-coded
56695aaRemove --debug option since there's no reason not to produce debugging output unless SAGE_DEBUG=no

Changed commit from 487d66a to 56695aa

comment:31

Replying to @embray:

I'm not sure how I feel about that--it goes against the default for building extension modules which is that the --debug option is not on by default.

In practice, --debug is the default because Python compiles by default with -g (even twice!). The default gcc command line starts with

gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

(Sage adds the -Wno-unused flag to this and removes -Wstrict-prototypes)

Adding the --debug option to distutils adds an additional -g but that doesn't change anything really.

comment:32

Replying to @embray:

There's no huge advantage to having an option to explicitly disable it.

The only potential advantage would be disk space (and maybe build time).

comment:34

I recently did some work on the setup.py for cysignals and I realized that it makes the most sense to run Cython as first build command, before build_py. This is because build_py handles package_data and we may want to handle Cython-generated files as package_data some day (see #20108).

For this reason, in cysignals I decided to run Cython in the base build command.

comment:35

I see--that does make sense. It simplifies things too--we can just insert build_cython into the front of the build sub-commands list.

Branch pushed to git repo; I updated commit sha1. New commits:

a0d57dcSimply insert build_cython in the front of the build sub_commands list.

Changed commit from 56695aa to a0d57dc

Work Issues: rebasing

Changed commit from a0d57dc to aaa6d38

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

4fc2dd6Add a new 'cythonize' command to setup.py to logically separate out the cythonization step.
c647256Rename 'cythonize' command to 'build_cython' per [#21682 comment:13](https://github.com/sagemath/sage/issues/21682#comment:13)
dbb28d4Slightly simpler logic for handling the parallel option
14b70cfTemporarily disable this user option since its value is hard-coded
eb28c9dRemove --debug option since there's no reason not to produce debugging output unless SAGE_DEBUG=no
aaa6d38Simply insert build_cython in the front of the build sub_commands list.
comment:39

Rebased


New commits:

4fc2dd6Add a new 'cythonize' command to setup.py to logically separate out the cythonization step.
c647256Rename 'cythonize' command to 'build_cython' per [#21682 comment:13](https://github.com/sagemath/sage/issues/21682#comment:13)
dbb28d4Slightly simpler logic for handling the parallel option
14b70cfTemporarily disable this user option since its value is hard-coded
eb28c9dRemove --debug option since there's no reason not to produce debugging output unless SAGE_DEBUG=no
aaa6d38Simply insert build_cython in the front of the build sub_commands list.

Changed work issues from rebasing to none

comment:40

Needs rebasing, and should work with #21682

comment:41

You mean #22106?

Dependencies: #22106

comment:42

Yes, I meant #22106. Thanks for the catch.

comment:43

Ah, since #22106 is at least in "positive review" status I should rebase this on it.

Changed commit from aaa6d38 to a7e548b

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

3bb5e73Generate auto-generated sources as the earliest step in the 'build' command.
1286549Cosmetic changes
5775788Add a new 'cythonize' command to setup.py to logically separate out the cythonization step.
f9f6d94Rename 'cythonize' command to 'build_cython' per [#21682 comment:13](https://github.com/sagemath/sage/issues/21682#comment:13)
2dd6fe7Slightly simpler logic for handling the parallel option
ac28d2dTemporarily disable this user option since its value is hard-coded
97fadcbRemove --debug option since there's no reason not to produce debugging output unless SAGE_DEBUG=no
a7e548bSimply insert build_cython in the front of the build sub_commands list.
comment:45

Rebased this on top of the current branch for #22106, which should be good to merge it looks like.

Other than the rebasing I don't think this needed any additional work.

comment:47

Merged with 8.0.beta0


New commits:

cdb2f89Merge tag '8.0.beta0' into t/21682/cythonize-command

Changed dependencies from #22106 to none

Changed commit from a7e548b to cdb2f89

Branch pushed to git repo; I updated commit sha1. New commits:

327108aAdd missing "mkpath"

Changed commit from cdb2f89 to 327108a

Changed upstream from Reported upstream. No feedback yet. to Reported upstream. Developers acknowledge bug.

Changed commit from 327108a to e3465ea

Branch pushed to git repo; I updated commit sha1. New commits:

e3465eaNo need to use build_ext to get the list of extensions

Reviewer: Jeroen Demeyer

comment:51

Erik, I'm happy with this. If you like my changes, you can set this to positive_review.

comment:52

I'm happy with the last change except in the sage_build_cython.initialize_options it should also set self.extensions = None. I know, it's a bit silly, but I like to stick to the convention on this. (The reason for this convention has to do with how distutils commands can set their options from options on other commands, i.e. self.set_undefined_options. No, it's not likely to be used this way here, but I do also like that documents what attributes should be set on the finalized command.)

Changed commit from e3465ea to a049f2d

Branch pushed to git repo; I updated commit sha1. New commits:

a049f2dSet self.extensions in initialize_options()
comment:55

Thanks!

Changed branch from u/jdemeyer/cythonize-command to a049f2d