Fix Cython tracebacks on Python 3
Closed this issue · 56 comments
I've been getting to the bottom of why tracebacks aren't displaying Cython sources on Python 3. The problem turns out to be multi-faceted, and fixing it right would also mean fixing some things in general, since the way it works now is a hack.
Basically, the only reason it works at all on Python 2, is that the linecache module on Python 2 has some code which, given a relative filename (with an extension unrecognized by the import system) like sage/rings/integer.pyx, it will search for this file under all sys.path entries and, if found, read the source lines from that file.
This, in turn, only works for Sage because we actually install the .pyx sources in the sage package.
This does not work in Python 3. In linecache.updatecache, before it tries the sys.path search, it checks if the module object has a __loader__, and calls its get_source() method if it exists. On Python 2 this isn't a problem since modules don't necessarily have a __loader__, and in particular extension modules don't. But on the reworked import system in Python 3, pretty much every module has a __loader__--in the case of extension modules an ExtensionFileLoader. But the built-in ExtensionFileLoader of course knows nothing about Cython so its get_source() method just returns None. linecache.updatecache assumes this is correct (why would the loader lie?) and returns.
The simplest way to fix this is to remove the get_source() method from the ExtensionFileLoader class. This way, Python 3 works the same way as Python 2.
Upstream:
Upstream: Reported upstream. No feedback yet.
Component: cython
Author: Jeroen Demeyer, Erik Bray
Branch: 1a9225f
Reviewer: Frédéric Chapoton
Issue created by migration from https://trac.sagemath.org/ticket/24681
Description changed:
---
+++
@@ -8,6 +8,6 @@
Now, clearly there are a number of different ways we could hack around this at different levels, possibly involving monkey-patching either the `inspect` module or the `linecache` module (we already do the former). But it might be nice to fix this more systematically.
-First of all, it comes as a bit of a surprise to me that there's no option in Cython to include a copy of a Cython module's pyx source in the compile module (e.g. accessible via a `__cython_source__` attribute or something like that). I think that could be a nice feature to have for introspection purposes (e.g. it would also make getting the source from a inline-compiled Cython code easier). If we wanted to do this it would be easy to do even without patching Cython, but it might be a nice feature to upstream. Though I'd be surprised if this hasn't been suggested before, which makes me wonder if/why it was rejected. The only reason I can think of from the Sage perspective is bloat. Sage's Cython sources are nearly 10MB, but that's still nothing by modern standards, and the current situation requires having the sources anyways.
+First of all, it comes as a bit of a surprise to me that there's no option in Cython to include a copy of a Cython module's pyx source in the compile module (e.g. accessible via a `__cython_source__` attribute or something like that). I think that could be a nice feature to have for introspection purposes (e.g. it would also make getting the source from a inline-compiled Cython code easier). If we wanted to do this it might be possible to do even without patching Cython, but it might be a nice feature to upstream. Though I'd be surprised if this hasn't been suggested before, which makes me wonder if/why it was rejected. The only reason I can think of from the Sage perspective is bloat. Sage's Cython sources are nearly 10MB, but that's still nothing by modern standards, and the current situation requires having the sources anyways.
The second element to this would be a `sys.path_hooks` entry that handles extension modules but, instead of returning `ExtensionFileLoader` returns a simple subclass thereof that knows how to recognize and handle Cython modules.Here's a Python 3 only rough prototype of how this might work, in this case via simply installing the .pyx files alongside the compiled modules. Doing anything like inlining the source code in the compiled modules looks like it will be trickier than I thought without some modification to Cython.
This instead recognizes Cython modules by looking for .pyx files associated with extension modules. It then wraps such modules with a CythonModule class which has a __cython_source__ property which can read in the sources. Unfortunately this doesn't yet handle reading sources from .pxd files, but that could probably be added with a bit of elbow grease.
Unfortunately, about 80% of this patch is some very general machinery that I needed just to get Python's import system to handle a new module suffix (in fact, this would even be necessary to override the handling of already known suffixes). This used to be easier to do in Python 2--I've done it before relatively painlessly. But the new import system made that otherwise simple use case much more difficult--I've raised this issue with the python-ideas mailing list. In the meantime my workaround is fine, but an added complication I would have wished to have avoided...
New commits:
3388b66 | Rough sketch of supporting Cython file sources by way of installing the |
Branch pushed to git repo; I updated commit sha1. New commits:
8b63abe | Add get_source for Cython modules. This is the last bit needed to make tracebacks work properly again. |
I have pretty good contacts with Cython upstream (according to github, I am the number 6 Cython contributor). So if you want to propose something, there is a good chance that it will be accepted.
From your description, it seems that the simplest solution would be to add a custom __loader__ for Cython modules.
New commits:
8b63abe | Add get_source for Cython modules. This is the last bit needed to make tracebacks work properly again. |
Replying to @embray:
In
linecache.updatecache, before it tries thesys.pathsearch, it checks if the module object has a__loader__
Which module object? How does the mapping traceback -> module work?
Replying to @jdemeyer:
I have pretty good contacts with Cython upstream (according to github, I am the number 6 Cython contributor). So if you want to propose something, there is a good chance that it will be accepted.
From your description, it seems that the simplest solution would be to add a custom
__loader__for Cython modules.
I haven't tried it yet, but that might work if the __loader__ were just pre-baked into the extension module. It would definitely be preferable to the import system hacks here. I'll probably give that a try, but I wanted to see if we could make it work without patching Cython first.
Replying to @jdemeyer:
Replying to @embray:
In
linecache.updatecache, before it tries thesys.pathsearch, it checks if the module object has a__loader__Which module object? How does the mapping traceback -> module work?
Ultimately inspect.findsource, which is used in formatting the traceback, goes through linecache to get the source lines for a file. Well, if you look at how linecache.updatecache is implemented, it's funny--if it's passed a real filename then it will just read the source code directly out of the filename. The filename, in this case, is coming from the traceback frame's code object. However, as in our old friend #24097, since the Cython module's hard-coded path is a relative path, this actually fails unless you happen to be in the right directory. It then goes on to try going through the module's __loader__ if it has one...
Replying to @embray:
I wanted to see if we could make it work without patching Cython first.
Why? If we can do it with just a Cython patch, that would be ideal because then everybody benefits, not just Sage.
FYI: the traceback frame for Cython code is manually constructed by Cython itself (exceptions from C code typically have no "frame"). So if anything needs to be changed there, that can easily be done.
Replying to @jdemeyer:
Replying to @embray:
I wanted to see if we could make it work without patching Cython first.
Why? If we can do it with just a Cython patch, that would be ideal because then everybody benefits, not just Sage.
Oh, agree completely. I just didn't want to, as a first pass, depend on one without knowing when such a patch would be available in an official release. Or, at the time of writing, I wasn't even sure if it was a good idea (just because it surprised me that such a feature didn't already exit).
Replying to @embray:
Replying to @jdemeyer:
I have pretty good contacts with Cython upstream (according to github, I am the number 6 Cython contributor). So if you want to propose something, there is a good chance that it will be accepted.
From your description, it seems that the simplest solution would be to add a custom
__loader__for Cython modules.I haven't tried it yet, but that might work if the
__loader__were just pre-baked into the extension module. It would definitely be preferable to the import system hacks here. I'll probably give that a try, but I wanted to see if we could make it work without patching Cython first.
In retrospect, I don't know if this will work after all. It's the import system itself that constructs module objects and assigns to their __loader__ attribute (with the loader that was used to load the module code in the first place). So I do't know that it's possible for a module to come with __loader__ already in its namespace and not just have it overridden. So it might be necessary to have some import hooks after all if we wanted to provide a custom loader for Cython modules.
That said, it would still be possible to provide better introspection of Cython modules if the Cython sources were (optionally) baked into the module somewhere, and that's a separate issue and easy to add to Cython.
Replying to @jdemeyer:
FYI: the traceback frame for Cython code is manually constructed by Cython itself (exceptions from C code typically have no "frame"). So if anything needs to be changed there, that can easily be done.
No, I don't think anything needs to be done there (modulo the issue about filename paths, but that's somewhat orthogonal).
Replying to @embray:
In retrospect, I don't know if this will work after all. It's the import system itself that constructs
moduleobjects and assigns to their__loader__attribute (with the loader that was used to load the module code in the first place). So I do't know that it's possible for a module to come with__loader__already in its namespace and not just have it overridden.
I'll have a look. Isn't the __loader__ just an entry in the module dict? Surely that dict can be changed.
Simple fix which works for me on a sample project:
del globals()["__loader__"]
It still requires that the sources can be found in sys.path.
I'm not saying that this is the perfect solution, but it gives a starting point that playing with __loader__ works. Now I'll try to implement a custom __loader__.
To my surprise, the import system actually does not, by default override __loader__ if it's already in the module: https://github.com/python/cpython/blob/3.6/Lib/importlib/_bootstrap.py#L504
So I guess you could just put whatever you want in a module's __loader__ attribute. That is, unless that function is called with override=True, and it's not entirely clear to me in what context that would happen, or if this behavior is even documented. In other words, I'm not sure if there's well-defined behavior for this.
Got it to work!
def error():
raise RuntimeError("traceback testing")
from importlib.machinery import ExtensionFileLoader
class CythonExtensionLoader(ExtensionFileLoader):
def __init__(self, name):
self.name = name
@property
def path(self):
return __file__
def get_source(self, fullname):
print("get_source({}) called".format(fullname))
with open("/usr/local/src/sage-config/local/src/cyloader/cyloader/loader.pyx") as f:
return f.read()
__loader__ = CythonExtensionLoader(__name__)
By setting a Cython-aware __loader__ when the module is initialized, we can fix its get_source method to do what we want. Calling error() does display the source in the traceback.
I see, there is actually some better documentation on this here: https://docs.python.org/3/reference/import.html?highlight=_init_module_attrs#loading
All the attributes set by the import system are set before the module is actually executed. When the module is executed it can, of course, put whatever it wants in its own namespace, including overriding anything the import system put there.
Nice! I wouldn't have thought that would work, but I'm glad it does. A real implementation might also consider wrapping whatever __loader__ is already there.
Now it would be nice to add a flag to Cython to include the source code in the module, but in the meantime my prototype, which just installs the .pyx files alongside the extension modules, could also work.
Replying to @embray:
All the attributes set by the import system are set before the module is actually executed.
No, that's not the case. In my first attempt, I tried to simply change the __class__ of the existing __loader__. But there is no existing __loader__ when the extension module is being initialized.
PEP 489 is certainly relevant here, but Cython does not fully support that yet.
Hmm, my solution only works in IPython so far, not in plain Python.
The Python REPL uses the default excepthook to display exceptions. When it cannot find the file, it does something strange: it essentially tries os.path.join(path, basename) where path loops over the sys.path entries and basename is the last part(!) of the filename in the traceback. This goes back to
commit 7169dbb76dcd1d25f2989eb00da33d05c3d6279b
Author: Guido van Rossum <guido@python.org>
Date: Wed Feb 26 15:17:59 1992 +0000
Move printing of filename and lineno to tb_displayline.
Search sys.path if the filename isn't found.
Include osdefs.h.
But I guess we care more IPython than Python, so I'm just going to ignore the plain Python REPL for now.
And the overriding of __loader__ won't work with pxd files: there is no way for the get_source() method to know the filename that it should return the sources for.
OK, new and super-simple solution which just reverts the Python 3 behaviour to the Python 2 behaviour:
class DeletedAttribute(object):
def __get__(self, instance, owner):
raise AttributeError
from importlib.machinery import ExtensionFileLoader
ExtensionFileLoader.get_source = DeletedAttribute()
It effectively removes the get_source attribute from ExtensionFileLoader. This causes the linecache module to continuing searching for the source file as in Python 2.
Changed branch from u/embray/python3/cython-source-prototype to u/jdemeyer/python3/cython-source-prototype
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
1a9225f | Fix Cython tracebacks on Python 3 |
Description changed:
---
+++
@@ -6,8 +6,4 @@
That said, that still wouldn't fix things for Python 3. In `linecache.updatecache`, before it tries the `sys.path` search, it checks if the module object has a `__loader__`, and calls its `get_source()` method if it exists. On Python 2 this isn't a problem since modules don't necessarily have a `__loader__`, and in particular extension modules don't. But on the reworked import system in Python 3, pretty much every module has a `__loader__`--in the case of extension modules an `ExtensionFileLoader`. But the built-in `ExtensionFileLoader` of course knows nothing about Cython so its `get_source()` method just returns `None`. `linecache.updatecache` assumes this is correct (why would the loader lie?) and returns.
-Now, clearly there are a number of different ways we could hack around this at different levels, possibly involving monkey-patching either the `inspect` module or the `linecache` module (we already do the former). But it might be nice to fix this more systematically.
-
-First of all, it comes as a bit of a surprise to me that there's no option in Cython to include a copy of a Cython module's pyx source in the compile module (e.g. accessible via a `__cython_source__` attribute or something like that). I think that could be a nice feature to have for introspection purposes (e.g. it would also make getting the source from a inline-compiled Cython code easier). If we wanted to do this it might be possible to do even without patching Cython, but it might be a nice feature to upstream. Though I'd be surprised if this hasn't been suggested before, which makes me wonder if/why it was rejected. The only reason I can think of from the Sage perspective is bloat. Sage's Cython sources are nearly 10MB, but that's still nothing by modern standards, and the current situation requires having the sources anyways.
-
-The second element to this would be a `sys.path_hooks` entry that handles extension modules but, instead of returning `ExtensionFileLoader` returns a simple subclass thereof that knows how to recognize and handle Cython modules.
+The simplest way to fix this is to remove the `get_source()` method from the `ExtensionFileLoader` class. This way, Python 3 works the same way as Python 2.Author: Jeroen Demeyer, Erik Bray
The approach with the custom __loader__ can never work because it cannot differentiate between the .pyx file and other files. It assumes that one module has one source file, which is generally not the case for Cython.
So I propose this simple but effective solution of monkey-patching ExtensionFileLoader.
Description changed:
---
+++
@@ -7,3 +7,5 @@
That said, that still wouldn't fix things for Python 3. In `linecache.updatecache`, before it tries the `sys.path` search, it checks if the module object has a `__loader__`, and calls its `get_source()` method if it exists. On Python 2 this isn't a problem since modules don't necessarily have a `__loader__`, and in particular extension modules don't. But on the reworked import system in Python 3, pretty much every module has a `__loader__`--in the case of extension modules an `ExtensionFileLoader`. But the built-in `ExtensionFileLoader` of course knows nothing about Cython so its `get_source()` method just returns `None`. `linecache.updatecache` assumes this is correct (why would the loader lie?) and returns.
The simplest way to fix this is to remove the `get_source()` method from the `ExtensionFileLoader` class. This way, Python 3 works the same way as Python 2.
+
+**Upstream**: https://bugs.python.org/issue32797Upstream: Reported upstream. No feedback yet.
Regardless of the upstream Python or Cython bug, could you accept this as a working-but-improvable solution?
No, I'm -1 on this, I don't think it's a good solution even if it happens to work. If this were urgent I'd say "sure", but we have time and flexibility here to come up with a better solution.
It is an interesting question concerning the split of some modules between a .pyx and a .pxd. In most cases only the .pyx is relevant or interesting, but some functions are implemented in the .pxd file. This is more rare, however.
Replying to @embray:
No, I'm -1 on this, I don't think it's a good solution even if it happens to work.
Can you at least say why you think that it's not a good solution?
I guess that you would like to use __loader__.get_source() but that won't work, see [comment:28].
I mean it's obviously a hack, and an ugly one at that. This also doesn't solve the issue of multiple source files any better. I think we really need to think about how to handle that case well--I think it will require rethinking some things in Cython.
That said, having even just a hack for this is could still be very useful for debugging issues in Python 3, so I've changed my mind and don't really have a problem giving this a try for now as long as we open a new ticket to track this issue (or leave this ticket open and create a new ticket for the workaround).
Replying to @embray:
This also doesn't solve the issue of multiple source files any better.
In what way does it not solve that problem? The whole reason not to use __loader__.get_source() is precisely to support multiple source files from the same module.
I must be missing something, but how do we support that case currently? E.g. if some module has a .pxd and a .pyx, and some function in that module is implemented in one of those source files but not the other, how do we currently load the source for that function from the correct source file?
Replying to @embray:
how do we currently load the source for that function from the correct source file?
The traceback contains the source filename:
sage: try:
....: 1/0
....: except ZeroDivisionError as E:
....: tb = sys.exc_info()[2]
sage: tb.tb_next.tb_frame.f_code.co_filename
'sage/rings/integer.pyx'
Replying to @embray:
No, I'm -1 on this, I don't think it's a good solution even if it happens to work.
An ugly/hackish solution is still better than no solution at all.
I simply don't see a better solution, not even in theory.
Replying to @jdemeyer:
Replying to @embray:
how do we currently load the source for that function from the correct source file?
The traceback contains the source filename:
I see--that's those fake Cython hacked frames with fake code objects. I didn't realize that would include the correct filename even for functions defined in .pxd file but apparently it does.
An ugly/hackish solution is still better than no solution at all.
I agree, as long as we open a new ticket--either a separate ticket from this one for the hack solution, or a new ticket for tracking the issue (I'd prefer the former since closing this ticket means closing some valuable discussion).
I simply don't see a better solution, not even in theory.
I still have some ideas about this, but it would require modification to Cython for sure.
Additionally (and much longer term) a very large part of the problem is the Python Loader API's assumption that a module has a single source file. It seems the assumption is that "for extension modules sure there may be many source files, but it's not useful for Python-level introspection to be able to return them". Sure, for normal extension modules. But the Cython case obviously disproves that in general. I think this is worth taking up but that's not going to help anytime soon.
Replying to @embray:
I still have some ideas about this, but it would require modification to Cython for sure.
It is possible to explain what you are thinking of?
I could try to explain but it's still a bit hand-wavey at the moment. It also comes back around to the issue of subclassing the function type.
Description changed:
---
+++
@@ -2,9 +2,9 @@
Basically, the only reason it works at all on Python 2, is that the `linecache` module on Python 2 has some code which, given a relative filename (with an extension unrecognized by the import system) like `sage/rings/integer.pyx`, it will search for this file under all `sys.path` entries and, if found, read the source lines from that file.
-This, in turn, only works for Sage because a) We tend to keep the Sage source code around via `SAGE_SRC`--itself kind of a hack. If Sage were installed like a normal Python package you wouldn't have this. Instead we append `SAGE_SRC` to `sys.path` when importing Sage. This works, but it's not something we *really* want to be doing. A slightly better alternative might be to actually install the `.pyx` sources in the sage package.
+This, in turn, only works for Sage because we actually install the `.pyx` sources in the sage package.
-That said, that still wouldn't fix things for Python 3. In `linecache.updatecache`, before it tries the `sys.path` search, it checks if the module object has a `__loader__`, and calls its `get_source()` method if it exists. On Python 2 this isn't a problem since modules don't necessarily have a `__loader__`, and in particular extension modules don't. But on the reworked import system in Python 3, pretty much every module has a `__loader__`--in the case of extension modules an `ExtensionFileLoader`. But the built-in `ExtensionFileLoader` of course knows nothing about Cython so its `get_source()` method just returns `None`. `linecache.updatecache` assumes this is correct (why would the loader lie?) and returns.
+This does not work in Python 3. In `linecache.updatecache`, before it tries the `sys.path` search, it checks if the module object has a `__loader__`, and calls its `get_source()` method if it exists. On Python 2 this isn't a problem since modules don't necessarily have a `__loader__`, and in particular extension modules don't. But on the reworked import system in Python 3, pretty much every module has a `__loader__`--in the case of extension modules an `ExtensionFileLoader`. But the built-in `ExtensionFileLoader` of course knows nothing about Cython so its `get_source()` method just returns `None`. `linecache.updatecache` assumes this is correct (why would the loader lie?) and returns.
The simplest way to fix this is to remove the `get_source()` method from the `ExtensionFileLoader` class. This way, Python 3 works the same way as Python 2.
Ping? Having this in Sage would help with the Python 3 porting effort...
If you prefer instead to add the Python patch from python/cpython#6653 that would obviously be fine for me too.
Description changed:
---
+++
@@ -8,4 +8,7 @@
The simplest way to fix this is to remove the `get_source()` method from the `ExtensionFileLoader` class. This way, Python 3 works the same way as Python 2.
-**Upstream**: https://bugs.python.org/issue32797
+**Upstream**:
+
+- https://bugs.python.org/issue32797
+- https://github.com/python/cpython/pull/6653Reviewer: Frédéric Chapoton
Bot is green.. ok, let it be.
Changed branch from u/jdemeyer/python3/cython-source-prototype to 1a9225f