pybind/pybind11

[BUG] Problem when creating derived Python objects from C++ (inheritance slicing)

mhochsteger opened this issue · 17 comments

EDIT(eric): Adding a tracking list.

  • Enumerate possible solutions and drawbacks, e.g.
    • #2757 Using py::wrapper<> and GC ressurrection
    • Others...
  • Identify best path forward, implement.

Issue description

Hello,
Consider deriving a C++ class in Python and overloading virtual functions, including a 'Clone' function.
When passing a derived Python object to C++ without holding a reference (like calling Clone()), the information about the Python type is lost. In this case the wrong base-class function is called. More complex cases may result in a SEGFAULT.

Is there any way to work around/fix this without causing memory leaks?

Thanks!

Reproducible example code

test.cpp

// test.cpp
#include <pybind11/pybind11.h>
#include <Python.h>

#include <iostream>
#include <memory>

using std::cout;
using std::endl;
using std::unique_ptr;
using std::shared_ptr;
using std::make_shared;

namespace py = pybind11;

class Base {
public:
    virtual void Print() { 
        cout << "Base!" << endl;
    }

    virtual shared_ptr<Base> Clone() {
        return make_shared<Base>();
    }
};

class PyBase : public Base {
public:
    using Base::Base; // Inherit constructors
    void Print() override { PYBIND11_OVERLOAD(void, Base, Print, ); }
    shared_ptr<Base> Clone() override { PYBIND11_OVERLOAD(shared_ptr<Base>, Base, Clone, ); }
};

PYBIND11_MODULE(libtest, m) {
    py::class_<Base, py::wrapper<PyBase>, shared_ptr<Base>>(m, "Base")
      .def(py::init<>())
      ;

    m.def("Test", [] ( shared_ptr<Base> b ) {
          shared_ptr<Base> d = b->Clone();
          // Python object of d is already dead!
          d->Print();  // prints "Base!"
    });
}

test.py

from libtest import *

class Derived(Base):
    def __init__(self):
        super().__init__()

    def Print(self):
        print("Derived!")

    def Clone(self):
        return Derived()

Test(Derived())

This seems a surprisingly hard problem. The main issue is that the C++ Clone and the Python Clone do fundamentally different things: the former returns a new C++ instance, the latter returns a new Python instance. The way referencing works is that the Python instance extends the C++ instance, rather than the other way around, which means casting to the C++ type isn't sufficient to keep the Python instance alive: you can easily end up being the only holder of the C++ instance, as happens here. (I don't think this will segfault: I think overload lookups will simply always fail in such a case since the owning object no longer exists).

The PyBase class could hold a py::object reference to the constructed object (you'd have to expand the PYBIND11_OVERLOAD and change it to save the auto o--which here is py::object), but then you've got a circular reference that can't be automatically cleaned up.

As I said, I don't see an easy way around this.

I worked it around by creating a manager class to hold the reference to the derived Python objects. It's ad hoc and ugly, but since I know exactly when to release the derived Python objects, it works for me.

I think this issue duplicates with #1145 . boost.python doesn't have this issue. I think this is a missing piece in pybind11.

Agreed that it would be nice. (Retitling).

Just curious (for some of my WIP PRs related to #1145): Can I ask if you know of an example boost.python code snippet that shows this working?

Also, do you happen to know where in the boost.python code this case is specifically handled? (preventing type slicing between C++ / Python?)

We are currently use a semi-hacky fork of pybind11 that handles this case for the Underactuated Robotics course / textbook; however, if there is a more elegant solution, I'm all for it.

I am not aware of an open source or self-containing example of it, but pretty sure boost.python works great. I didn’t trace that deep to boost.python to learn its mechanism for that feature. I’d guess it somehow holds the reference to the Python incarnation, since C++ types essentially is different from Python.

Gotcha, thanks!

I've briefly skimmed through the Boost.Python source code for the component I think that's responsible, wrapper<T> + wrapper_base, and I do see self-referencing via PyObject*, but I have not yet seen any reference count increments to increase Python lifetime to match C++:
https://github.com/boostorg/python/blob/7352c9c0f770633e695aa8f48b647aa7a78e49c7/include/boost/python/detail/wrapper_base.hpp#L78
https://github.com/boostorg/python/blob/7352c9c0f770633e695aa8f48b647aa7a78e49c7/include/boost/python/object/pointer_holder.hpp#L200
Will see if I can dig a wee bit more to figure out if it does handle this, and maybe whip up a test case akin to what's in #1146.

Another possible solution was suggested here: #1049 (comment)

I'll close this for now, as the issue is +2 years old. Do reopen if necessary!

I really like this issue in lieu of the other 6 or so issues that we have for this bug. I'm going to reopen this, and close the others.o

@YannickJadoul mentioned a related class of issue (mismatch between Python and C++ instance lifetimes): #1941

Hey - I've just come across this issue in working on OpenSpiel (https://github.com/deepmind/open_spiel)
The workaround I'm going with for now is that when created, the C++ trampoline class owns the Python object that backs it. The trampoline can then be passed around in C++ land without the Python object disappearing.
Then if / when the object is returned to Python, the C++ object releases the reference held to the Python object, after the pybind side has taken ownership of the Python object.

None of this is very pretty in user code, but could presumably be much nicer if incorporated into pybind11.

Howdy Edward! Sorry this bit ya, but that's cool you got it to work! I think that may be along the lines of what Jason mentioned up at the beginning of the thread (#1333 (comment)); also, I had done a similar thing in our fork, but it is a bit kludgy (and required in-place modifications to detect ownership transfer between C++ and Python) - https://github.com/RobotLocomotion/pybind11/blob/58a368ea8e89638e01a7385cad9ce4345d70003d/include/pybind11/pytypes.h#L1517-L1614

Any chance you're able to share a consolidated repro of your workaround in user code?

I've got a potential fix for this in #2839, input/feedback welcome!

rwgk commented

If you're affected by this issue, please check out the smart_holder branch, it is solved there:

https://github.com/pybind/pybind11/blob/smart_holder/README_smart_holder.rst

We've just ran into this issue. Whilst we wait for #2839 (or similar) to fix this, I would be very grateful if one of the pybind gods could take a look at our workaround (iterated from the workaround proposed in #1546) to flag any gotchas (tests so far seem to work).

The basic idea is to derive a PySharedPtr from shared_ptr, and use this new PySharedPtr in the signature of .defed functions where we need to keep alive the Python object.

A custom type_caster for PySharedPtr then returns an aliasing shared_ptr that wraps the py::object lifetime but dereferences to the C++ type.

I believe this works by decoupling the originating shared_ptr of the C++ object and the new shared_ptr passed to C++ functions, using the Python refcount to link the two.

template <class T>
struct PySharedPtr : std::shared_ptr<T> {
  using std::shared_ptr<T>::shared_ptr;
};

namespace pybind11::detail {

template <typename T>
class type_caster<PySharedPtr<T>> : public type_caster_holder<T, std::shared_ptr<T>> {

  PYBIND11_TYPE_CASTER(PySharedPtr<T>, _("PySharedPtr"));

  using BaseCaster = type_caster_holder<T, std::shared_ptr<T>>;

  bool load(pybind11::handle src, bool convert) {
    if (!BaseCaster::load(src, convert)) {
      return false;
    }

    auto pyObj = reinterpret_borrow<object>(src);

    // Construct a shared_ptr to the py::object
    auto pyObjPtr = std::shared_ptr<object>{new object{pyObj}, [](auto pyObjectPtr) {
                                              gil_scoped_acquire gil;
                                              delete pyObjectPtr;
                                            }};
    value = PySharedPtr<T>(pyObjPtr, static_cast<T*>(BaseCaster::value));
    return true;
  }
};
}  // namespace pybind11::detail
rwgk commented

Gotchas... that's a very difficult question. When I worked on the smart_holder implementation, several subtle issues only became known through extensive testing. Anticipating just based on code and imagination is hard.

After only a quick look:

Minor:

class type_caster<PySharedPtr<T>> : public type_caster_holder<T, std::shared_ptr<T>> {

This is more direct (elides a conditional_t that is not actually conditional):

class type_caster<PySharedPtr<T>> : public copyable_holder_caster<T, std::shared_ptr<T>> {

IIUC this workaround doesn't solve #1138, i.e. this will only work with class_<T, std::shared_ptr<T>>, is that a correct understanding?

Why do you prefer a workaround over using the smart_holder branch? — This is for me to understand what's missing.

Thanks for the feedback @rwgk!

  1. Correct, this assumes the originating holder is a shared_ptr and the .defed function wants a shared_ptr from pybind, but also wants to ensure the associated Python instance remains alive and isn't garbage collected because the trampoline needs it.

  2. I would love to use the smart_holder branch. However many of the adopters of our library will be integrating within an application already using pybind, in a rather conservative industry, so using anything other than an official release is probably not acceptable. Even upgrading to a new official release would have to be carefully considered.

rwgk commented

Looking again for a couple minutes, what I see there looks intriguing, but the way I usually convince myself what I have is working as intended in all corner cases, is to write a bunch of unit tests. To be honest (to myself), usually I overlook a thing or two, and only after I have convinced myself (and reviewers) that the unit tests cover all use cases, and the testing passes with all sanitizers I can get my hands on, do I conclude that I'm probably OK, and count on time to prove me right or wrong. In the latter case, the response is to add more unit tests to cover what I overlooked.

You could mine the smart_holder unit tests (mostly test_class_sh_*) for ideas / starting points.

You may also want to look at this analysis: #2672 (comment)

The explicit and reinterpret casts between s<B> and s<D> (shared_ptr base, derived, as defined there) are, to the best of my knowledge, undefined behavior, although evidently in practice that does not matter. Just something you may want to consider in your (company level) decision to not use the smart_holder branch.

in a rather conservative industry, so using anything other than an official release is probably not acceptable

Google "lives at head" as I often hear my colleagues say, i.e. we seem to be at the opposite ends of a spectrum. Which leads to: sorry I don't have the free energy right now that make the smart_holder branch "official" in some way. I'd love to do that sooner rather than later, but realistically, sooner will only happen with external help.