Skip to content

[BUG]: Recursive dispatch fails to find python function override. #3357

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
3 tasks done
dyershov opened this issue Oct 12, 2021 · 10 comments
Open
3 tasks done

[BUG]: Recursive dispatch fails to find python function override. #3357

dyershov opened this issue Oct 12, 2021 · 10 comments
Labels

Comments

@dyershov
Copy link
Contributor

dyershov commented Oct 12, 2021

Required prerequisites

Problem description

Consider a simple C++ visitor pattern:

class Data{};
using DataVisitor = std::function<void (const Data&)>;

class Adder {
public:
    // interface to add two data objects, the result is passes to data visitor
    virtual void operator()(const Data& first, const Data& second, const DataVisitor& visitor) const = 0;
};

void add(const Data& first, const Data& second,  const Adder& adder, const DataVisitor& visitor) {
    adder(first, second, visitor);
}

void add3(const Data& first, const Data& second, const Data& third,  const Adder& adder, const DataVisitor& visitor) {
    adder(first, second, [&] (const Data& first_plus_second) {
        adder(first_plus_second, third, visitor);
    });
}

In this pattern, Data and Adder are abstract classes which can be used in two algorithms add and add3 which, as the name suggest, add two or three objects of the type Data and send the result to a visitor. This pattern is intended to eliminate allocating memory for Data pointer in case we need to store temporary results, e.g., see add3 algorithm.

Next, we define the corresponding python bindings:

class PyAdder : public Adder {
public:
    void operator()(const Data& first, const Data& second, const DataVisitor& visitor) const override {
        PYBIND11_OVERRIDE_PURE_NAME(void, Adder, "__call__", operator(), first, second, visitor);
    }
};

class PyData : public Data {
public:
    PyData() = default;
};

PYBIND11_MODULE(module, m) {
    pybind11::class_<Adder, PyAdder>(m, "Adder")
        .def(pybind11::init<>())
        .def("__call__", &Adder::operator());

    m.def("add", &add);
    m.def("add3", &add3);

    pybind11::class_<Data, PyData>(m, "Data")
        .def(pybind11::init<>());
}

Finally, let's test it in Python. Firs,t some boilerplate:

class Adder(module.Adder):
    def __call__(self, first, second, visitor):
        visitor(Data(first.value + second.value))

class Data(module.Data):
    def __init__(self, value):
        super().__init__()
        self.value = value

def print_result(data):
        print(data.value)

Testing function add yields the expected result

module.add(Data(1), Data(2), Adder(), print_result) # prints 3

However, testing add3 causes failure

module.add3(Data(1), Data(2), Data(3), Adder(), print_result) # raises RuntimeError

The error reads

Traceback (most recent call last):
  File "*****/test_add.py", line 39, in test_add3_int
    module.add3(Data(1), Data(2), Data(3), Adder(), print_result)
  File "*****/test_add.py", line 8, in __call__
    visitor(Data(first.value + second.value))
RuntimeError: Tried to call pure virtual function "Adder::__call__"

Chaising the problem through pybind11 sources, I found a possible culprit. In function get_type_override, we check if dispatch code is invoked from overridden function.

    /* Don't call dispatch code if invoked from overridden function.
       Unfortunately this doesn't work on PyPy. */
#if !defined(PYPY_VERSION)
    PyFrameObject *frame = PyThreadState_Get()->frame;
    if (frame != nullptr && (std::string) str(frame->f_code->co_name) == name
        && frame->f_code->co_argcount > 0) {
        PyFrame_FastToLocals(frame);
        PyObject *self_caller = dict_getitem(
            frame->f_locals, PyTuple_GET_ITEM(frame->f_code->co_varnames, 0));
        if (self_caller == self.ptr())
            return function();
    }
#else

It returns an empty function if the second override is called from the same frame as the first override. Commenting this check renders a working example. I consider this as a bug, but please let consider proving me wrong.

My questions are:

  • In which case it is important to prevent the second dispatch from overridden function?
  • Is there a good way to fix things related to Visitor (recursive) or Double Dispatch patterns?

The link for reproducible and extended examples: https://bitbucket.org/yershov/recursive-dispatch/src/main/

Reproducible example code

https://bitbucket.org/yershov/recursive-dispatch/src/main/
@dyershov dyershov added the triage New bug, unverified label Oct 12, 2021
@Skylion007
Copy link
Collaborator

So this is check was to fix: #159 @dyershov , does this problem return when the check is removed? I am not sure if we have a test to catch the issue, but we should.

@Skylion007
Copy link
Collaborator

Here is the commit that introduced it: f5c154a

@Skylion007
Copy link
Collaborator

Update: We do have a unit test for it:

"""#159: virtual function dispatch has problems with similar-named functions"""

@dyershov
Copy link
Contributor Author

@Skylion007 Thanks for the pointers! It looks like this commit fixes the issue of similar names in two different classes. However, the code did return an empty function object when invoked from the method within the same object even prior to this commit. What would be the reason to return an empty function if the caller has the same caller name? Effectively, why the recursion in the dispatch is not allowed:
f5c154a#diff-a0c5016f2131f063e09d9d4b683308de7838c215886201854ed8225af9de28ceR19

@dyershov
Copy link
Contributor Author

After some digging, I discovered that adding just one extra frame to CPython call stack solves the problem. For example, the following indirect call to visitor makes the examples above work!

 class Adder(module.Adder):
    def __call__(self, first, second, visitor):
        (lambda : visitor(Data(first.value + second.value)))()

Now, the question is, is it possible to add a new stack frame to the current CPython thread, when binding to DataVisitor function?

Apologies, I'm not too familiar with CPython API to be able to figure it out myself and submit a patch.

@rwgk
Copy link
Collaborator

rwgk commented Oct 13, 2021

For example, the following indirect call to visitor makes the examples above work!

That's an interesting clue.

I'm not sure when we will have a volunteer to work on fixing this. Pragmatic suggestion:

  • Create a PR adding your reproducer to tests/test_virtual_functions (ideally with somewhat meaningful variable names), but including your workaround with the extra lambda, and a comment to explain: "If this workaround is removed it reproduces [BUG]: Recursive dispatch fails to find python function override.  #3357".
  • We can then merge that.
  • After the reproducer is merged, update this PR to point to the source code comment.

Then at least we have an easy starting point for someone to pick up debugging and fixing.

(BTW: is the "pthin" a typo? Would be good to fix, or explain, or maybe you mean "Python thin" and you could just spell it out?)

P.S.: Please tag me on the PR, I'll then look quickly.

@dyershov dyershov changed the title [BUG]: Recursive dispatch fails to find pthin function override. [BUG]: Recursive dispatch fails to find python function override. Oct 13, 2021
@dyershov
Copy link
Contributor Author

@rwgk Thank you for suggestion! I'll do that, and hopefully it'll be a good guide for fixing the issue.

@Skylion007 Skylion007 added bug and removed triage New bug, unverified labels Oct 14, 2021
@Skylion007
Copy link
Collaborator

@wjakob Do you know why this code snippit is necessary and how we can restrict this check to handle this edge case?

@Skylion007
Copy link
Collaborator

"IRC this was to address a serious bug (infinite recursion -> stack overflow) that could occur when those extra checks were removed. There wasn’t a straightforward way to fix this without massively changing the implementation." from @wjakob

@Skylion007
Copy link
Collaborator

There is a related dispatch issue preventing full 3.11 support in #3419, we may need to rework this code section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants