Skip to content

Passing reference arguments to trampoline methods [smart_holder] #2916

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: archive/smart_holder
Choose a base branch
from

Conversation

rhaschke
Copy link
Contributor

This replaces #2911 and mimics #2915 in master.

This introduces a set of new unittests revealing that reference arguments passed to trampoline methods,
i.e. Python-overridden virtual methods of C++ classes, are not actually passed by reference, but by copy.
This is due to the fact that return_value_policy::copy is used by default to pass from C++ to Python.
Using return_value_policy::reference for calls from those trampoline methods essentially solves the issue.

@rhaschke rhaschke force-pushed the sh-virtual-method-reference-args branch 3 times, most recently from b33bd59 to 03cd9d6 Compare March 23, 2021 09:37
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Robert, this looks great to me. My main question: is the change to include/pybind11/pytypes.h essential? Currently the smart_holder branch still uses the original from master.

@rhaschke
Copy link
Contributor Author

Yes, it is. See #2915 where I request this change in master.

@rhaschke rhaschke force-pushed the sh-virtual-method-reference-args branch from 03cd9d6 to 5583036 Compare March 24, 2021 12:01
@rhaschke
Copy link
Contributor Author

@rwgk, I think I have addressed all your feedback

@rwgk
Copy link
Collaborator

rwgk commented Mar 24, 2021

Thanks Robert! I'll run this through our global testing system.

@rwgk
Copy link
Collaborator

rwgk commented Mar 25, 2021

I'm seeing ~10 test failures in our global testing. It is running a very large number of tests, i.e. this is a very rare failure, but I verified manually that it is reproducible and also generates a stack-use-after-return failure when running with ASAN. One of the crashes involves this open-source code:

https://github.com/google/mediapipe/blob/a92cff7a60031f5c3097b06e74416732d85b5011/mediapipe/python/pybind/packet_getter.cc#L352

  m->def(
      "_get_proto_type_name",
      [](const Packet& packet) {
        return packet.GetProtoMessageLite().GetTypeName();
      },
      py::return_value_policy::move);

Drilling down, the signatures are:

const proto_ns::MessageLite& GetProtoMessageLite() const;
virtual std::string GetTypeName() const = 0;

It seems to boil down to py::return_value_policy::move used in combination with a std::string by-value return. Do you have an idea how this PR could lead to this breaking?

I'm pretty sure (but have not verified) that this code does not use smart_holder.

@rhaschke
Copy link
Contributor Author

Not sure. Need to investigate. What's the failure? stack-use-after-return only? So I will need valgrind to validate?

@rwgk
Copy link
Collaborator

rwgk commented Mar 25, 2021

Not sure. Need to investigate. What's the failure? stack-use-after-return only? So I will need valgrind to validate?

FYI: After I saw your message I tried to put together a minimal reproducer based on the string-return-by-value-combined-with-policy_move suspicion, but that works just fine. I started drilling down into the original failure (segfault without ASAN, stack-use-after-return with ASAN) and it's looking complicated ... I'll try to drill down more.

@rwgk
Copy link
Collaborator

rwgk commented Mar 25, 2021

Not sure. Need to investigate. What's the failure? stack-use-after-return only? So I will need valgrind to validate?

FYI: After I saw your message I tried to put together a minimal reproducer based on the string-return-by-value-combined-with-policy_move suspicion, but that works just fine. I started drilling down into the original failure (segfault without ASAN, stack-use-after-return with ASAN) and it's looking complicated ... I'll try to drill down more.

Sharing the next observation:

  • The test works with the smart_holder branch as-is.
  • But simply applying the change to pytypes.h breaks it.
-    template <return_value_policy policy = return_value_policy::automatic_reference, typename... Args>
+    template <return_value_policy policy = return_value_policy::reference, typename... Args>
     object operator()(Args &&...args) const;                                   
-    template <return_value_policy policy = return_value_policy::automatic_reference, typename... Args>
+    template <return_value_policy policy = return_value_policy::reference, typename... Args>

I'm still (very) unclear how that change leads to the breakage.

@rwgk
Copy link
Collaborator

rwgk commented Mar 25, 2021

Below is the documentation for return_value_policy::reference and return_value_policy::automatic_reference. It seems to me that the failing test must be connected to a difference in behavior for non-pointer returns.

@rhaschke, do you have an idea for a minimal reproducer based on that theory? The failing test is very complex, I'm hoping you have an idea that saves me the effort trying to understand and reduce it.

+--------------------------------------------------+----------------------------------------------------------------------------+
| :enum:`return_value_policy::reference`           | Reference an existing object, but do not take ownership. The C++ side is   |
|                                                  | responsible for managing the object's lifetime and deallocating it when    |
|                                                  | it is no longer used. Warning: undefined behavior will ensue when the C++  |
|                                                  | side deletes an object that is still referenced and used by Python.        |
+--------------------------------------------------+----------------------------------------------------------------------------+
| :enum:`return_value_policy::automatic`           | **Default policy.** This policy falls back to the policy                   |
|                                                  | :enum:`return_value_policy::take_ownership` when the return value is a     |
|                                                  | pointer. Otherwise, it uses :enum:`return_value_policy::move` or           |
|                                                  | :enum:`return_value_policy::copy` for rvalue and lvalue references,        |
|                                                  | respectively. See above for a description of what all of these different   |
|                                                  | policies do.                                                               |
+--------------------------------------------------+----------------------------------------------------------------------------+
| :enum:`return_value_policy::automatic_reference` | As above, but use policy :enum:`return_value_policy::reference` when the   |
|                                                  | return value is a pointer. This is the default conversion policy for       |
|                                                  | function arguments when calling Python functions manually from C++ code    |
|                                                  | (i.e. via handle::operator()). You probably won't need to use this.        |
+--------------------------------------------------+----------------------------------------------------------------------------+

@rhaschke
Copy link
Contributor Author

I think the doc simply states that also pointers converted by reference, i.e. not taking ownership. The key difference in behavior is that everything is converted as reference (instead of copy for non-pointers).
Do you have a stack trace of the failing test? Particularly, do you know from which piece of code the invalidated stack is accessed?

@rwgk
Copy link
Collaborator

rwgk commented Mar 25, 2021

I think the doc simply states that also pointers converted by reference, i.e. not taking ownership. The key difference in behavior is that everything is converted as reference (instead of copy for non-pointers).
Do you have a stack trace of the failing test? Particularly, do you know from which piece of code the invalidated stack is accessed?

I looked at the stack trace and code for almost two hours but I'm still confused, it's a very complex setup.
When running with ASAN a certain pointer is a nullptr.
When running without ASAN it's not a nullptr, but dereferencing the pointer leads to a segfault, which means it must be garbage.
It could take a long time (hours) to reduce. I'll try to find time later, but if you have any idea for a unit test that is sensitive to the behavior change caused by the pytypes.h code change, that may help a lot.

@rhaschke
Copy link
Contributor Author

rhaschke commented Mar 25, 2021

I don't think strings are affected by the PR at all: The corresponding caster ignores the return_value_policy and always copies:

static handle cast(const StringType &src, return_value_policy /* policy */, handle /* parent */) {
const char *buffer = reinterpret_cast<const char *>(src.data());
auto nbytes = ssize_t(src.size() * sizeof(CharT));
handle s = decode_utfN(buffer, nbytes);
if (!s) throw error_already_set();
return s;
}

@rhaschke
Copy link
Contributor Author

Ah. I just noticed that GetTypeName() is a virtual (even abstract) method, which is probably then implemented in Python.
I will have a look into that direction now.

@rhaschke
Copy link
Contributor Author

I couldn't find an issue when tracing the call hierarchy of this example:

struct StringReturner {
    virtual std::string get_string() { return std::string("foo"); }
};

py::classh<StringReturner, PyStringReturner>(m, "StringReturner")
	.def(py::init<>())
	.def("get_string", &StringReturner::get_string);

m.def("check_string", [](StringReturner &caller) {
	return caller.get_string();
}, py::return_value_policy::move);
class PyStringReturner(m.StringReturner):
    def get_string(self):
        return "bar"

Returning a non-const reference from C++ to Python, shouldn't create a new copy.
However, by design, the user explicitly needs to opt-in for reference-passing
by providing return_value_policy::reference.
…ned back?

This reveals that passing a unique_ptr const-reference from python to C++ doesn't work.
unique_ptrs can only be moved from Python to C++ (requiring that Python actually owns the object).

The tests also reveal that passing a unique_ptr non-const reference from C++ to Python is possible.
However, this is bad programming style (it's not clear whether ownership is transferred or not).
Shouldn't we suppress this?

Removed redundant test_unique_ptr_cref_roundtrip()
Renamed uconsumer -> consumer
Add tests to ensure that reference arguments passed to trampoline methods
(from C++ -> Python -> C++) are actually passed by reference (and not copied),
which is the default return_value_policy to pass from C++ to Python.

Being able to pass by reference is essential to allow object modifications
performed in Python code to become visible in C++.
… D>>

- forbid to pass a non-const unique_ptr reference
- forbid return_value_policy::reference_internal for unique_ptr&&:
  It's always moved!
@rhaschke rhaschke force-pushed the sh-virtual-method-reference-args branch from 9d4f1d7 to f2d285a Compare March 26, 2021 18:34
…ence

Otherwise, modifications applied by Python-coded method overrides
would be applied to copies, even though the parameters were passed
by pointer or reference.
@rhaschke rhaschke force-pushed the sh-virtual-method-reference-args branch from f2d285a to 1691a84 Compare March 29, 2021 14:58
@EricCousineau-TRI EricCousineau-TRI added the smart holder See: https://github.com/pybind/pybind11/blob/smart_holder/README_smart_holder.rst label Apr 26, 2021
@rwgk
Copy link
Collaborator

rwgk commented Jul 3, 2021

This may be affected by the whitespace changes under PR #3073. Sorry for the trouble. I looked into rebasing but it's not easy even without the whitespace changes (I think). There are merge conflicts in these two files:

tests/test_class_sh_basic.cpp
tests/test_class_sh_trampoline_basic.cpp

@rwgk rwgk force-pushed the smart_holder branch 2 times, most recently from 55d9281 to 68a11bb Compare March 5, 2025 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
smart holder See: https://github.com/pybind/pybind11/blob/smart_holder/README_smart_holder.rst
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants