Skip to content

Fix invalid access when reinterpret_casting a non-pybind11 PyObject* to instance* (found by Valgrind in #2746) #2755

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

YannickJadoul
Copy link
Collaborator

Description

Carved out from #2746 to ease reviewing process.

Suggested changelog entry:

Fix invalid access when calling a pybind11 ``__init__`` on a non-pybind11 class instance.

@YannickJadoul
Copy link
Collaborator Author

@rwgk, this PR might benefit from a wider test, but since you already tested #2746, this should have already passed?

Ready for review. Given the subtlety of the actual problem, it would be nice to get a thorough review, here.

@@ -504,15 +504,15 @@ class cpp_function : public function {

auto self_value_and_holder = value_and_holder();
if (overloads->is_constructor) {
const auto tinfo = get_type_info((PyTypeObject *) overloads->scope.ptr());
const auto pi = reinterpret_cast<instance *>(parent.ptr());
Copy link
Collaborator Author

@YannickJadoul YannickJadoul Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from #2746:

  • @bstaletic: Without checking the type first, we were reinterpreting NotPybindDerived() as instance*.
  • @YannickJadoul: See also the code below that gets simplified; cfr. if (!self_value_and_holder.type || !self_value_and_holder.inst) -> pi->get_value_and_holder(tinfo, true);

Copy link
Collaborator

@bstaletic bstaletic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks good to me. Is this the same commit just cherry-picked from #2746? That would make rebasing #2746 easier.

@YannickJadoul
Copy link
Collaborator Author

Is this the same commit just cherry-picked from #2746?

I believe so. At any rate, I'll take responsibility for rebasing :-)

Copy link
Collaborator

@EricCousineau-TRI EricCousineau-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I'm assuming there may be a bit of performance penalty, but nothing that can't be resolved later.

Mayhaps there's a way to fold this fail-fast more directly into get_type_info, but this is def. a strict improvement!

@YannickJadoul
Copy link
Collaborator Author

I'm assuming there may be a bit of performance penalty, but nothing that can't be resolved later.

Yes. But as fa as I know, the same penalty as for a normal cast, if it would not have been self, but one of the other arguments?

@YannickJadoul
Copy link
Collaborator Author

Thanks all. Since it was already tested in #2746, I'll merge this! :-)

@YannickJadoul YannickJadoul merged commit e612043 into pybind:master Dec 31, 2020
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Dec 31, 2020
@YannickJadoul YannickJadoul deleted the invalid-reinterpret-cast-instance branch December 31, 2020 16:12
@YannickJadoul
Copy link
Collaborator Author

@EricCousineau-TRI, there's a bit of optimization in type_caster_generic:

// Case 1: If src is an exact type match for the target type then we can reinterpret_cast
// the instance's value pointer to the target type:
if (srctype == typeinfo->type) {
this_.load_value(reinterpret_cast<instance *>(src.ptr())->get_value_and_holder());
return true;
}
// Case 2: We have a derived class
else if (PyType_IsSubtype(srctype, typeinfo->type)) {
auto &bases = all_type_info(srctype);
bool no_cpp_mi = typeinfo->simple_type;
// Case 2a: the python type is a Python-inherited derived class that inherits from just
// one simple (no MI) pybind11 class, or is an exact match, so the C++ instance is of
// the right type and we can use reinterpret_cast.
// (This is essentially the same as case 2b, but because not using multiple inheritance
// is extremely common, we handle it specially to avoid the loop iterator and type
// pointer lookup overhead)
if (bases.size() == 1 && (no_cpp_mi || bases.front()->type == typeinfo->type)) {
this_.load_value(reinterpret_cast<instance *>(src.ptr())->get_value_and_holder());
return true;
}
// Case 2b: the python type inherits from multiple C++ bases. Check the bases to see if
// we can find an exact match (or, for a simple C++ type, an inherited match); if so, we
// can safely reinterpret_cast to the relevant pointer.
else if (bases.size() > 1) {
for (auto base : bases) {
if (no_cpp_mi ? PyType_IsSubtype(base->type, typeinfo->type) : base->type == typeinfo->type) {
this_.load_value(reinterpret_cast<instance *>(src.ptr())->get_value_and_holder(base));
return true;
}
}
}
// Case 2c: C++ multiple inheritance is involved and we couldn't find an exact type match
// in the registered bases, above, so try implicit casting (needed for proper C++ casting
// when MI is involved).
if (this_.try_implicit_casts(src, convert))
return true;
}

Worth checking out, indeed. Do we create an issue for this?

@EricCousineau-TRI
Copy link
Collaborator

Nah, just speculative at this point. I think it only merits an issue if there's measurable / traceable performance penalties. (I'll keep in mind to try benchmarking older versions, say 2.3.x, to get some semblance of history, per #2760)

Also, not sure if I understand how type_caster_generic relates here as an optimization... Can you expand on that, if you think it's worth the time?

(FWIW, I don't think this is too actionable yet, so we can let sleeping dogs lie 🐶 💤)

@YannickJadoul YannickJadoul added this to the v2.6.2 milestone Jan 1, 2021
@YannickJadoul
Copy link
Collaborator Author

Also, not sure if I understand how type_caster_generic relates here as an optimization... Can you expand on that, if you think it's worth the time?

Maybe it's not, though; I saw this if (srctype == typeinfo->type) vs. else if (PyType_IsSubtype(srctype, typeinfo->type)), but PyType_IsSubtype should do that itself? I think we might just be able to ignore this, actually. Let's leave it, for now, indeed :-)

@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Jan 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants