Add the buffer interface for wrapped STL vectors #488
FYI: note that you can already wrap vectors as numpy arrays without copying, kind of like so:
struct Data {
    std::vector<int> vec;
};
py::class_<Data>(m, "Data")
    .def("vec", [](Data& self) -> py::array { // no copying
        return py::array(self.vec.size(), self.vec.data(), py::cast(self));
    });
Untested, but something like this should work.
That's cool, I didn't know that. But going that way would require me to write a load of little wrapping lambdas for my functions, or change my C++ API to return those proxy objects. This way seems easier to me, and the buffer interface has more general use without necessarily using numpy.
Neat! @aldanor: the advantage of this version is that the original instance exposes all the necessary interfaces. (i.e. no need to call an extra function like The restriction to arithmetic-only types is of course the main disadvantage - @aldanor's NumPy interface can handle much fancier structured data types. I wonder if there is a way to expose a NumPy interface (analogous to the buffer interface) that works for more complex structured types, and without calling an extra conversion function?
I guess it would be possible to write some template machinery to let you explicitly declare members of a struct, or use a std::tuple, and match the types to the Python format. But I'd guess that you'd run into problems with C++ and Python disagreeing about alignment in those cases? I guess it'd be OK for a
@patstew That's exactly what "record array" part of Why the "is_arithmetic" restriction? I don't see why this wouldn't work with structured dtypes. Re: "not bool", this bit looks a tad flaky; I guess what you really want to check is that the container type is not a proxy and underlying memory is contiguous. Simply checking that
There we go, it now goes to and from structured arrays too. I've stopped pybind11 from declaring aligned types as unaligned in the format string and added a function to compare format strings, so that I can tell when a numpy structured dtype and a C++ one are the same, even when the format strings are not literally the same. numpy tends to declare padding as 'xxx' rather than '3x' and doesn't add trailing padding to the format string. At least for me on Windows, numpy and C++ don't agree on whether int32 is int or long, so I allow 'i' == 'l' when appropriate. I'm also ignoring the field names.
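For readers following along, here is a rough sketch of the kind of format-string comparison described above. It is not the code from this PR: `canonical_format` and `formats_equivalent` are hypothetical names, and the sketch assumes plain struct-style format strings with no field names, byte-order markers, or `T{...}` nesting. It expands repeat counts (so '3x' and 'xxx' match), treats 'l' as 'i' only where the two types have the same size, and ignores trailing padding.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not pybind11's implementation): bring a simple
// format string into a canonical form so equivalent layouts compare equal.
std::string canonical_format(const std::string &fmt) {
    std::string out;
    std::size_t count = 0;
    bool have_count = false;
    for (char c : fmt) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            // Accumulate a repeat count such as the "3" in "3x".
            count = count * 10 + static_cast<std::size_t>(c - '0');
            have_count = true;
            continue;
        }
        // Treat long as int only on platforms where they have the same size.
        if (c == 'l' && sizeof(long) == sizeof(int)) c = 'i';
        if (c == 'L' && sizeof(unsigned long) == sizeof(unsigned int)) c = 'I';
        out.append(have_count ? count : 1, c);
        count = 0;
        have_count = false;
    }
    // One side may omit trailing padding, so drop it from both.
    while (!out.empty() && out.back() == 'x') out.pop_back();
    return out;
}

// Two format strings are considered equivalent if their canonical forms match.
bool formats_equivalent(const std::string &a, const std::string &b) {
    return canonical_format(a) == canonical_format(b);
}
```

With this sketch, `formats_equivalent("b3xi", "bxxxi")` and `formats_equivalent("ib3x", "ib")` both hold, while `"i"` and `"d"` stay distinct.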
Ok, so it looks like this is now 2 (3?) separate issues, and as such it may even make sense to break it down into separate PRs.
Re: One alternative option would be to rely on numpy for comparing structured types (you have to register them through numpy anyway, so this makes sense). You'll still need special-casing even for arithmetic types, but now that would be a lot simpler. Another option is to use numpy dtypes for comparing all underlying value types; this imposes reliance on numpy but would be a bit cleaner. There's already a
Well, there's NumPy array interface which is very simple to implement and doesn't require dancing around Python buffer protocol and its format strings. Then the original instance would also "expose all the necessary interfaces" as you could just
Ok, I've cut out some of the extra stuff for this PR. It basically means that the fast init path won't work as often, but exposing vectors works fine. I've added some non-numpy tests too.
@patstew So, it looks like the cases are:
Note that you could still use NumPy dtype comparison in (1), but if comparing format strings for simple scalars is sufficient then it's fine.
Here's another thought, maybe there could be a separate type for format strings, e.g.
Using the numpy comparison function seems to fail when there's trailing padding required, because when you create a dtype from the buffer string numpy returns there is no trailing padding, so EquivTypes returns false. I suspect it's to do with this numpy/numpy#7798. I still think this PR is useful on its own. Are there any issues with it? Making it accept more struct types can be added as a new PR.
@patstew Yes, you can see my comments in numpy/numpy#7797 and numpy/numpy#7798. This is why we have these bits of code in pybind11:
(as in, if the padding doesn't get stripped properly by
include/pybind11/stl_bind.h
@@ -326,6 +327,36 @@ template <typename Vector, typename Class_> auto vector_if_insertion_operator(Cl
);
}

// Provide the buffer interface for vectors if we have data() and we have a format for it
// GCC seems to have "void std::vector<bool>::data()" - doing SFINAE on the existence of data() is insufficient, we need to check it doesn't return void
Not sure I understand this? Are we treating GCC specially here?
As far as I can tell just doing SFINAE with decltype(std::declval<Vector>().data()) does what I expect on MSVC, but it gets enabled for vector<bool> on GCC (which shouldn't have a data member). !is_same_v<decltype(...), void> works for both, which suggests to me that GCC sees void data(), though I may be misunderstanding.
If the goal is just to detect the bool special case, wouldn't it be more transparent to SFINAE on that?
I did that originally, aldanor said he'd prefer checking for data().
cc @aldanor
Yes, GCC has void std::vector<bool>::data(). If the vector's data() returns a pointer type, it's not a proxy; I guess this was my initial point. But if the consensus is to just check for bool so the SFINAE is simpler, so be it.
Instead of checking that it isn't void, why don't you check that it is a Vector::value_type *? (Ultimately that seems to align better with what actually matters in the code)
There's nothing, in theory, to prevent this from working with some std::vector-like custom class (just like bind_vector), so I think the check is worthwhile, but it should really just check for what we want instead of checking for not being what the current stdlibc++ implementation does.
include/pybind11/stl_bind.h
try {
    // numpy.h declares this for arbitrary types, but it may raise an exception if PYBIND11_NUMPY_DTYPE hasn't been called
    py::format_descriptor<T>::format();
So what happens if stl_bind.h is included but not numpy.h?
It works for the ordinary arithmetic types declared in common.h, or fails to compile because the default implementation of py::format_descriptor is empty. numpy.h provides a specialisation for all POD types, which may throw at runtime if the type isn't declared properly.
The py::format_descriptor<typename Vector::value_type>::format() SFINAE bit means it never hits the 'fail to compile' case.
Ok -- this sounds good to me then.
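As an illustration of the mechanism discussed in this thread, here is a small standalone sketch. `my_format` is a hypothetical stand-in for a format descriptor whose primary template is empty, and `has_format` is a hypothetical detection trait; neither is pybind11's real code. It shows how SFINAE on the format() call keeps an overload out of consideration instead of failing to compile.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for a format descriptor: the primary template is
// empty, so format() exists only for explicitly specialised types.
template <typename T> struct my_format {};
template <> struct my_format<int> { static std::string format() { return "i"; } };
template <> struct my_format<double> { static std::string format() { return "d"; } };

// Detection trait: true only if my_format<T>::format() is a valid expression.
template <typename T, typename = void>
struct has_format : std::false_type {};

template <typename T>
struct has_format<T, decltype(void(my_format<T>::format()))> : std::true_type {};

// Only participates in overload resolution when a format is available.
template <typename T>
typename std::enable_if<has_format<T>::value, std::string>::type describe() {
    return my_format<T>::format();
}

struct Opaque {};  // no specialisation, so no format() to call

static_assert(has_format<int>::value, "int has a format");
static_assert(!has_format<Opaque>::value, "Opaque has no format");
```

Note that this only covers the compile-time half of the discussion: a specialisation that exists but throws at runtime (as described for numpy.h above) would still pass this check and has to be handled with try/catch.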
Note: this PR is currently conflicted and needs to be rebased on top of
I've fixed the issue with padding in that branch (https://github.com/patstew/pybind11/tree/compare_buffer_numpy) now. I'll submit it as a new PR once this has gone in, unless you'd like me to add it to this PR. I suspect it warrants its own discussion though.
@patstew Yes, the Would you mind opening a separate PR just for that first? (+ a test in
Appveyor seems to have taken an hour and not actually started building at all...
No, it's more of a you're-on-your-own as far as lifetime management, essentially the same thing you get with
This was broken by the addition of
The CI hooks may not have triggered correctly. Try pushing a new commit and see if it recovers.
@patstew: the PR currently disables (comments out) most of the travis-ci builds, which obviously needs to be undone.
include/pybind11/stl_bind.h
if (info.ndim != 1)
    throw pybind11::type_error("Only 1D buffers can be copied to a vector");
if (info.strides[0] != sizeof(T))
    throw pybind11::type_error("Item size mismatch (Python: " + std::to_string(info.strides[0]) + " C++: " + std::to_string(sizeof(T)) + ")");
I don't think this strides[0] == sizeof(T) requirement is desirable: it seems only to be a requirement of the vector constructor you are using, not of the type itself. It should be easy enough to support arbitrary strides, which let slices (e.g. single rows/columns of numpy matrices) work as vector buffer inputs. Something like this (untested) as a replacement for the Vector constructor ought to work:
new (&vec) Vector();
vec.reserve(info.shape[0]);
// Step through the buffer in bytes: arithmetic on void * is not valid C++
for (char *ptr = static_cast<char *>(info.ptr), *end = ptr + info.strides[0] * info.shape[0]; ptr != end; ptr += info.strides[0])
    vec.push_back(*reinterpret_cast<T *>(ptr));
(A small test to see that this really does work would be good as well).
Added with test
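For reference, here is a self-contained sketch of the strided-copy idea from this thread that can be compiled without pybind11. `BufferView1D` and `copy_strided` are hypothetical names standing in for the buffer info and the vector construction; the sketch assumes T is trivially copyable and default-constructible, and it is not the code that was committed.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for a 1-D buffer description.
struct BufferView1D {
    const void *ptr;        // address of the first element
    std::ptrdiff_t count;   // number of elements
    std::ptrdiff_t stride;  // distance between consecutive elements, in bytes
};

// Copy `count` elements spaced `stride` bytes apart into a new vector.
// Iterating by index (rather than comparing against an end pointer) also
// works when the stride is negative.
template <typename T>
std::vector<T> copy_strided(const BufferView1D &view) {
    std::vector<T> out;
    out.reserve(static_cast<std::size_t>(view.count));
    const char *p = static_cast<const char *>(view.ptr);
    for (std::ptrdiff_t i = 0; i < view.count; ++i, p += view.stride) {
        T value;
        std::memcpy(&value, p, sizeof(T));  // avoids assuming alignment
        out.push_back(value);
    }
    return out;
}
```

For example, the second column of a row-major 3x2 int matrix can be described as a view starting at `&m[0][1]` with 3 elements and a stride of `2 * sizeof(int)` bytes, which mirrors the `a[:, 1]` slice used in the test for this PR.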
m[2] = 5
assert v[2] == 5

v = VectorInt(a[:, 1])
Nice 👍
I took a brief look at the latest iteration of this PR. I realize that it has been in the works for quite a while. Right now I'm still not 100% excited about it, for these reasons:
for (char *p = static_cast<char*>(info.ptr), *end = static_cast<char*>(info.ptr) + info.shape[0] * info.strides[0]; p < end; p += info.strides[0])
    vec.push_back(*reinterpret_cast<T*>(p));
In practice, I suppose that non-trivial types will require NumPy support in any case. Thoughts?
I've made it conditional on the existing py::buffer_protocol() option.
That monstrosity could probably be made much more readable if the
Allows use of vectors as python buffers, so for example they can be adopted without a copy by numpy.asarray
Allows faster conversion of buffers to vectors by copying instead of individually casting the elements
Allows equivalent integral types and numpy dtypes
I've got rid of the
This seems like a very useful feature and it's been cooking for a while. Are there any blockers left?
No, I think it looks great now! Thank you @patstew!
(I've added you to the acknowledgements in README.md)
Allows use of vectors as python buffers, so for example they can be adopted without a copy by numpy.asarray
Allows faster conversion of numeric buffers to vectors with memcpy instead of individually casting the elements