Add the buffer interface for wrapped STL vectors #488

patstew · 2016-11-08T14:49:34Z

Allows use of vectors as python buffers, so for example they can be adopted without a copy by numpy.asarray
Allows faster conversion of numeric buffers to vectors with memcpy instead of individually casting the elements

aldanor · 2016-11-08T15:20:31Z

FYI: note that you can already wrap vectors as numpy arrays without copying, kind of like so:

struct Data {
    std::vector<int> vec;
};

py::class_<Data>(m, "Data")
    .def("vec", [](Data& self) -> py::array {  // no copying
        return py::array(self.vec.size(), self.vec.data(), py::cast(self));
    });

Untested, but something like this should work.

patstew · 2016-11-08T17:00:43Z

That's cool, I didn't know that. But going that way would require me to write a load of little wrapping lambdas for my functions, or change my C++ API to return those proxy objects. This way seems easier to me, and the buffer interface has more general use without necessarily using numpy.

wjakob · 2016-11-13T00:17:52Z

Neat!

@aldanor: the advantage of this version is that the original instance exposes all the necessary interfaces. (i.e. no need to call an extra function like .vec()).

The restriction to arithmetic-only types is of course the main disadvantage - @aldanor's NumPy interface can handle much fancier structured data types.

I wonder if there is a way to expose a NumPy interface (analogous to the buffer interface) that works for more complex structured types, and without calling an extra conversion function?

patstew · 2016-11-13T01:28:10Z

I guess it would be possible to write some template machinery to let you explicitly declare members of a struct, or use a std::tuple, and match the types to the python format. But I'd guess that you'd run into problems with C++ and python disagreeing about alignment in those cases? I guess it'd be OK for a Point { float x, float y } sort of thing, but once you start mixing types is the memory layout guaranteed?

aldanor · 2016-11-15T12:08:18Z

@patstew That's exactly what the "record array" part of <pybind11/numpy.h> does :) It figures out the exact offsets and alignment, supports nested structs, etc., but you have to call a macro and pass it the field names. A fairly good demo of supported types is in the tests.

Why the "is_arithmetic" restriction? I don't see why this wouldn't work with structured dtypes. PYBIND11_NUMPY_DTYPE registers both npy_format_descriptor<> and format_descriptor<>, so it should work in this case just as well?

Re: "not bool", this bit looks a tad flaky; I guess what you really want to check is that the container type is not a proxy and underlying memory is contiguous. Simply checking that .data() returns a pointer is sufficient for STL containers, I think (vector<bool> doesn't have it in the specialization).

patstew · 2016-11-15T21:22:35Z

There we go, it now goes to and from structured arrays too. I've stopped pybind11 from declaring aligned types as unaligned in the format string and added a function to compare format strings, so that I can tell when a numpy structured dtype and a c++ one are the same, even when the format strings are not literally the same. numpy tends to declare padding as 'xxx' rather than '3x' and doesn't add trailing padding to the format string. At least for me on windows, numpy and c++ don't agree on whether int32 is int or long so I allow 'i' == 'l' when appropriate. I'm also ignoring the field names.
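To illustrate the 'xxx' versus '3x' point, here is a minimal sketch of one way to make the two spellings compare equal. The helper name expand_padding is hypothetical and is not the function added in this PR; it only rewrites counted padding and assumes the format string has no field-name sections.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of pybind11): rewrite counted padding such as
// "3x" into "xxx" so that both spellings compare equal as plain strings.
// Counts attached to any other format code (e.g. "4i") are copied unchanged.
std::string expand_padding(const std::string &format) {
    std::string out;
    std::size_t i = 0;
    while (i < format.size()) {
        const std::size_t start = i;
        std::size_t count = 0;
        // Consume an optional decimal repeat count.
        while (i < format.size() && std::isdigit(static_cast<unsigned char>(format[i]))) {
            count = count * 10 + static_cast<std::size_t>(format[i] - '0');
            ++i;
        }
        if (i > start && i < format.size() && format[i] == 'x') {
            out.append(count, 'x');                // "3x" -> "xxx"
            ++i;
        } else if (i > start) {
            out.append(format, start, i - start);  // keep the digits of e.g. "4i"
        } else {
            out.push_back(format[i]);              // ordinary format character
            ++i;
        }
    }
    return out;
}
```

With this, expand_padding("b3xi") and expand_padding("bxxxi") give the same string. Trailing padding and the 'i'/'l' equivalence would still need separate handling.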

aldanor · 2016-11-16T10:14:01Z

Ok, so it looks like this is now 2 (3?) separate issues, and as such it may even make sense to break it down into separate PRs.

1. Recording alignment of fields for structured dtypes. Makes sense to have; however, it would also be nice to add a test that actually fails without this (i.e. on the current master branch) on at least one platform. (We also ignore byte ordering at the moment, btw.)
2. The actual vector/buffer interface.

Re: compare_format_descriptor(), congrats, you've started partially reimplementing functionality from numpy.core._internal :) While your code is likely to be correct (not clear; needs extensive tests if this is to be merged in), tbh I wouldn't be a huge fan of maintaining it. Buffer format strings are horrible, and there's always a myriad of weird corner cases.

One alternative option would be to rely on numpy for comparing structured types (you have to register them through numpy anyway, so this makes sense). You'll still need special-casing even for arithmetic types, but now that would be a lot simpler. Another option is to use numpy dtypes for comparing all underlying value types; this imposes reliance on numpy but would be a bit cleaner.

There's already a py::dtype(const buffer_info&) ctor, and there's py::dtype::of<T>() for all registered types. Dtypes can then be compared for equivalence (not equality), which should handle the corner cases like sizeof(int) == sizeof(long) on Windows platforms (will still need tests).

aldanor · 2016-11-16T10:16:06Z

@wjakob @patstew

> @aldanor: the advantage of this version is that the original instance exposes all the necessary interfaces. (i.e. no need to call an extra function like .vec()).

Well, there's NumPy array interface which is very simple to implement and doesn't require dancing around Python buffer protocol and its format strings. Then the original instance would also "expose all the necessary interfaces" as you could just np.asarray(vec), or even just pass it in places where numpy array is expected.

patstew · 2016-11-16T13:25:13Z

Ok, I've cut out some of the extra stuff for this PR. It basically means that the fast init path won't work as often, but exposing vectors works fine. I've added some non-numpy tests too.
My original reason for adding this was so that passing around big vector<char>s (from file/network/etc) wasn't doing quite so much unnecessary work. The numpy aspect was a nice to have, and handy for testing across py2/py3. I'd prefer not to tie the whole thing to numpy.
Perhaps we could have a simple compare_format_descriptor() (perhaps just string ==) and overload it if py::dtype::of<T> exists?
I've split the unaligned buffer bit into a separate PR, and can do a new PR for the compare_format_descriptor() later with some guidance on what form is preferable.

aldanor · 2016-11-21T11:35:03Z

@patstew So, it looks like the cases are:

1. T is a non-structured type. In this case you can compare format strings, and even handle some platform-dependent weirdness manually (like i and l on Windows). Maybe it'd make sense to have a normalize_scalar_format() function that would handle all that instead of a comparison function? With a bit of effort, you could probably even make checks like sizeof(int) == sizeof(long) compile-time.
2. T is a structured type (basically, format string is not a single character, or it contains '{'?). Currently, you can only register those via PYBIND11_NUMPY_DTYPE, so we know that NumPy is present, and we can compare dtypes for equivalence, namely py::dtype::of<T>() and dtype constructed from buffer info object.

Note that you could still use NumPy dtype comparison in (1), but if comparing format strings for simple scalars is sufficient then it's fine.
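A rough sketch of what the scalar case might look like, with the sizeof comparison resolved at compile time. The name normalize_scalar_format comes from the suggestion above, but this body and the same_scalar_format helper are assumptions for illustration, not code from the PR. It only covers the long/unsigned long codes and assumes long is the size of either int or long long.

```cpp
#include <cassert>
#include <string>

static_assert(sizeof(long) == sizeof(int) || sizeof(long) == sizeof(long long),
              "sketch assumes long matches int or long long in size");

// Map 'l'/'L' to the int-sized or long-long-sized code of the same
// signedness, decided at compile time for the current platform.
// All other codes pass through unchanged.
constexpr char normalize_scalar_format(char code) {
    return (code == 'l') ? (sizeof(long) == sizeof(int) ? 'i' : 'q')
         : (code == 'L') ? (sizeof(unsigned long) == sizeof(unsigned int) ? 'I' : 'Q')
         : code;
}

// Hypothetical comparison for single-character (non-structured) formats only.
bool same_scalar_format(const std::string &a, const std::string &b) {
    return a.size() == 1 && b.size() == 1
        && normalize_scalar_format(a[0]) == normalize_scalar_format(b[0]);
}
```

On a platform where long and int are both 32 bits, "l" and "i" then compare equal; elsewhere "l" is treated like "q".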

aldanor · 2016-11-21T12:02:19Z

Here's another thought, maybe there could be a separate type for format strings, e.g. buffer_format, implicitly convertible from/to std::string so as not to break existing code, and implementing comparison op following the logic above. This way it could be more reusable, since it's not stl-vector-specific inherently.

patstew · 2016-11-21T17:26:24Z

Using the numpy comparison function seems to fail when there's trailing padding required, because when you create a dtype from the buffer string numpy returns there is no trailing padding, so EquivTypes returns false. I suspect it's to do with this numpy/numpy#7798
Code with failing test here: https://github.com/patstew/pybind11/tree/compare_buffer_numpy

I still think this PR is useful on its own. Are there any issues with it? Making it accept more struct types can be added as a new PR.

aldanor · 2016-11-21T18:12:28Z

@patstew Yes, you can notice my comments in numpy/numpy#7797 and numpy/numpy#7798.

This is why we have these bits of code in pybind11:

aldanor · 2016-11-21T18:14:02Z

(as in, if the padding doesn't get stripped properly by py::dtype, it's a bug we have to fix regardless; I'll try to take a look at that branch)

wjakob · 2016-11-22T11:24:26Z

include/pybind11/stl_bind.h

@@ -326,6 +327,36 @@ template <typename Vector, typename Class_> auto vector_if_insertion_operator(Cl
    );
 }

+// Provide the buffer interface for vectors if we have data() and we have a format for it
+// GCC seems to have "void std::vector<bool>::data()" - doing SFINAE on the existence of data() is insufficient, we need to check it doesn't return void


Not sure I understand this? Are we treating GCC specially here?

As far as I can tell just doing SFINAE with decltype(std::declval<Vector>().data()) does what I expect on MSVC, but it gets enabled for vector<bool> on GCC (which shouldn't have a data member). !is_same_v<decltype(...), void> works for both, which suggests to me that GCC sees void data(), though I may be misunderstanding.

If the goal is just to detect the bool special case, wouldn't it be more transparent to SFINAE on that?

I did that originally, aldanor said he'd prefer checking for data().

cc @aldanor

Yes, GCC has void std::vector<bool>::data(). If vector's data() returns a pointer type, it's not a proxy, I guess this was my initial point. But if the consensus is to just check for bool so the sfinae is simpler so be it....

Instead of checking that it isn't void, why don't you check that it is a Vector::value_type *? (Ultimately that seems to align better with what actually matters in the code)

There's nothing, in theory, to prevent this from working with some std::vector-like custom class (just like bind_vector), so I think the check is worthwhile, but it should really just check for what we want instead of checking for not being what the current stdlibc++ implementation does.
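For reference, a minimal standalone sketch of the "data() returns Vector::value_type *" check discussed in this thread. The trait name has_contiguous_data is made up for the example and is not what the PR uses; the sketch sticks to C++11.

```cpp
#include <cassert>
#include <list>
#include <type_traits>
#include <utility>
#include <vector>

// C++11-compatible stand-in for std::void_t.
template <typename...> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// Primary template: containers without a data() member are rejected.
template <typename Vector, typename = void>
struct has_contiguous_data : std::false_type {};

// Chosen only when data() exists; true only if it returns value_type *.
// A data() that returns void (as reported for GCC's vector<bool>) is false.
template <typename Vector>
struct has_contiguous_data<Vector, void_t<decltype(std::declval<Vector &>().data())>>
    : std::is_same<decltype(std::declval<Vector &>().data()),
                   typename Vector::value_type *> {};

static_assert(has_contiguous_data<std::vector<int>>::value, "vector<int> exposes int *");
static_assert(!has_contiguous_data<std::vector<bool>>::value, "vector<bool> is a proxy container");
static_assert(!has_contiguous_data<std::list<int>>::value, "list has no data()");
```

Checking for the exact pointer type gives the same answer whether the standard library omits vector<bool>::data() or declares it as returning void.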

wjakob · 2016-11-22T11:25:12Z

include/pybind11/stl_bind.h

+
+    try {
+        //numpy.h declares this for arbitrary types, but it may raise an exception if PYBIND11_NUMPY_DTYPE hasn't been called
+        py::format_descriptor<T>::format();


So what happens if stl_bind.h is included but not numpy.h?

It works for the ordinary arithmetic types declared in common.h, or fails to compile because the default implementation of py::format_descriptor is empty. numpy.h provides a specialisation for all POD types, which may throw at runtime if the type isn't declared properly.
The py::format_descriptor<typename Vector::value_type>::format() SFINAE bit means it never hits the 'fail to compile' case.

Ok -- this sounds good to me then.
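The "never hits the fail-to-compile case" behaviour can be shown with a toy model. Everything here (toy_format_descriptor, has_format, buffer_format_or_none) is a made-up stand-in and not pybind11's actual templates; it only shows how SFINAE on format() keeps an empty default descriptor from being used.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template <typename...> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// Toy descriptor: empty by default, so format() only exists for known types.
template <typename T> struct toy_format_descriptor {};
template <> struct toy_format_descriptor<int> { static std::string format() { return "i"; } };
template <> struct toy_format_descriptor<double> { static std::string format() { return "d"; } };

// Detects whether toy_format_descriptor<T>::format() is a valid expression.
template <typename T, typename = void> struct has_format : std::false_type {};
template <typename T>
struct has_format<T, void_t<decltype(toy_format_descriptor<T>::format())>> : std::true_type {};

struct Opaque {};  // a type with no descriptor specialisation

// Enabled only when a format exists, so the empty default is never touched.
template <typename T>
typename std::enable_if<has_format<T>::value, std::string>::type buffer_format_or_none() {
    return toy_format_descriptor<T>::format();
}

// Fallback for types without a format: compiles, but reports no buffer support.
template <typename T>
typename std::enable_if<!has_format<T>::value, std::string>::type buffer_format_or_none() {
    return "";
}
```

buffer_format_or_none<Opaque>() compiles and returns an empty string instead of failing on the missing format() member.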

wjakob · 2016-11-22T12:44:49Z

Note: this PR is currently conflicted and needs to be rebased on top of master.

patstew · 2016-11-22T13:39:09Z

I've fixed the issue with padding in that branch (https://github.com/patstew/pybind11/tree/compare_buffer_numpy) now. I'll submit it as a new PR once this has gone in, unless you'd like me to add it to this PR. I suspect it warrants its own discussion though.

aldanor · 2016-11-22T14:29:24Z

@patstew Yes, the strip_padding fix/hack looks correct to me; I actually fixed it almost exactly the same way locally while looking at why your branch fails.

Would you mind opening a separate PR just for that first? (+ a test in test_numpy_dtypes that would be broken on the current master but would get fixed with the strip_padding fix)

patstew · 2016-11-22T14:59:04Z

Appveyor seems to have taken an hour and not actually started building at all...

jagerman · 2017-01-17T22:12:37Z

No, it's more of a you're-on-your-own as far as lifetime management, essentially the same thing you get with rvp::reference.

patstew · 2017-02-07T15:28:52Z

This was broken by the addition of py::buffer_protocol(), I've fixed that and a merge conflict. I'm not sure why AppVeyor is refusing to build it?

dean0x7d · 2017-02-08T01:36:32Z

The CI hooks may not have triggered correctly. Try pushing a new commit and see if it recovers.

patstew · 2017-02-15T11:24:35Z

@wjakob @aldanor @jagerman Are there any outstanding issues with this?

jagerman · 2017-02-15T17:31:16Z

@patstew: the PR currently disables (comments out) most of the travis-ci builds, which obviously needs to be undone.

jagerman · 2017-02-15T18:27:55Z

include/pybind11/stl_bind.h

+        if (info.ndim != 1)
+            throw pybind11::type_error("Only 1D buffers can be copied to a vector");
+        if (info.strides[0] != sizeof(T))
+            throw pybind11::type_error("Item size mismatch (Python: " + std::to_string(info.strides[0]) + " C++: " + std::to_string(sizeof(T)) + ")");


I don't think this strides[0] == sizeof(T) requirement is desirable—it seems only to be a requirement of the vector constructor you are using, not of the type itself. It should be easy enough to support arbitrary strides, which let slices (e.g. single rows/columns of numpy matrices) work as vector buffer inputs. Something like this (untested) as a replacement for the Vector constructor ought to work:

new (&vec) Vector();
vec.reserve(info.shape[0]);
for (char *ptr = static_cast<char *>(info.ptr), *end = ptr + info.strides[0] * info.shape[0]; ptr != end; ptr += info.strides[0])
    vec.push_back(*reinterpret_cast<T *>(ptr));

(A small test to see that this really does work would be good as well).

Added with test

jagerman · 2017-02-17T04:06:28Z

tests/test_stl_binders.py

+    m[2] = 5
+    assert v[2] == 5
+
+    v = VectorInt(a[:, 1])


wjakob · 2017-02-26T22:42:26Z

I took a brief look at the latest iteration of this PR. I realize that it has been in the works for quite a while. Right now I'm still not 100% excited about it, for these reasons:

1. It's quite a heavy addition to add the buffer interface to every std::vector binding. I'm concerned about codebases which use many different kinds of vectors and potentially don't even need this feature. It would be good if this were opt-in.
2. Some parts of this patch could be implemented more elegantly by building on top of the existing array_t<>. I'm talking about stuff like

for (char *p = static_cast<char*>(info.ptr), *end = static_cast<char*>(info.ptr) + info.shape[0] * info.strides[0]; p < end; p += info.strides[0])
            vec.push_back(*reinterpret_cast<T*>(p));

In practice, I suppose that non-trivial types will require NumPy support in any case.

Thoughts?

patstew · 2017-02-27T01:12:11Z

I've made it conditional on the existing py::buffer_protocol() option.
I've only just added that particular monstrosity to deal with strided arrays as @jagerman requested. I'd prefer not to make this wholly tied to numpy or array_t, my original motivation was that passing vectors of several MBs for IO was taking a noticeable amount of time going through all the cast functions. The more advanced numpy structs and strides stuff is there because you guys requested it :).

jagerman · 2017-02-27T01:17:40Z

That monstrosity could probably be made much more readable if the *end = static_cast<char*>(info.ptr) + info.shape[0] * info.strides[0] was before the for loop instead of as a double initializer. (I take the blame; that was copied directly from my suggestion).

patstew · 2017-02-27T02:03:03Z

I've got rid of the reinterpret_cast too, by assuming (and checking) that strides[0] is a multiple of itemsize (Python docs say it should be) and itemsize == sizeof(T) (which it should if the format matched).
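A standalone sketch of that approach, for readers without the diff at hand. Buffer1D is a made-up stand-in for the handful of buffer fields involved (it is not pybind11's buffer_info), and copy_strided is a hypothetical name; the checks mirror the ones described above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumed stand-in for a one-dimensional buffer description.
struct Buffer1D {
    void *ptr;                // first element
    std::ptrdiff_t itemsize;  // bytes per element
    std::ptrdiff_t count;     // shape[0]
    std::ptrdiff_t stride;    // strides[0], in bytes
};

template <typename T>
std::vector<T> copy_strided(const Buffer1D &info) {
    if (info.itemsize != static_cast<std::ptrdiff_t>(sizeof(T)))
        throw std::invalid_argument("item size mismatch");
    if (info.count < 0)
        throw std::invalid_argument("negative element count");
    if (info.stride % info.itemsize != 0)
        throw std::invalid_argument("stride is not a multiple of itemsize");

    // Express the stride in elements so a typed pointer can be indexed
    // directly, with no reinterpret_cast on each element.
    const std::ptrdiff_t step = info.stride / info.itemsize;
    const T *first = static_cast<const T *>(info.ptr);

    std::vector<T> vec;
    vec.reserve(static_cast<std::size_t>(info.count));
    for (std::ptrdiff_t i = 0; i < info.count; ++i)
        vec.push_back(first[i * step]);
    return vec;
}
```

Indexing by element count rather than comparing against an end pointer also avoids forming a pointer past the underlying storage when copying a column out of a larger array.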

dean0x7d · 2017-03-13T21:22:10Z

This seems like a very useful feature and it's been cooking for a while. Are there any blockers left?

wjakob · 2017-03-14T01:49:57Z

No, I think it looks great now! Thank you @patstew!

wjakob · 2017-03-14T01:52:21Z

(I've added you to the acknowledgements in README.md)

patstew force-pushed the master branch 3 times, most recently from e8da38b to de72c33 Compare November 8, 2016 16:38

patstew force-pushed the master branch from de72c33 to 7b2f651 Compare November 15, 2016 21:01

patstew force-pushed the master branch 2 times, most recently from 5dce001 to a8e1fb7 Compare November 16, 2016 03:33

patstew force-pushed the master branch from a8e1fb7 to b4f065b Compare November 16, 2016 13:12

patstew mentioned this pull request Nov 16, 2016

Only mark unaligned types in buffers #505

Merged

wjakob reviewed Nov 22, 2016

patstew force-pushed the master branch 2 times, most recently from 870dca5 to 6ac109f Compare November 22, 2016 13:32

patstew force-pushed the master branch from 8a3d71e to 3bf0bbf Compare February 7, 2017 15:24

patstew force-pushed the master branch 3 times, most recently from 7aab954 to c49459e Compare February 9, 2017 19:31

jagerman reviewed Feb 15, 2017

patstew force-pushed the master branch 4 times, most recently from d5e2839 to 2930aeb Compare February 16, 2017 12:27

jagerman reviewed Feb 17, 2017

tests/test_stl_binders.py

m[2] = 5

assert v[2] == 5

v = VectorInt(a[:, 1])

jagerman Feb 17, 2017

Nice 👍

patstew force-pushed the master branch 2 times, most recently from 7b51cc7 to 736a59f Compare February 27, 2017 00:57

patstew added 2 commits February 27, 2017 01:56

Add the buffer interface for wrapped STL vectors

aaa35d7

Allows use of vectors as python buffers, so for example they can be adopted without a copy by numpy.asarray Allows faster conversion of buffers to vectors by copying instead of individually casting the elements

Add function for comparing buffer_info formats to types

e985760

Allows equivalent integral types and numpy dtypes

patstew force-pushed the master branch from 736a59f to e985760 Compare February 27, 2017 01:57

dean0x7d mentioned this pull request Mar 13, 2017

Make a v2.1 release? #726

Closed

wjakob merged commit 0b6d08a into pybind:master Mar 14, 2017

rwgk mentioned this pull request Feb 9, 2023

FWD pybind11 google/pybind11clif#488

Closed
