Make access unsafe #599

SylvainCorlay · 2017-01-09T09:33:45Z

This is the removal of the bound checks I am proposing in #584 .

dean0x7d · 2017-01-09T13:15:08Z

Why not use byte_offset_unsafe?

As far as I know, py::array was never meant to be a high-performance container, but just a convenient C++ interface for numpy.ndarray. When performance is required there are two options:

Take the raw data, dims, strides pointers from py::array and give it to a map/view of a proper high-performance container.
Access the data using iterators instead of indexing. This isn't very developed in py::array but it seems to be direction that xtensor-python is going in.

I think the main issue with making the default py::array unsafe is that it's surprising behavior compared to its Python counterpart. py::array isn't a C++ std container. It's not even a real container. It's a wrapper around an existing Python container which has it's own established behavior. By making the element access unsafe, it goes from a reliable exception in Python to undefined behavior in C++.

Keeping the Python behavior makes it consistent and easy to explain: "It looks like a numpy.ndarray and behaves like a numpy.ndarray". If py::array starts deviating from that, then there are additional caveats which need to be explained. This makes for a frustrating learning curve and hidden gotchas.

SylvainCorlay · 2017-01-09T13:30:46Z

(From the original issue)

I agree that when exposing a feature to Python, it must be written in such a way that regardless of the values passed to it, it cannot crash the interpreter, but these accessors are not directly exposed to the Python users. Instead, I think that one of the main reasons people would right c++ extensions with pybind is because they want speed, and they might be disappointed with this behavior. #500 already improved the situation, but as we discussed there, I would prefer to completely remove the bound checking.

wjakob · 2017-01-09T13:36:39Z

An alternative proposal would be to have two functions of each kind: data(...), mutable_data(...) , etc: checks only done in DEBUG mode. data_safe(...), mutable_data_safe(...) , etc: checks always done.

wjakob · 2017-01-09T13:36:53Z

cc @aldanor

SylvainCorlay · 2017-01-09T13:39:12Z

Safe vs unsafe access could be determined with by tag dispatching through a policy.

wjakob · 2017-01-09T13:40:42Z

Can you elaborate what this would look like?

SylvainCorlay · 2017-01-09T13:42:07Z

Arrays would have an additional template parameter whose type determines whether things are checked.

https://en.wikipedia.org/wiki/Policy-based_design

dean0x7d · 2017-01-09T14:40:01Z

But the main argument for unsafe access is that people using pybind (and not pure python) use it because they want speed.

Sure, but even with unsafe access for (int i = 0; i < size; ++i) loops aren't really guaranteed to be faster than what numpy already does. Using proper high-performance C++ libraries/containers with views to array.data/.shape/.strides is almost always going to give better results than accessing element by element.

My main worry is that py::array ends up being somewhere in limbo: neither consistent with numpy nor truly high performance (since expression templates and SIMD intrinsics are out of scope of pybind11).

But I may have misinterpreted the purpose of py::array. It's not py::ndarray, so perhaps it doesn't need to follow numpy.ndarray conventions?

SylvainCorlay · 2017-01-09T14:54:01Z

Sure, but even with unsafe access for (int i = 0; i < size; ++i) loops aren't really guaranteed to be faster than what numpy already does. Using proper high-performance C++ libraries/containers with views to array.data/.shape/.strides is almost always going to give better results than accessing element by element.

Sure, but looping is still something that people will want to do reasonably fast. Checking attributes at every iteration is a good way to kill performance.

dean0x7d · 2017-01-11T22:10:49Z

Sure, but looping is still something that people will want to do reasonably fast. Checking attributes at every iteration is a good way to kill performance.

Perhaps I'm looking at this too much from the C++ side, where we have the performance and want to extend to Python for the flexibility. So I'm mostly looking at py::array as means to get some information about a Python array or package up some data to send to Python. On the other hand, people starting out in Python are extending to C++ for the performance and looking to py::array as the closest tool. So yeah, it being slow would be a disappointment from that point of view.

In that case, the interface should also get an overhaul. Currently, the main way to access elements is .at() and .mutable_at(). We expect from std::vector that .at() is checked, so it would be better to introduce an operator() overload for indexing which would also combine the interface for reading and writing. The writeable check should then also be removed. Both functions also have dimension checks which should also be removed. This PR currently only removes the bounds check in byte_offset() which is an internal function and only part of the bottleneck for a full .at() / .mutable_at() access.

So I guess my main point is: if it's going to be unsafe, make it consistently unsafe so that there's no false sense of security. That's also required for full c-array-level access speed. (Although I'm still not convinced that unsafe-by-default should be the purpose of py::array.)

wjakob · 2017-01-13T10:19:19Z

@dean0x7d: In that case, what do you propose for the read-only version of operator()?

SylvainCorlay · 2017-01-13T10:22:08Z

Regarding the writable check, what exactly is the semantic of it in numpy?

aldanor · 2017-01-13T10:47:46Z

Regarding the writable check, what exactly is the semantic of it in numpy?

>>> import numpy as np
>>> a = np.array([0])
>>> a.flags.writeable = False
>>> a[0] = 1
ValueError: assignment destination is read-only

SylvainCorlay · 2017-01-13T10:57:37Z

Yeah since it is determined at runtime, this cannot really play well with the const semantics of c++

aldanor · 2017-01-13T10:58:40Z

Sure, but looping is still something that people will want to do reasonably fast. Checking attributes at every iteration is a good way to kill performance.

But computing byte offset from scratch on every iteration is not optimal regardless. If you have a tight loop, the chances are you'll fetch the data pointer once anyway and then increment the strides manually in the loops to avoid needless multiplications and stride fetching? NumPy C API has some macros (PyArray_ITER_*) to simplify iteration, maybe instead of removing bound checks we could provide iteration primitives in the same fashion?

SylvainCorlay · 2017-01-13T11:00:32Z

Sure, but looping is still something that people will want to do reasonably fast. Checking attributes at every iteration is a good way to kill performance.

But computing byte offset from scratch on every iteration is not optimal regardless. If you have a tight loop, the chances are you'll fetch a pointer once anyway and then increment the strides manually in the loops to avoid needless multiplications and stride fetching? NumPy C API has some macros (PyArray_ITER_*) to simplify iteration, maybe instead of removing bound checks we could provide iteration primitives?

I agree, xtensor has efficient iterator pairs that allow people to iterate more efficiently on arrays, but we should expect people to write quick-&-easy things with loops over indices.

aldanor · 2017-01-13T11:04:21Z

Just a random thought, another option: alongside the existing methods, add operator[](...) that would be unchecked:

arr[{i, j, k}]

(that is, if it's possible to do this so it's inlined to the same thing as at() without the bound check)

SylvainCorlay · 2017-01-13T11:06:06Z

In xtensor, we have different semantics for () and [].

() takes a compile-time determined number of arguments
[] takes a single xindex argument which may have a length that is computed at runtime.

wjakob · 2017-01-13T11:09:45Z

Apologies in advance for the following rant: I think that repeatedly bringing xtensor into the discussions here at pybind11 is not conductive. If there is some clever technical insight that can be ported to this project, then that's all fine with me. However, there is really no reason why the API/style of this project should somehow be bound to that of xtensor (rather than NumPy, which we're trying to wrap)

SylvainCorlay · 2017-01-13T12:07:29Z

Apologies in advance for the following rant: I think that repeatedly bringing xtensor into the discussions here at pybind11 is not conductive. If there is some clever technical insight that can be ported to this project, then that's all fine with me. However, there is really no reason why the API/style of this project should somehow be bound to that of xtensor (rather than NumPy, which we're trying to wrap)

I apologize if you found that mentioning xtensor in the discussions on numpy bindings was not appropriate. I thought that bringing this perspective in the discussion would help since we are working on numpy-style semantics in c++.

I deleted the comment where I detail the iterators that we use which was not so much related to the discussion.

JohanMabille · 2017-01-19T10:57:03Z

Actually, I wonder if someone would mix checked and unchecked access in their code. If that's not the case, maybe a policy-based design as mentioned by @SylvainCorlay could help. The default behavior would be the actual one, so it would remain consistent with the one of Python, and would avoid a frustrating learning curve as @dean0x7d pointed it out. And people that need pure performance could have the behavior they want with a simple template parameter change.

Besides, that would keep the interface simple and intuitive, and would avoid a counter intuitive change in py::array behavior.

Would that be an acceptable change for you guys ? If so, I can start working on it.

dean0x7d · 2017-01-19T22:28:00Z

I think policy-based would be a nice way to go. Using operator() would also make Python -> C++ pretty uniformly [] -> (). E.g. accessing .shape is also [] -> ():

Python

n = a.shape[1]
x = a[3, 3]

C++

auto n = a.shape(1)
auto x = a(3, 3)

operator[] with some special indexing objects sounds good too, but operator[](int) should explicitly be a compile-time error (with a nice message if practical) to avoid the headscratcher which is a[1, 2, 3] in C and C++. I assume this would be a common (silent) error when adapting code from Python to C++.

Regarding operator(), the unsafe version can simply return a reference: T& array_t<T>:: operator()(...). The safe version would need to use a proxy type to distinguish reading and writing for the mutability check: Proxy<T> array_t<T>::operator()(...). But the resulting syntax would be the same.

SylvainCorlay · 2017-01-19T23:27:45Z

The safe version would need to use a proxy type to distinguish reading and writing for the mutability check: Proxy array_t::operator()(...). But the resulting syntax would be the same.

I like the idea of the safe version check for the mutability for the writable flag.

wjakob · 2017-02-26T22:19:15Z

I'm wondering what the status is with regards to this? It looks like there was a useful discussion about a potential API, but that never quite materialized in a concrete PR. Is there still a plan to implement a form of unchecked access?

dean0x7d · 2017-02-27T23:48:10Z

I think the main discussion has moved to #617 where there are 2 active proposals. It would be best to continue over there and close this one.

SylvainCorlay force-pushed the all-unsafe branch from 6ec5e27 to e88d9e3 Compare January 9, 2017 09:37

Make access unsafe

94bfc70

SylvainCorlay force-pushed the all-unsafe branch from e88d9e3 to 94bfc70 Compare January 9, 2017 09:45

JohanMabille mentioned this pull request Jan 25, 2017

Unsafe access #617

Closed

dean0x7d closed this Feb 27, 2017

jagerman mentioned this pull request Mar 19, 2017

Add unchecked array access via proxy object #746

Merged

jagerman mentioned this pull request Jul 26, 2017

Could unchecked views prevent garbage collection of the underlying arrays? #961

Closed

rwgk mentioned this pull request Feb 9, 2023

FWD pybind11 google/pybind11clif#599

Closed

Make access unsafe #599

Make access unsafe #599

Uh oh!

Conversation

SylvainCorlay commented Jan 9, 2017

Uh oh!

dean0x7d commented Jan 9, 2017

Uh oh!

SylvainCorlay commented Jan 9, 2017

Uh oh!

wjakob commented Jan 9, 2017

Uh oh!

wjakob commented Jan 9, 2017

Uh oh!

SylvainCorlay commented Jan 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wjakob commented Jan 9, 2017

Uh oh!

SylvainCorlay commented Jan 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dean0x7d commented Jan 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SylvainCorlay commented Jan 9, 2017

Uh oh!

dean0x7d commented Jan 11, 2017

Uh oh!

wjakob commented Jan 13, 2017

Uh oh!

SylvainCorlay commented Jan 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aldanor commented Jan 13, 2017

Uh oh!

SylvainCorlay commented Jan 13, 2017

Uh oh!

aldanor commented Jan 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SylvainCorlay commented Jan 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aldanor commented Jan 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SylvainCorlay commented Jan 13, 2017

Uh oh!

wjakob commented Jan 13, 2017

Uh oh!

SylvainCorlay commented Jan 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JohanMabille commented Jan 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dean0x7d commented Jan 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SylvainCorlay commented Jan 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wjakob commented Feb 26, 2017

Uh oh!

dean0x7d commented Feb 27, 2017

Uh oh!

Uh oh!

SylvainCorlay commented Jan 9, 2017 •

edited

Loading

SylvainCorlay commented Jan 9, 2017 •

edited

Loading

dean0x7d commented Jan 9, 2017 •

edited

Loading

SylvainCorlay commented Jan 13, 2017 •

edited

Loading

aldanor commented Jan 13, 2017 •

edited

Loading

SylvainCorlay commented Jan 13, 2017 •

edited

Loading

aldanor commented Jan 13, 2017 •

edited

Loading

SylvainCorlay commented Jan 13, 2017 •

edited

Loading

JohanMabille commented Jan 19, 2017 •

edited

Loading

dean0x7d commented Jan 19, 2017 •

edited

Loading

SylvainCorlay commented Jan 19, 2017 •

edited

Loading