Skip to content

Unsafe access #617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

JohanMabille
Copy link
Contributor

@JohanMabille JohanMabille commented Jan 25, 2017

This PR adds unsafe access as suggested in #599. Changes have been split in two commits since you may want to keep py::array as a non template class; in that case it would be easy to revert the last commit and make py::array an alias of a default template type, something like:

template <class access_policy = safe_access_policy>
class generic_array
{
// ...
}

using array = generic_array<safe_access_policy>;

Let me know what you think

@dean0x7d
Copy link
Member

Just a quick comment: the alias is definitely needed for backward compatibility.

Copy link
Member

@jagerman jagerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that I'm really convinced of this accessor policy approach. It feels a little un-C++ to me--personally I'd prefer a more stl-like approach where an object has both safe and unsafe accessors, but if the accessor policy is the way everyone else wants to go, so be it.

@@ -316,7 +316,64 @@ class dtype : public object {
}
};

class array : public buffer {
namespace detail {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be NAMESPACE_BEGIN(detail) to be consistent with the rest of pybind.

@@ -561,35 +595,36 @@ class array : public buffer {
}
};

template <typename T, int ExtraFlags = array::forcecast> class array_t : public array {
template <typename T, int ExtraFlags = array<>::forcecast> class array_t : public array<> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is array_t left out? I'd think taking it as an extra argument then inheriting from array<access_policy> would make a lot of sense here.

This isn't related to your PR, but I also don't understand why we take ExtraFlags as a template argument here instead of just as an argument to ensure().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though that since array_t is the C++ class for numpy array, we would want to always check the access to its elements. But I can propagate the policy parameter to it if this looks better.

@@ -316,7 +316,64 @@ class dtype : public object {
}
};

class array : public buffer {
namespace detail {
inline void fail_dim_check(size_t dim, size_t ndim, const std::string& msg) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be better as [[noreturn]] PYBIND11_NOINLINE inline void fail_dim_check(...

(Also, unindent this part; I seem to recall @wjakob mentioning that not indenting functions in namespaces was one of his primary motivations for using the namespace macros).

throw index_error(msg + ": " + std::to_string(dim) +
" (ndim = " + std::to_string(ndim) + ")");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAMESPACE_END(detail)

};

template <class access_policy = safe_access_policy>
class array : public buffer, private access_policy {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing array to a template class is definitely going to break code; this needs to be somewhere new with a using array = array_base<safe_access_policy>. With that, you might as well drop the default on the template argument as well. You might also add a using array_unchecked = array_base<unsafe_access_policy>;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it has already broken the tests :) I will revert this change .

@wjakob
Copy link
Member

wjakob commented Jan 25, 2017

cc @aldanor

@JohanMabille
Copy link
Contributor Author

JohanMabille commented Jan 25, 2017

PR updated, I reverted the changes in tests, updated numpy.h based on @jagerman remarks and squashed all in a single commit to keep history readable. Let me know what you think.

@JohanMabille JohanMabille force-pushed the unsafe_access branch 6 times, most recently from a483056 to 46ff174 Compare January 25, 2017 15:12
@jagerman
Copy link
Member

I don't understand why your travis-ci build didn't mark the GCC7 build as an allowed failure. The PR I just pushed did so, so perhaps it was a temporary travis-ci bug?

@JohanMabille
Copy link
Contributor Author

I think so, it marked it as an allowed failure every prior squashing commit.

@aldanor
Copy link
Member

aldanor commented Jan 26, 2017

I'm not sure that I'm really convinced of this accessor policy approach. It feels a little un-C++ to me--personally I'd prefer a more stl-like approach where an object has both safe and unsafe accessors

I'm inclined to agree with that for the most part.

  • There's one extra layer of template parameter madness for array types
  • There's one more base class, so now there's array_base<> everywhere in library code
  • Now there's array<safe> and array<unsafe> which are different types from C++'s point of view, although really they are the same thing. Also this template parameter is meaningless from return type perspective

Although, I concur with @jagerman, if everyone and their grandmother agrees on that this is the way to go, then whatever. I'm also wondering what's the typical downstream example of code that would benefit from this and that wouldn't be possible on the current master?

@jagerman
Copy link
Member

jagerman commented Jan 26, 2017

As to:

  • Now there's array<safe> and array<unsafe> which are different types from C++'s point of view, although really they are the same thing. Also this template parameter is meaningless from return type perspective

I think they should be convertible, i.e. I'm pretty sure you'll be able to do: array<unsafe> aunsafe = asafe; and have two (C++) variables referencing the same ndarray, but one checks on access while the other doesn't.

@aldanor
Copy link
Member

aldanor commented Jan 26, 2017

Yea, you're right. Another option is to have .unchecked() method that would return an unchecked reference. Although this way you could even do it without the policy type parameter at all I think, just by having a friend proxy array_unchecked that only provides unchecked methods.

@dean0x7d
Copy link
Member

@aldanor said:
Although, I concur with @jagerman, if everyone and their grandmother agrees on that this is the way to go, then whatever.

I don't think any consensus was reached, so I wouldn't call in the grandmothers quite yet. My read on the situation is the following:

  • @SylvainCorlay and @JohanMabille want unsafe access but don't really mind what the implementation is.
  • @jagerman and @aldanor prefer operator() (unsafe) + at()/mutable_at() (safe) for an STL-like approach. This effectively makes it unsafe by default. (Note that py::array's operator() would also be unsafe on const-ness, in contrast to the STL only being unsafe on bounds.)
  • @wjakob suggested safe in debug mode and unsafe in release similar to Eigen.
  • I was advocating for keeping safe by default.

At the end of the day, I don't mind it going another way, I'm just voicing my concerns.

@dean0x7d
Copy link
Member

Another option is to have .unchecked() method that would return an unchecked reference. Although this way you could even do it without the policy type parameter at all I think, just by having a friend proxy array_unchecked that only provides unchecked methods.

That sounds like an interesting approach to me. The returned object could be a pure C++ type (no py::object subclassing) and just hold the relevant data/shape/strides pointers. Full unsafe access (read or write items), but no mutating the array (resize).

@SylvainCorlay
Copy link
Member

SylvainCorlay commented Jan 26, 2017

I don't think any consensus was reached, so I wouldn't call in the grandmothers quite yet. My read on the situation is the following:

  • @SylvainCorlay and @JohanMabille want unsafe access but don't really mind what the implementation is.
  • @jagerman and @aldanor prefer operator() (unsafe) + at()/mutable_at() (safe) for an STL-like approach. This effectively makes it unsafe by default. (Note that py::array's operator() would also be unsafe on const-ness, in contrast to the STL only being unsafe on bounds.)
  • @wjakob suggested safe in debug mode and unsafe in release similar to Eigen.
    I was advocating for keeping safe by default.
    At the end of the day, I don't mind it going another way, I'm just voicing my concerns.

I lean towards the second approach by @jagerman and @aldanor. The safe on const-ness is tricky to achieve since numpy constness is a runtime thing.

@jagerman
Copy link
Member

jagerman commented Jan 27, 2017

The returned object could be a pure C++ type (no py::object subclassing) and just hold the relevant data/shape/strides pointers. Full unsafe access (read or write items), but no mutating the array (resize).

This is basically what Eigen support does in PR #610. It should have a PyObject somewhere, though, with a proper reference recorded so that numpy knows the base is referenced (and thus will also prohibit resize on the referencee).

Edit: and also to guarantee that the original stays around, of course. (But "original" here could be up the chain if the py::array it itself an array reference).

@dean0x7d
Copy link
Member

I was thinking super stripped down:

template<class T>
class unsafe_reference {
    T* data;
    const size_t* shape;
    const size_t* strides;

    friend class array;
    unchecked_accessor(T* data, const size_t* shape, const size_t* strides);

public:
    template<class Idx> T& operator(Idx... idx);
    size_t shape(size_t dim) const;
};

Usage:

void f(py::array a) {
    auto r = a.unsafe_ref<double>();
    // any usage of `r` is unchecked
    double item = r(2, 3);
}

If it's just going to be a reference, it doesn't need to keep anything alive (it wouldn't be a function parameter like Eigen types).

@jagerman
Copy link
Member

jagerman commented Jan 27, 2017

It might be simpler still to just stash a array &ref in the class, add the (unchecked) implementations as private methods of array with unsafe_reference<T> declared a friend that calls them. That way you keep everything in array and unsafe_reference just exposes unsafe access (presumably behind a .writeable check on construction).

@dean0x7d
Copy link
Member

Stashing the data/strides/shape pointers avoids having to reload them for every access.

@jagerman
Copy link
Member

Oh, right, I was thinking array already had them.

Also: storing a void* would be better here with a cast to T* after computing the position (since numpy strides are in bytes, not element sizes).

@JohanMabille
Copy link
Contributor Author

To be honest, I agree with @jagerman : this approach is quite unusual in C++, generally both safe and unsafe accessors are provided. However, since there were many concerns about the unsafe-by-default approach but none about the policy approach, I thought this last one could conciliate everyone's concerns.

@jagerman
Copy link
Member

So far I like @dean0x7d 's solution the best; it gives you unchecked access, but hides it behind a gatekeeper method which can do the .writeable test just once. (It's not 100%: you could get the reference then go back and change .writeable, but that doesn't seem worth worrying about).

@dean0x7d
Copy link
Member

dean0x7d commented Feb 27, 2017

As far as I can see, we're down to two proposals:

  1. Have operator() with unsafe access by default + at()/mutable_at() for safe access.
  2. @aldanor's proposal from Unsafe access #617 (comment) where py::array is safe by default, but can return a super simple array (not py::object derived) with unsafe access (e.g. quick sketch).

I'm personally in favor of option 2. What about others? 1 or 2?

@dean0x7d dean0x7d mentioned this pull request Feb 27, 2017
@jagerman
Copy link
Member

jagerman commented Mar 1, 2017

I'm more or less ambivalent between the two. I like 1's interface more, because it means I don't need to do anything special when I already know the array sizes are valid (e.g. because I just created the array). On the other hand, it bypasses the writeable() check, which seems wrong to me.

So I think I have a very weak preference for (2) over (1).

@wjakob
Copy link
Member

wjakob commented Mar 1, 2017

Both are fine with me.

@jagerman
Copy link
Member

jagerman commented Mar 1, 2017

@JohanMabille - I guess that's basically pre-approval for a suitable implementation of (2) to resolve this, if you want to update the PR.

@nbecker
Copy link

nbecker commented Mar 1, 2017

I have a pretty strong preference for making operator() unsafe, and use at if you want it. This is consistent with STL, IIRC
In general, I don't want checks on, I am using c++ because I need speed. If I need to find an access violation, I'll use valgrind.

@jagerman
Copy link
Member

jagerman commented Mar 1, 2017

@nbecker - the one thing I see in support of the intermediate class is that it enforces numpy's writeable flag. STL containers, on the other hand, get that enforcement at compile-time, which is why they don't need something like this. For numpy, however, a run-time check is required (but doing it on each access would obviate much of the point of providing unsafe access in the first place).

@jagerman
Copy link
Member

jagerman commented Mar 1, 2017

(And if someone figures out a nicer way of dealing with that, perhaps through the constness of the py::array type, I'd also prefer (1) to (2)).

@nbecker
Copy link

nbecker commented Mar 2, 2017

Another possibility is to make checking a compile time option (e.g., NDEBUG)

@aldanor
Copy link
Member

aldanor commented Mar 2, 2017

I would personally strongly oppose reusing NDEBUG for this, it has already been discussed somewhere above, or in a related thread. A separate, dedicated def - maybe, if that's really needed... However compile-time def is less flexible than any of the two options listed above, so it wouldn't replace them.

@jagerman
Copy link
Member

To all involved: input on #746 would be appreciated.

@wjakob
Copy link
Member

wjakob commented Mar 20, 2017

Closing this in favor of #746

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants