Conversation

@reyoung (Collaborator) commented Jul 17, 2017

Basically following
http://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html

  • Use the buffer protocol to return a view of a Tensor, which can be cast to
    a numpy array in Python.
  • Set a numpy array into a tensor.

};

template <typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& vec) {
Collaborator:
It seems that this operator definition is not necessary. We can print a std::vector by calling std::copy. For example:

 std::copy(path.begin(), path.end(), std::ostream_iterator<T>(std::cout, " "));


USE_OP(add_two);

struct PlaceDebugString : public boost::static_visitor<std::string> {
Collaborator:

  • It seems that PlaceDebugString should be in platform/place.h, not here.

  • The usage of PlaceDebugString in this PR is not for debugging, so the name shouldn't contain Debug. PrintPlace seems more accurate.

  • PrintPlace could call std::ostream& operator<<(std::ostream&, const Place&); in platform/place.h, or vice versa, to ensure a consistent representation of Places.


DDim dims() const { return dims_; }

platform::Place place() const { return holder_->place(); }
@wangkuiyi (Collaborator) commented Jul 17, 2017:

This method Tensor::place would cause a segmentation fault if a Tensor has been constructed but its mutable_data hasn't been called. Given such delicacy, I'd suggest not exposing it to all users.

It seems that Tensor::place is exposed only for CastToPyBuffer. What we should add instead is Tensor::CastToPyBuffer, or Tensor::PyBuffer.

@wangkuiyi (Collaborator) left a comment:

I am not sure, but please check whether we need all the following facilities for the conversion from Tensor to pybind::buffer_info:

  1. struct TensorToPyBuffer
  2. CastToPyBufferImpl
  3. CastToPyBuffer

Even if we need some of them, I would suggest moving the definitions out of pybind/pybind.cc into either framework/tensor_pybind.h or pybind/tensor.h.

py::module m("core", "C++ core of Paddle Paddle");

py::class_<paddle::platform::Place>(
m, "Place", R"DOC(Device Place Class.)DOC")
Collaborator:

It seems that the raw-string literal R"DOC(...)DOC" is not necessary for a single-line string.

Collaborator:

Also, I don't see where this operator is used in this PR. Why would we need it?

py::class_<pd::Tensor>(m, "Tensor", py::buffer_protocol())
.def("get_place", &pd::Tensor::place)
.def_buffer([](pd::Tensor& self) -> py::buffer_info {
PADDLE_ENFORCE(paddle::platform::is_cpu_place(self.place()),
Collaborator:

It seems that PADDLE_ENFORCE(paddle::platform::is_cpu_place(self.place())) can be moved into CastToPyBuffer, so we can make CastToPyBuffer a friend or a method of class Tensor. This would allow us not to expose so many low-level attributes like Tensor::place() and Tensor::size().

@reyoung (Collaborator, author):

The friend class is a good idea, but it cannot be done because CastToPyBuffer has many template arguments.


template <size_t I, typename... ARGS>
struct CastToPyBufferImpl<false, I, ARGS...> {
py::buffer_info operator()(pd::Tensor& tensor) {
Collaborator:

It seems sufficient to define a function rather than a functor here.

using CUR_TYPE = typename std::tuple_element<I, std::tuple<ARGS...>>::type;
py::buffer_info operator()(pd::Tensor& tensor) {
TensorToPyBuffer<CUR_TYPE> cast_object(tensor);
if (cast_object.CanCast()) {
Collaborator:

This seems to be the only use of TensorToPyBuffer, so it is not worth defining the class TensorToPyBuffer at all.

size_t prod = 1;
for (size_t i = dim_vec.size(); i != 0; --i) {
dims_outside[i - 1] = (size_t)dim_vec[i - 1];
strides[i - 1] = sizeof(float) * prod;
Collaborator:

float => T ?

}
};

template <bool less, size_t I, typename... ARGS>
Collaborator:

It seems that less is not needed, as we can compare I with std::tuple_size<std::tuple<ARGS...>>::value.

}
};

template <typename T>
@wangkuiyi (Collaborator) commented Jul 18, 2017:

I tried to simplify the following code

template <typename T>
struct TensorToPyBuffer {
  pd::Tensor& self_;
  explicit TensorToPyBuffer(pd::Tensor& self) : self_(self) {}

  bool CanCast() const { return std::type_index(typeid(T)) == self_.type(); }

  py::buffer_info Cast() const {
    auto dim_vec = pd::vectorize(self_.dims());
    std::vector<size_t> dims_outside;
    std::vector<size_t> strides;
    dims_outside.resize(dim_vec.size());
    strides.resize(dim_vec.size());

    size_t prod = 1;
    for (size_t i = dim_vec.size(); i != 0; --i) {
      dims_outside[i - 1] = (size_t)dim_vec[i - 1];
      strides[i - 1] = sizeof(float) * prod;
      prod *= dims_outside[i - 1];
    }

    return py::buffer_info(self_.mutable_data<T>(self_.place()),
                           sizeof(T),
                           py::format_descriptor<T>::format(),
                           (size_t)pd::arity(self_.dims()),
                           dims_outside,
                           strides);
  }
};

template <bool less, size_t I, typename... ARGS>
struct CastToPyBufferImpl;

template <size_t I, typename... ARGS>
struct CastToPyBufferImpl<false, I, ARGS...> {
  py::buffer_info operator()(pd::Tensor& tensor) {
    PADDLE_THROW("This type of tensor cannot be expose to Python");
    return py::buffer_info();
  }
};

template <size_t I, typename... ARGS>
struct CastToPyBufferImpl<true, I, ARGS...> {
  using CUR_TYPE = typename std::tuple_element<I, std::tuple<ARGS...>>::type;
  py::buffer_info operator()(pd::Tensor& tensor) {
    TensorToPyBuffer<CUR_TYPE> cast_object(tensor);
    if (cast_object.CanCast()) {
      return cast_object.Cast();
    } else {
      constexpr bool less = I + 1 < std::tuple_size<std::tuple<ARGS...>>::value;
      return CastToPyBufferImpl<less, I + 1, ARGS...>()(tensor);
    }
  }
};

template <typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& vec) {
  for (size_t i = 0; i < vec.size(); ++i) {
    os << vec[i];
    if (i + 1 != vec.size()) {
      os << ", ";
    }
  }
  return os;
}

py::buffer_info CastToPyBuffer(pd::Tensor& tensor) {
  auto buffer_info = CastToPyBufferImpl<true, 0, float, int>()(tensor);
  return buffer_info;
}

into

template <size_t TypeIndex, typename... ElementTypes>
py::buffer_info TryCastToPythonBuffer(pd::Tensor& tensor) {
  using Types = std::tuple<ElementTypes...>;
  PADDLE_ENFORCE(TypeIndex < std::tuple_size<Types>::value,
                 "This tensor is not any type that can be cast to a NumPy array.");
  
  using T = typename std::tuple_element<TypeIndex, Types>::type;
  if (std::type_index(typeid(T)) == tensor.type()) {
    std::vector<size_t> strides(tensor.dims().arity());
    size_t prod = 1;
    for (int a = strides.size(); a > 0; --a) {
      strides[a-1] = sizeof(T) * prod;
      prod *= get(tensor.dims(), a-1);
    }

    return py::buffer_info(tensor.mutable_data<T>(tensor.place()),
                           sizeof(T),
                           py::format_descriptor<T>::format(),
                           (size_t)pd::arity(tensor.dims()),
                           vectorize(tensor.dims()),
                           strides);
  } else {
    return TryCastToPythonBuffer<TypeIndex + 1, ElementTypes...>(tensor);
  }
}

py::buffer_info CastToPyBuffer(pd::Tensor& tensor) {
  PADDLE_ENFORCE(paddle::platform::is_cpu_place(tensor.place()),
                 "Only CPU tensors can be cast to numpy arrays");
  return TryCastToPythonBuffer<0, float, int>(tensor);
}

I haven't debugged the new snippet; I just hope it could shorten the code.

@reyoung (Collaborator, author):

Why use bool less in the template?

The bool less is useful because C++11 lacks a static if statement: all the code in a block will be compiled, and part of it cannot be specialized away.

std::tuple_element<TypeIndex, Types>::type;

This line will cause a compile error because TypeIndex might be larger than the tuple size.

The template argument less is used to specialize the two conditions. When less = false, we do not need to get the current type from the tuple.

Another way is to use std::enable_if to do a similar thing, but that would be harder to read.

Why use a functor, i.e., a struct with operator(), instead of a function?

Because partial specialization is not allowed for C++ functions. For example, the following code segment will never be compiled by a C++ compiler.

template <bool flag, typename T>
int foo();

template <typename T>
int foo<true, T>() {
  return 0;
}

template <typename T>
int foo<false, T>() {
  return 0;
}

@reyoung (Collaborator, author) commented Jul 18, 2017:

I will follow the other comments:

  1. Combine CastToPyBufferImpl and TensorToPyBuffer together.
  2. Move them into a header file.

@reyoung (Collaborator, author):

Also, get_place should be a public method of Tensor, for two reasons:

  • The CastToPyBufferImpl class is in the module paddle::pybind. If that class were a friend of Tensor, it would:
    1. create a cross dependency between the two modules;
    2. expose CastToPyBufferImpl, which is a detail class and should stay private in its module;
    3. not work anyway, because CastToPyBufferImpl has many template arguments and cannot be declared as a single friend class.
  • Tensor actually has a place attribute. Even though, mathematically, a tensor should not contain device information, in neural network frameworks it actually holds device information.

Collaborator:

@reyoung I don't understand what the issue is with C++11 not having static if. I built the following test program using GCC with -std=c++11:

#include <iostream>
#include <tuple>

template <size_t TypeIndex, typename... ElementTypes>
void SelectType() {
  using Types = std::tuple<ElementTypes...>;

  if (TypeIndex >= std::tuple_size<Types>::value) {
    std::cout << "Larger\n";
  } else {
    std::cout << "OK\n";
  }
}
     
int main() {
  SelectType<0, float, int>();
  SelectType<1, float, int>();
  SelectType<2, float, int>();
  return 0;
}

and it works:

$ g++ a.cc -std=c++11 -o a && ./a
OK
OK
Larger

@reyoung (Collaborator, author) commented Jul 18, 2017:

The following program cannot be compiled, because the if/else statement is not evaluated at compile time, i.e., it is not a static if. So CUR_TYPE will always be specialized, giving an error like:

static_assert failed "tuple_element index out of range"

#include <iostream>
#include <tuple>

template <size_t TypeIndex, typename... ElementTypes>
void SelectType() {
  using Types = std::tuple<ElementTypes...>;

  if (TypeIndex >= std::tuple_size<Types>::value) {
    std::cout << "Larger\n";
  } else {
    using CUR_TYPE = typename std::tuple_element<TypeIndex, std::tuple<ElementTypes...>>::type;
    std::cout << "OK\n";
  }
}

int main() {
  SelectType<0, float, int>();
  SelectType<1, float, int>();
  SelectType<2, float, int>();
  return 0;
}

@reyoung (Collaborator, author):

The static if (if constexpr) was added in C++17.

Collaborator:

I see the reason! Thanks! @reyoung

* Follow review comments to separate Tensor/Numpy interactive methods into
  tensor.h.
* Simplify the logic of `CastToPyBufferImpl`: make it a single struct in the
  details namespace.
* Remove the `Scope` exposure in Python, since it is currently useless.
* Remove some debug functions.
@wangkuiyi (Collaborator):

@reyoung The following program works, and it might lead to a simplified implementation that requires neither std::tuple/std::tuple_element nor a functor. But anyway, we can merge this PR before we try a simpler version.

#include <iostream>
#include <typeinfo>
#include <typeindex>

void SelectType() {}

template <class T, class... Ts>
void SelectType(T arg, Ts... args) {
  if (std::type_index(typeid(arg)) == std::type_index(typeid(int))) {
    std::cout << "OK, found int\n";
  } else {
    SelectType(args...);
  }
}

int main() {
  SelectType(0.0f, 0L, 0);
  return 0;
}

@reyoung reyoung force-pushed the feature/is_in_gpu branch from 6755eec to fa42c90 Compare July 18, 2017 06:40
@reyoung reyoung force-pushed the feature/is_in_gpu branch from fa42c90 to 1dc53a2 Compare July 18, 2017 07:23
@reyoung reyoung merged commit c85a323 into PaddlePaddle:develop Jul 18, 2017
@reyoung reyoung deleted the feature/is_in_gpu branch July 18, 2017 08:22