Skip to content

NumPy registered dtypes may not be shared across modules #468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aldanor opened this issue Oct 28, 2016 · 5 comments
Closed

NumPy registered dtypes may not be shared across modules #468

aldanor opened this issue Oct 28, 2016 · 5 comments

Comments

@aldanor
Copy link
Member

aldanor commented Oct 28, 2016

We've recently run into a series of strange problems when sharing registered NumPy dtypes across multiple pybind modules -- the dtypes would appear to be missing when using LTO (-flto on gcc 5), likely due to the template static data being BSS (and it would somehow work when the flag was disabled).

A simple failing example is as follows:

// colour.h
struct Grey { int lum; };
// colour_reg.cpp
#include <pybind11/numpy.h>
#include "colour.h"

namespace py = pybind11;
PYBIND11_PLUGIN(colour_reg) {
    PYBIND11_NUMPY_DTYPE(Grey, lum);
    py::module m("colour_reg");
    m.attr("dtype") = pybind11::dtype::of<Grey>();
    return m.ptr();
}
// colour_use.cpp
#include <pybind11/numpy.h>
#include "colour.h"

namespace py = pybind11;
PYBIND11_PLUGIN(colour_use) {
    py::module m("colour_use");
    m.attr("dtype") = pybind11::dtype::of<Grey>();
    return m.ptr();
}
# test_dtype.py
import colour_reg
import colour_use  # ~ERROR: ImportError: NumPy: unsupported buffer format!
print('ok')

The point is, we shouldn't rely on the linker to stitch things together, and the proper approach would be to do the same thing as we do with everything else -- i.e., share data through capsules. This would imply that accessing a dtype of a C++ type would require indexing an associated hashmap, but that's probably acceptable.

I see three possible ways of doing this so that the rest of pybind11 code remains (almost) untouched:

  1. numpy.h could register its own capsule ("numpy_internals"), and share it the same way it's done in get_internals(). The loading would be done in api_lookup, where NumPy API capsule is loaded. This new capsule would hold an unordered_map<std::type_index, numpy_type_info>, where numpy_type_info would hold a pointer to the dtype and the format string (at the very least).

  2. Allow extending internals without changing or breaking the base API -- this functionality could be very handy in future when things stabilize. Basically, add something like std::unordered_map<std::string, void *> capsules to internals (could use ints instead of strings for keys), then e.g. numpy.h would get its own "capsule" via get_internals().capsules["numpy"]. This way, we don't have to copy/paste PyCapsule calls, plus we use the same capsule versioning conversion across all code (e.g. pybind11_1.8).

  3. The same as (2), but in type_info. However, this limits the "custom" data to being per-type, so (2) is probably a better option.

+@wjakob +@bennorth

@wjakob
Copy link
Member

wjakob commented Oct 28, 2016

Just to clarify: the problem here is a static data member in a template? It makes sense that this wouldn't be shared across modules (and I'm actually surprised that it works when LTO is not used).

I like approach 2, but I don't get the signature of your unordered_map. Why not make it std::unordered_map<std::type_index, numpy_type_info>?

PS: In that case, it would be important to make numpy_type_info very slim so that it doesn't need to pull in numpy.h.

@aldanor
Copy link
Member Author

aldanor commented Oct 28, 2016

@wjakob Yes, that seems to be the problem (and I was surprised as well that it does sometimes work).

To expand on (2), my suggestion is more or less this:

// common.h
struct internals {
    ...
    // could be extended with more optional data to be shared across Python modules
    // without breaking base pybind11 API
    std::unordered_map<std::string, void *> capsules;  // could be named `shared_data`
}
// numpy.h
struct numpy_type_info { ... }

struct numpy_internals {
    std::unordered_map<std::type_index, numpy_type_info> registered_dtypes;
};

...

numpy_internals& get_numpy_internals() {
    // (numpy_internals *) get_internals().capsules["numpy"]
}

The extra map in internals is to decouple it from other code that wants to share arbitrary data across Python modules using the existing versioned pybind11 capsules (kind of the same thing as pybind11 storing its own capsule in builtins). We are free to modify things in numpy.h or connect more "capsules" without breaking the clients or touching common.h.

@wjakob
Copy link
Member

wjakob commented Oct 31, 2016

Hi @aldanor,

in that case, let's make it a public API:

namespace pybind11 {
   /// Returns a named pointer that is shared among all extension modules (using the same pybind11 version) running in the current interpreter. Names with underscores are reserved for internal usage. Returns ``nullptr`` if no matching entry was found.
   void *get_shared_data(const char *name);
   /// ...
   void set_shared_data(const char *name, void *data);
};

You'll probably want to cache the lookup in get_numpy_internals, or this will be very slow!

@aldanor
Copy link
Member Author

aldanor commented Oct 31, 2016

@wjakob Yep, that makes sense (I've had something similar but different naming and under the detail namespace), added your suggestion to the #472 PR

@aldanor
Copy link
Member Author

aldanor commented Nov 3, 2016

Closed via #472

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants