Skip to content

[BUG]: retranslating local exception with custom data class hangs in free-threading cpython #5346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks done
vfdev-5 opened this issue Aug 30, 2024 · 3 comments · Fixed by #5362
Closed
3 tasks done
Labels
triage New bug, unverified

Comments

@vfdev-5
Copy link
Contributor

vfdev-5 commented Aug 30, 2024

Required prerequisites

What version (or hash if on master) of pybind11 are you using?

2.13.5 and from source

Problem description

I have a program stuck when retranslating local exception with custom data class using CPython with free-threading mode enabled.

This is due to internals.mutex being in the locked state when the callback is running. When using custom data, all_type_info_get_cache method is called internally which also tries to lock internals.mutex.

See the repro code below

Reproducible example code

C++ test1.cpp

// https://pybind11.readthedocs.io/en/stable/basics.html
// clang++ -O3 -Wall -shared -std=c++11 -fPIC $(python -m pybind11 --includes) test1.cpp -o test1.so
// python -c "import test1; test1.check()"

#include <string>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>


namespace py = pybind11;

struct CustomData {
  CustomData(const std::string & a): a(a) {}
  std::string a;
};


struct CustomError {
  CustomError(const CustomData & message) : message(message) {}
  CustomData message;
};


PYBIND11_MODULE(test1, m, py::mod_gil_not_used()) {
  m.doc() = "test custom exception with free-threading";

  m.def("check", [](){
    auto d1 = CustomData("abc");
    throw CustomError(d1);
  });

  py::class_<CustomData>(m, "CustomData", py::module_local())
        .def(py::init<const std::string &>())
        .def_readwrite("a", &CustomData::a);
  py::register_local_exception_translator([](std::exception_ptr p) {
    try {
      if (p)
        std::rethrow_exception(p);
    } catch (const CustomError &e) {

      printf("Handle CustomError exception\n");
      auto mod = py::module_::import("exceptions1");
      py::object obj = mod.attr("CustomError");
      printf("Before the exception creation\n");

      // Here we can check that internals.mutex is locked: internals.mutex.mutex._bits == 1
      // obj(e.message) calls `all_type_info_get_cache` which would try to lock again internals

      // // If we unlock internals.mutex then obj(e.message) will work otherwise
      // // execution hangs.
      // auto &internals = py::detail::get_internals();
      // internals.mutex.unlock();

      py::object obj2 = obj(e.message);

      // We should lock again if we unlocked it
      // internals.mutex.lock();

      printf("After the exception creation\n");
      PyErr_SetObject(PyExc_Exception, obj2.ptr());

    }
  });
}

Python code: exceptions1.py

class CustomError(Exception):
    def __init__(self, message):
        self.message = message
        super().__init__(message)

    def __str__(self):
        s = "[python]: " + self.message.a
        return s

Run:

python -c "import test1; test1.check()"

Output:

Handle CustomError exception
Before the exception creation

Is this a regression? Put the last known working version here if it is.

Not a regression

@colesbury
Copy link
Contributor

I think the easiest approach might be to use a different mutex for the exception translators than for the rest of the internals. Other than initialization and finalization, I think the accesses for exception translators and the rest of internals are separate

The alternative is to unlock before calling exception translators, but to do that safely I think you may need to copy the list of translators.

@vfdev-5
Copy link
Contributor Author

vfdev-5 commented Sep 9, 2024

Thanks @colesbury let me check these approaches!

I tried another alternative doing the following, checking whether the mutex is already locked and lock it only if unlocked:

// Wrapper around PyMutex to provide BasicLockable semantics
class pymutex {
    PyMutex mutex;

public:
    pymutex() : mutex({}) {}
    void lock() { PyMutex_Lock(&mutex); }
    void unlock() { PyMutex_Unlock(&mutex); }
    bool has_lock() { return mutex._bits == 1; }
};


#ifdef Py_GIL_DISABLED
#    define PYBIND11_LOCK_INTERNALS(internals) std::unique_lock<pymutex> lock((internals).mutex, std::defer_lock); if ((internals).mutex.has_lock()) { lock.lock(); }
#else
#    define PYBIND11_LOCK_INTERNALS(internals)
#endif

What do you think?

@colesbury
Copy link
Contributor

That looks like it would skip locking the mutex if it's held by another thread, which would not be safe. But also, your sample code only locks the mutex if it's already locked, so I don't think it will ever get locked.

vfdev-5 added a commit to vfdev-5/pybind11 that referenced this issue Sep 9, 2024
vfdev-5 added a commit to vfdev-5/pybind11 that referenced this issue Sep 16, 2024
rwgk pushed a commit that referenced this issue Sep 17, 2024
…eptions (#5362)

* Added exception translator specific mutex used with try_translate_exceptions
Fixes #5346

* - Replaced with_internals_for_exception_translator by with_exception_translators
- Incremented PYBIND11_INTERNALS_VERSION
- Added a test

* style: pre-commit fixes

* Fixed formatting and added explicit to ctors

* Addressed PR review comments

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage New bug, unverified
Projects
None yet
2 participants