b-pass
Contributor

@b-pass b-pass commented Jul 28, 2025

Description

This code in implicitly_convertible() was using a static bool which is not sub-interpreter safe.

The free-threading code attempted to make this thread-safe by making it thread_local, but thread_local does not work on some older macOS targets, so the correct fix is to use thread-specific storage.
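As a rough illustration of the idea, here is a minimal sketch of a per-thread reentrancy flag built on thread-specific storage. It is not the code in this PR: pybind11 goes through the CPython `PyThread_tss_*` API, while this sketch uses raw POSIX `pthread_key_*` calls so it compiles without Python headers. The names `tss_flag`, `guarded_call`, and `demo_nested_call_rejected` are hypothetical, and the sketch only shows the per-thread aspect, not per-interpreter state.

```cpp
#include <pthread.h>
#include <cassert>
#include <stdexcept>

// Hypothetical per-thread boolean built on a POSIX thread-specific-storage key.
// Each thread sees its own value; a fresh thread starts with the flag cleared.
class tss_flag {
public:
    tss_flag() {
        if (pthread_key_create(&key_, nullptr) != 0) {
            throw std::runtime_error("pthread_key_create failed");
        }
    }
    ~tss_flag() { pthread_key_delete(key_); }
    tss_flag(const tss_flag &) = delete;
    tss_flag &operator=(const tss_flag &) = delete;

    bool get() const { return pthread_getspecific(key_) != nullptr; }
    // Any non-null pointer means "set"; the address of this object is convenient.
    void set(bool value) { pthread_setspecific(key_, value ? this : nullptr); }

private:
    pthread_key_t key_;
};

// Runs `body` unless this thread is already inside a guarded call.
// Returns true if `body` ran and false if the call was rejected as re-entrant.
template <typename Body>
bool guarded_call(tss_flag &busy, Body body) {
    if (busy.get()) {
        return false;
    }
    busy.set(true);
    body(); // exception safety is omitted here; an RAII guard would restore the flag
    busy.set(false);
    return true;
}

// Usage: the nested call is rejected, and the flag is clear again afterwards.
inline bool demo_nested_call_rejected() {
    tss_flag busy;
    bool inner_ran = true;
    bool outer_ran = guarded_call(busy, [&] { inner_ran = guarded_call(busy, [] {}); });
    return outer_ran && !inner_ran && !busy.get();
}
```

Unlike a `static bool`, the value lives in a slot owned by the calling thread, and unlike `thread_local`, it needs no compiler or linker support beyond the platform threading library.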

Suggested changelog entry:

  • Fixed the reentrancy check in implicitly_convertible()

b-pass added 2 commits July 28, 2025 16:43
The previous code had multiple flags (one for every type pair, as this is a template function), which may have posed a problem for some platforms.
@b-pass
Contributor Author

b-pass commented Jul 28, 2025

This change causes GraalPy to fail at shutdown (apparently trying to call a Python function after Python has ended). I don't really have any idea why this change would cause a problem like this. All the tests pass (before the shutdown failure). (Also note that several of the other failures in this past run are from GitHub giving 502 errors, but the GraalPy failures are real.)

java.lang.NullPointerException
	at com.oracle.graal.python.runtime.GilNode$Cached.acquire(GilNode.java:92)
	at com.oracle.graal.python.runtime.GilNode$Cached.acquire(GilNode.java:71)
	at com.oracle.graal.python.runtime.GilNode.acquire(GilNode.java:228)
	at com.oracle.graal.python.builtins.modules.cext.PythonCextBuiltins$CachedExecuteCApiBuiltinNode.execute(PythonCextBuiltins.java:809)
	at com.oracle.graal.python.builtins.modules.cext.PythonCextBuiltins$CApiBuiltinExecutable$Execute.doExecute(PythonCextBuiltins.java:670)
	at com.oracle.graal.python.builtins.modules.cext.CApiBuiltinExecutableGen$InteropLibraryExports$Cached.executeAndSpecialize(CApiBuiltinExecutableGen.java:143)
	at com.oracle.graal.python.builtins.modules.cext.CApiBuiltinExecutableGen$InteropLibraryExports$Cached.execute(CApiBuiltinExecutableGen.java:113)
	at com.oracle.truffle.truffle_nfi/com.oracle.truffle.nfi.CallSignatureNode$OptimizedCallClosureNode.doCall(CallSignatureNode.java:262)
	at com.oracle.truffle.truffle_nfi/com.oracle.truffle.nfi.CallSignatureNodeFactory$OptimizedCallClosureNodeGen.executeAndSpecialize(CallSignatureNodeFactory.java:646)
	at com.oracle.truffle.truffle_nfi/com.oracle.truffle.nfi.CallSignatureNodeFactory$OptimizedCallClosureNodeGen.execute(CallSignatureNodeFactory.java:591)
	at com.oracle.truffle.truffle_nfi/com.oracle.truffle.nfi.NFIClosure$Execute.doOptimizedDirect(NFIClosure.java:95)
	at com.oracle.truffle.truffle_nfi/com.oracle.truffle.nfi.NFIClosureGen$InteropLibraryExports$Cached.executeAndSpecialize(NFIClosureGen.java:231)
	at com.oracle.truffle.truffle_nfi/com.oracle.truffle.nfi.NFIClosureGen$InteropLibraryExports$Cached.execute(NFIClosureGen.java:194)
	at com.oracle.truffle.truffle_nfi_libffi/com.oracle.truffle.nfi.backend.libffi.LibFFIClosure$CallClosureNode.doCall(LibFFIClosure.java:207)
	at com.oracle.truffle.truffle_nfi_libffi/com.oracle.truffle.nfi.backend.libffi.LibFFIClosureFactory$CallClosureNodeGen.executeAndSpecialize(LibFFIClosureFactory.java:133)
	at com.oracle.truffle.truffle_nfi_libffi/com.oracle.truffle.nfi.backend.libffi.LibFFIClosureFactory$CallClosureNodeGen.execute(LibFFIClosureFactory.java:86)
	at com.oracle.truffle.truffle_nfi_libffi/com.oracle.truffle.nfi.backend.libffi.LibFFIClosure$VoidRetClosureRootNode.execute(LibFFIClosure.java:360)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:819)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.profiledPERoot(OptimizedCallTarget.java:743)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:667)
	at org.graalvm.truffle.runtime.svm/com.oracle.svm.truffle.api.SubstrateOptimizedCallTarget.invokeCallBoundary(SubstrateOptimizedCallTarget.java:124)
	at com.oracle.truffle.enterprise.svm/com.oracle.svm.enterprise.truffle.SubstrateEnterpriseOptimizedCallTarget.a(stripped:289)
	at com.oracle.truffle.enterprise.svm/com.oracle.svm.enterprise.truffle.SubstrateEnterpriseOptimizedCallTarget.doInvoke(stripped:255)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callDirect(OptimizedCallTarget.java:599)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.call(OptimizedCallTarget.java:545)
	at org.graalvm.truffle.runtime.svm/com.oracle.svm.truffle.nfi.NativeClosure.call(NativeClosure.java:198)
	at org.graalvm.truffle.runtime.svm/com.oracle.svm.truffle.nfi.NativeClosure.doInvokeClosureVoidRet(NativeClosure.java:357)
	at org.graalvm.truffle.runtime.svm/com.oracle.svm.truffle.nfi.NativeClosure.invokeClosureVoidRet0(NativeClosure.java:349)
	at org.graalvm.truffle.runtime.svm/com.oracle.svm.truffle.nfi.NativeClosure.invokeClosureVoidRet(NativeClosure.java:333)
	at [email protected]/java.lang.Shutdown.halt0(Native Method)
	at [email protected]/java.lang.Shutdown.halt(Shutdown.java:149)
	at [email protected]/java.lang.Shutdown.exit(Shutdown.java:168)
	at [email protected]/java.lang.Runtime.exit(Runtime.java:177)
	at [email protected]/java.lang.System.exit(System.java:1518)
	at com.oracle.graal.python.shell.GraalPythonMain.launch(GraalPythonMain.java:860)
	at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:312)
	at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:126)
	at org.graalvm.launcher.AbstractLanguageLauncher.runLauncher(AbstractLanguageLauncher.java:180)
Unhandled exception: java.lang.NullPointerException: null
    at com.oracle.truffle.polyglot.PolyglotLocals$LocalLocation.readLocal(PolyglotLocals.java:523)
    at com.oracle.truffle.polyglot.PolyglotLocals$LanguageContextThreadLocal.get(PolyglotLocals.java:369)
    at com.oracle.truffle.nfi.backend.libffi.LibFFILanguage.getNFIState(LibFFILanguage.java:117)
    at com.oracle.svm.truffle.nfi.NativeClosure.doInvokeClosureVoidRet(NativeClosure.java:364)
    at com.oracle.svm.truffle.nfi.NativeClosure.invokeClosureVoidRet0(NativeClosure.java:349)
    at com.oracle.svm.truffle.nfi.NativeClosure.invokeClosureVoidRet(NativeClosure.java:333)
    at java.lang.Shutdown.halt0(Shutdown.java:-2)
    at java.lang.Shutdown.halt(Shutdown.java:149)
    at java.lang.Shutdown.exit(Shutdown.java:168)
    at java.lang.Runtime.exit(Runtime.java:177)
    at java.lang.System.exit(System.java:1518)
    at com.oracle.graal.python.shell.GraalPythonMain.launch(GraalPythonMain.java:860)
    at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:312)
    at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:126)
    at org.graalvm.launcher.AbstractLanguageLauncher.runLauncher(AbstractLanguageLauncher.java:180)

Fatal error: Unhandled exception

rwgk added 2 commits July 28, 2025 20:25
set_flag is an RAII guard for a thread-specific reentrancy flag.
Copying or moving it would risk double-resetting or rearming the flag,
breaking the protection. Disable copy/move constructors and assignment
operators to make this explicit.
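A minimal sketch of what such a guard can look like, assuming a plain `bool &` in place of the thread-specific flag so the example compiles on its own. The class name `set_flag_sketch` is hypothetical and is not the `set_flag` from this PR.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical RAII guard: arms the flag on construction, clears it on destruction.
// `flag` stands in for the thread-specific reentrancy flag.
class set_flag_sketch {
public:
    explicit set_flag_sketch(bool &flag) : flag_(flag) { flag_ = true; }
    ~set_flag_sketch() { flag_ = false; }

    // A copy or a moved-from object would run the destructor a second time and
    // clear the flag while the original guard is still supposed to hold it.
    set_flag_sketch(const set_flag_sketch &) = delete;
    set_flag_sketch(set_flag_sketch &&) = delete;
    set_flag_sketch &operator=(const set_flag_sketch &) = delete;
    set_flag_sketch &operator=(set_flag_sketch &&) = delete;

private:
    bool &flag_;
};

// The compiler now rejects accidental copies and moves.
static_assert(!std::is_copy_constructible<set_flag_sketch>::value, "guard must not be copyable");
static_assert(!std::is_move_constructible<set_flag_sketch>::value, "guard must not be movable");
static_assert(!std::is_copy_assignable<set_flag_sketch>::value, "guard must not be copy-assignable");
static_assert(!std::is_move_assignable<set_flag_sketch>::value, "guard must not be move-assignable");
```

With the operations deleted, misuse such as returning the guard by value from a helper fails at compile time instead of silently clearing the flag early (before C++17; guaranteed copy elision changes what counts as a copy).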
@rwgk
Collaborator

rwgk commented Jul 29, 2025

Hi @b-pass, I added two minor commits, by-products of me trying to understand this change. Please feel free to undo what doesn't make sense to you.

Regarding the GraalPy failures, could you please look here?

https://chatgpt.com/share/688854b1-8b00-8008-8d34-9d3f0d77926a

Warning: There is quite a bit of weirdness in that conversation. You might want to ignore most of it.

However, could you please look for

Option A: Skip TLS destruction if Python is finalized

There are also Option B and Option C. Do any of those have merit?

Another idea, for poking around until we hopefully hit on something:

--- a/include/pybind11/detail/internals.h
+++ b/include/pybind11/detail/internals.h
@@ -98,7 +98,9 @@ public:
         // Neither of those have anything to do with CPython internals. PyMem_RawFree *requires*
         // that the `key` be allocated with the CPython allocator (as it is by
         // PyThread_tss_create).
+#if !defined(GRAALVM_PYTHON)
         PYBIND11_TLS_FREE(key_);
+#endif
     }

     thread_specific_storage(thread_specific_storage const &) = delete;

Oh, the CI finished just now, with 3 graalpy failures, and one pesky unrelated failure that we can ignore (🐍 (macos-13, 3.13t, -DCMAKE_CXX_STANDARD=11) / 🧪).

I'll try that diff now to see what happens.

@rwgk
Collaborator

rwgk commented Jul 29, 2025

Wow: Everything passes after adding 443a2e5.

@b-pass @msimacek Do you have ideas how to handle this properly?

@msimacek
Contributor

In GraalPy, the TSS is implemented in Java, so it needs the JVM to be somewhat alive to do these calls. I think the proper way would be to check Py_IsInitialized() before calling PYBIND11_TLS_FREE in the destructor.

// However, in GraalPy (as of v24.2 or older), TSS is implemented by Java and this call
// requires a living Python interpreter.
#ifdef GRAALVM_PYTHON
if (!Py_IsInitialized()) {
Collaborator


From old battles like here I remembered:

if (!Py_IsInitialized() || Py_Finalizing()) {

When asked about this, ChatGPT suggested:

Why Py_IsInitialized() may not be sufficient

Py_IsInitialized() only checks whether the global interpreter state pointer is non-null.

During finalization, Py_IsInitialized() will remain true until very late, even after large parts of the runtime are already torn down.

Accessing some Python APIs during this late stage (especially in alternative runtimes like GraalPy) can trigger undefined behavior or crashes.

WDYT?
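To make the two variants concrete, here is a sketch of a destructor that skips releasing its key when the runtime is gone or going away. It deliberately avoids Python headers: `fake_runtime` and its two fields are hypothetical stand-ins for `Py_IsInitialized()` and a finalizing check, and incrementing `free_calls` stands in for `PYBIND11_TLS_FREE(key_)`. It also ignores the `GRAALVM_PYTHON` guard in the snippet under review.

```cpp
#include <cassert>

// Hypothetical stand-in for interpreter state; a real build would query the
// CPython API instead of reading these fields.
struct fake_runtime {
    bool initialized = true;
    bool finalizing = false;
};

// Hypothetical owner of a TSS key. The destructor only releases the key while
// the runtime can still service the call; otherwise the key is leaked on purpose.
class tss_key_sketch {
public:
    tss_key_sketch(const fake_runtime &rt, int &free_calls) : rt_(rt), free_calls_(free_calls) {}
    tss_key_sketch(const tss_key_sketch &) = delete;
    tss_key_sketch &operator=(const tss_key_sketch &) = delete;

    ~tss_key_sketch() {
        if (!rt_.initialized || rt_.finalizing) {
            return; // runtime is gone or shutting down: do not call into it
        }
        ++free_calls_; // stands in for PYBIND11_TLS_FREE(key_)
    }

private:
    const fake_runtime &rt_;
    int &free_calls_;
};

// Helper for experimentation: destroys one key under the given runtime state
// and reports how many times the release path ran.
inline int frees_after_destruction(bool initialized, bool finalizing) {
    fake_runtime rt;
    rt.initialized = initialized;
    rt.finalizing = finalizing;
    int free_calls = 0;
    { tss_key_sketch key(rt, free_calls); }
    return free_calls;
}
```

The practical difference between the variants is the middle state: with only the initialized check, a runtime that is still marked initialized but already finalizing would take the release path.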

@rwgk
Copy link
Collaborator

rwgk commented Jul 31, 2025

@colesbury Could you please help with a review of this (small) PR? It touches code you added with PR #5148 (now here).

@rwgk rwgk merged commit 780ec11 into pybind:master Aug 2, 2025
82 checks passed
@github-actions github-actions bot added the needs changelog label Aug 2, 2025
@henryiii henryiii changed the title Make implicitly_convertable sub-interpreter and free-threading safe fix: make implicitly_convertable sub-interpreter and free-threading safe Aug 21, 2025
@henryiii henryiii removed the needs changelog label Aug 22, 2025