Skip to content

C API: Design an API to disallow creating more than one extension instance per Python process #111088

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vstinner opened this issue Oct 19, 2023 · 3 comments

Comments

@vstinner
Copy link
Member

Somewhere in PyModuleDef, we should add a way to disallow globally the creation of more than one extension instance.

Or maybe the restriction should be that an extension cannot be loaded in two different interpreters. Honestly, I'm not sure.

@ericsnowcurrently no longer wants to treat the main interpreter differently: see issue gh-109857.

Previously, I proposed to directly expose PyInterpreterState_IsMain() to the limited C API, but the idea got rejected.


The syslog module behaves differently if it's used in the main interpreter:

static inline int
is_main_interpreter(void)
{
    return (PyInterpreterState_Get() == PyInterpreterState_Main());
}

...
    if (!is_main_interpreter()) {
        PyErr_SetString(PyExc_RuntimeError, "subinterpreter can't use syslog.openlog()");
        return NULL;
    }

The syslog module stores a Python object in static PyObject *S_ident_o = NULL;. It's a way to keep a string passed to C openlog() function alive until closelog() is called. The problem is to decide which extension instance and/or which Python interpreter is responsible for that memory: the memory must be kept alive until closelog() is called. But what if each Python interpreter call openlog()? Calling openlog() more than once overrides the previous call. But if closelog() is called, if multiple interpreters called openlog(), you might think that syslog is still usable, whereas closelog() has been called.

See issue gh-99127 and PR gh-99128 which added is_main_interpreter() function to syslog.


Cython already implements a similar, but different check: it raises "this module can only be loaded into one interpreter per process" error message.

Cython uses a different logic to handle PEP 489 "Multi-phase extension module initialization". It uses a static PY_INT64_T main_interpreter_id = -1; variable which is initialized at the first call of __Pyx_check_single_interpreter().

//#if CYTHON_PEP489_MULTI_PHASE_INIT
static CYTHON_SMALL_CODE int __Pyx_check_single_interpreter(void) {
    #if PY_VERSION_HEX >= 0x030700A1
    static PY_INT64_T main_interpreter_id = -1;
    PY_INT64_T current_id = PyInterpreterState_GetID(PyThreadState_Get()->interp);
    if (main_interpreter_id == -1) {
        main_interpreter_id = current_id;
        return (unlikely(current_id == -1)) ? -1 : 0;
    } else if (unlikely(main_interpreter_id != current_id))

    #else
    static PyInterpreterState *main_interpreter = NULL;
    PyInterpreterState *current_interpreter = PyThreadState_Get()->interp;
    if (!main_interpreter) {
        main_interpreter = current_interpreter;
    } else if (unlikely(main_interpreter != current_interpreter))
    #endif

    {
        PyErr_SetString(
            PyExc_ImportError,
            "Interpreter change detected - this module can only be loaded into one interpreter per process.");
        return -1;
    }
    return 0;
}
@vstinner
Copy link
Member Author

Calling some C functions have a process-wide effect (ex: not limited to the current thread), and some also expect that the caller keeps the memory alive.

In the past, os.putenv() was implemented with C putenv() which expects the caller to keep the memory alive. Python had to store the string in an internal list (posix_putenv_garbage) until Python exit. Python started to crash when the os module was converted to multi-phase init API (PEP 489).

See:

@encukou
Copy link
Member

encukou commented Oct 23, 2023

The mechanism is currently documented as a recipe in the howto: https://docs.python.org/3/howto/isolating-extensions.html#opt-out-limiting-to-one-module-object-per-process

Or maybe the restriction should be that an extension cannot be loaded in two different interpreters. Honestly, I'm not sure.

IMO, this depends on the extension.
The recipe is relatively easy to extend in this way -- reset the flag when freeing the module. (That's the power of recipes -- they spell out what's happening, and you can adjust them.)

The downside of this recipe is that it will need locking under nogil. (But it seems to me that it's not the only thing to need that, and so nogil should add a general mechanism for process-global locking.)

@ericsnowcurrently
Copy link
Member

nogil should add a general mechanism for process-global locking

FWIW, this could be helpful for some extensions even before nogil. See the previous DPO discussion (and another related DPO discussion).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants