Skip to content

Adding an unstable C API for unique references #133140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ZeroIntensity opened this issue Apr 29, 2025 · 15 comments
Closed

Adding an unstable C API for unique references #133140

ZeroIntensity opened this issue Apr 29, 2025 · 15 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API topic-free-threading type-feature A feature request or enhancement

Comments

@ZeroIntensity
Copy link
Member

ZeroIntensity commented Apr 29, 2025

Feature or enhancement

Proposal:

In #132070, there was a race brought up in free-threading through use of Py_REFCNT(op) == 1. The fix is to use _PyObject_IsUniquelyReferenced.

There were a couple additional comments about how we should handle this for the public API (for example, @godlygeek suggested implementing Py_REFCNT as _PyObject_IsUniquelyReferenced(op) ? 1 : INT_MAX). I think the best approach is to just expose an unstable API for this and point to it in Py_REFCNT's documentation.

I'm not imagining anything complex:

int
PyUnstable_Object_IsUniquelyReferenced(PyObject *op)
{
    return _PyObject_IsUniquelyReferenced(op);
}

@encukou, do you mind if this skips the C API WG? I'm not sure there's much to discuss about the API, other than some light bikeshedding.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

@ngoldbaum
Copy link
Contributor

I'm not sure if it's tracked elsewhere, but I'd also like to see an API function so NumPy can replace this code:

https://github.com/numpy/numpy/blob/d692fbccd98cb880812b32936e5f94fcfe55053f/numpy/_core/src/multiarray/temp_elide.c#L119-L152

We had to add it to fix NumPy on 3.14. See numpy/numpy#28681.

@Fidget-Spinner
Copy link
Member

@ngoldbaum can you please open an issue for that? I'll mark it as a deferred blocker so it lands before 3.14.

Sam told me it's temporary code so it's fine for now. Just pointing out that the _PyStackRef API is really not something that can be relied on. It will very likely break often till we settle on something that is nice internally for us.

@colesbury
Copy link
Contributor

Regarding the NumPy code, it can't use _PyObject_IsUniquelyReferenced (that won't fix the underlying problem). I agree with @Fidget-Spinner, we should open a new issue for it.

@ngoldbaum
Copy link
Contributor

see #133164

@ZeroIntensity
Copy link
Member Author

Actually, is there any use case for exposing _PyObject_IsUniquelyReferenced on its own if we're going to expose PyUnstable_Object_IsUniqueTemporary anyway? I'm not totally sure we need this one.

@colesbury
Copy link
Contributor

Yeah, cases like enum_next_long where you want to re-use (or modify) an object that you have a unique reference to. PyUnstable_Object_IsUniqueTemporary isn't useful here because the object isn't a temporary on the interpreter stack. _PyObject_IsUniquelyReferenced is appropriate because we already know that en->en_result holds a strong (heap) reference to the relevant object:

static PyObject *
enum_next_long(enumobject *en, PyObject* next_item)
{
PyObject *result = en->en_result;
PyObject *next_index;
PyObject *old_index;
PyObject *old_item;
Py_BEGIN_CRITICAL_SECTION(en);
next_index = increment_longindex_lock_held(en);
Py_END_CRITICAL_SECTION();
if (next_index == NULL) {
Py_DECREF(next_item);
return NULL;
}
if (_PyObject_IsUniquelyReferenced(result)) {
Py_INCREF(result);
old_index = PyTuple_GET_ITEM(result, 0);
old_item = PyTuple_GET_ITEM(result, 1);
PyTuple_SET_ITEM(result, 0, next_index);
PyTuple_SET_ITEM(result, 1, next_item);
Py_DECREF(old_index);
Py_DECREF(old_item);
// bpo-42536: The GC may have untracked this result tuple. Since we're
// recycling it, make sure it's tracked again:
_PyTuple_Recycle(result);
return result;
}
result = PyTuple_New(2);
if (result == NULL) {
Py_DECREF(next_index);
Py_DECREF(next_item);
return NULL;
}
PyTuple_SET_ITEM(result, 0, next_index);
PyTuple_SET_ITEM(result, 1, next_item);
return result;
}

@encukou
Copy link
Member

encukou commented Apr 30, 2025

Does this mean that the Py_REFCNT docs are again wrong in a backwards-incompatible way?

do not rely on the returned value to be accurate, other than a value of 0 or 1.


@ZeroIntensity

do you mind if this skips the C API WG?

IMO, that's generally OK for unstable API. Don't forget docs & tests though.

But if this was sent to the WG I'd ask to avoid documenting it in terms of reference counting, which is an implementation detail you shouldn't rely on (as recent developments like this and immortality show).
Would it be correct to say that if this returns true, “immutable” objects can be safely mutated -- that is, to "other" code it will appear as if the object was destroyed and a new one with the same id was recreated? (And if so, what's "other" code and how long does this apply?)

@ZeroIntensity
Copy link
Member Author

Does this mean that the Py_REFCNT docs are again wrong in a backwards-incompatible way?

Yes, but unless we want to revert PEP 703, I don't think there's any good way around that.

Would it be correct to say that if this returns true, “immutable” objects can be safely mutated -- that is, to "other" code it will appear as if the object was destroyed and a new one with the same id was recreated?

I think in-place updates would generally be checked by PyUnstable_Object_IsUniqueTemporary (gh-133170), not PyUnstable_Object_IsUniquelyReferenced. If I understand it correctly, the cases are:

  • If you keep the strong reference in C code (e.g., owning some sub-data structure in your object, such as a dict to dict keys), then you want to check IsUniquelyReferenced. You just want to make sure there's no other threads messing with your object at the time. (Interpreter borrows don't matter here, because if the interpreter has a reference, then the refcnt will above > 1 anyway.)
  • If you give the strong reference away to the interpreter (typically by returning it), then you need IsUniqueTemporary to check if the only reference is on the interpreter's stack in addition to being the only reference.

(@colesbury Fact check, please!)

But anyways, documenting this one in terms of reference counting would just make the API unclear. It acts as solely a replacement for Py_REFCNT(op) == 1, and since Py_REFCNT is already about reference counting, I think it's fine to document this one as such. The more important thing to me seems like documenting when to use both APIs.

@colesbury
Copy link
Contributor

If you keep the strong reference in C code ... then you want to check IsUniquelyReferenced

Yeah, that's right. I think a enum_next_long above is a good example. The iterator holds a strong reference via en->en_result, so it's sufficient to call PyUnstable_Object_IsUniquelyReferenced. From context, you know that nobody else can be holding a borrowed reference to that object. In psuedo code:

static PyObject *
iter_next(PyObject *self)
{
   MyIterObject *iter = (MyIterObject *)self;
   if (PyUnstable_Object_IsUniquelyReferenced(iter->it_result) {
     // optimization: reuse iter->it_result
     // update iter->it_result
     ...
     return Py_NewRef(iter->it_result);
   }
   else {
     // create a new result value
     return ...
   }
}

You need PyUnstable_Object_IsUniqueReferencedTemporary (recently renamed from PyUnstable_Object_IsUniqueTemporary) if you don't hold the strong reference. This is a stricter check than PyUnstable_Object_IsUniquelyReferenced. The typical use cases involve arguments to C functions, because the callee doesn't own the arguments. In pseudo-code:

static PyObject *
my_nb_add(PyObject *a, PyObject *b)
{
   if (PyUnstable_Object_IsUniqueReferencedTemporary(a)) {
      // optimization: modify a in-place and return it
   }
   else {
     // add `a` and `b` and return the result as a new object
   }
}

PyUnstable_Object_IsUniqueReferencedTemporary(op) implies PyUnstable_Object_IsUniquelyReferenced(op), but the reverse is not true.

@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 1, 2025
@ncoghlan
Copy link
Contributor

ncoghlan commented May 2, 2025

Bikeshedding, but could the names be tightened up by making them about the reference rather than about the object being referenced? Something like:

  • PyUnstable_Ref_IsUnique: this reference is the sole reference to the object. It's up to the calling code to ensure the reference can't end up borrowed on the interpreter stack (that is, the ref count must be incremented when making the reference available to the interpreter stack).
  • PyUnstable_Ref_IsUniqueTemporary: this reference is the sole reference to the object and it's solely stored on the interpreter stack (and not borrowed). Always false for references that aren't owned by the interpreter stack (even if they're unique).

@encukou
Copy link
Member

encukou commented May 2, 2025

That would be introducing a PyRef group prefix (with Unstable_ spliced in, in this case).
That's some prime real estate, and one I'd like to use for a possible PEP-sized feature.
IMO, Object_ is fine.

@ronaldoussoren
Copy link
Contributor

Will this be in 3.14?

And is there a way to get the same behaviour in 3.13 where _PyObject_IsUniquelyReferenced is not exposed in headers?

@ZeroIntensity
Copy link
Member Author

Ideally, this one will make it into 3.14 too. I don't know if it's right to mark this as a blocker. cc @hugovk as the RM--do you want this in 3.14?

@hugovk
Copy link
Member

hugovk commented May 5, 2025

I don't think this new feature needs to go in 3.14. Having said that, I'd encourage the C API experts above to review the PR and we can include it if approved before tomorrow.

@vstinner
Copy link
Member

vstinner commented May 5, 2025

I merged the PR to land the function in Python 3.14. It's an unstable API, it can still change until Python 3.14 final, but I don't expect changes in its API, more on its documentation. I close the issue since the function is added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API topic-free-threading type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

10 participants