gh-75459: Doc: C API: Improve object life cycle documentation #125962

Open: wants to merge 29 commits into base: main

@rhansen (Contributor) commented Oct 25, 2024

  • Add "cyclic isolate" to the glossary.
  • Add a new "Object Life Cycle" page.
    • Illustrate the order of life cycle functions.
    • Document PyObject_CallFinalizer and PyObject_CallFinalizerFromDealloc.
  • PyObject_Init does not call tp_init.
  • PyObject_New:
    • also initializes the memory
    • does not call tp_alloc, tp_new, or tp_init
    • should not be used for GC-enabled objects
    • memory must be freed by PyObject_Free
  • PyObject_GC_New memory must be freed by PyObject_GC_Del.
  • Warn that garbage collector functions can be called from any thread.
  • tp_finalize and tp_clear:
    • Only called when there's a cyclic isolate.
    • Only one object in the cyclic isolate is finalized/cleared at a time.
    • Clearly warn that they might not be called.
    • They can optionally be manually called from tp_dealloc (via PyObject_CallFinalizerFromDealloc in the case of tp_finalize).
  • tp_finalize:
    • Reference object.__del__.
    • The finalizer can resurrect the object.
    • Suggest PyErr_GetRaisedException and PyErr_SetRaisedException instead of the deprecated PyErr_Fetch and PyErr_Restore functions.
    • Add links to PyErr_GetRaisedException and PyErr_SetRaisedException.
    • Suggest using PyErr_WriteUnraisable if an exception is raised during finalization.
    • Rename the example function from local_finalize to foo_finalize for consistency with the tp_dealloc documentation and as a hint that the name isn't special.
    • Minor wording and stylistic tweaks.
    • Warn that tp_finalize can be called during shutdown.
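
The "cyclic isolate" behaviour listed above can be observed from Python, where `object.__del__` is the Python-level counterpart of `tp_finalize`. The following is a minimal sketch, assuming CPython; the `Node` class and `make_cyclic_isolate` helper are hypothetical names used only for illustration, not part of this PR.

```python
import gc

finalized = []  # names of objects whose finalizer has run


class Node:
    """Hypothetical GC-enabled object; __del__ stands in for tp_finalize."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)


def make_cyclic_isolate():
    a, b = Node("a"), Node("b")
    a.partner, b.partner = b, a  # reference cycle: a <-> b
    # Returning drops the only external references, so the pair
    # becomes a cyclic isolate that reference counting cannot free.


gc.disable()                # keep automatic collections out of the picture
make_cyclic_isolate()
before = list(finalized)    # nothing finalized yet: refcounts never hit zero
gc.collect()                # the collector finds the isolate and finalizes it
after = sorted(finalized)
gc.enable()
```

In this sketch the finalizers only run once the collector detects the isolate, which is why `before` is empty and `after` holds both names.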

📚 Documentation preview 📚: https://cpython-previews--125962.org.readthedocs.build/

@rhansen (Contributor Author) commented Oct 25, 2024

I'm not familiar enough with CPython's internals to be super confident about these changes. I would appreciate it if a GC expert would carefully review this.

Thanks!

@hugovk (Member) commented Oct 25, 2024

Hmm, I'm not sure if we can require graphviz for the docs.

We'd have to consider installing it on the main docs server in addition to Read the Docs, and also make sure the docs can still build without it, for downstream redistributors who might only want to build with "vanilla" Sphinx and no extra extensions. Plus other developers would need an easy way to build the docs on their machines.

cc @AA-Turner

@rhansen rhansen force-pushed the docs branch 4 times, most recently from be22691 to d804c6c Compare October 25, 2024 09:10
@rhansen (Contributor Author) commented Oct 25, 2024

> Hmm, I'm not sure if we can require graphviz for the docs.

Maybe I should just commit the generated .svg (and the input dot file so it can be revised easily). Would that be acceptable?

@rhansen (Contributor Author) commented Oct 25, 2024

I committed the generated .svg so that the substance of this PR can be reviewed while we figure out if it is acceptable to add graphviz as a dependency. (Note that sphinx.ext.graphviz is a built-in extension, so enabling it doesn't add any new sphinx dependencies.)

@AA-Turner (Member)

I think that requiring graphviz should be fine -- Debian, Fedora, Gentoo, and OpenSUSE all package it. As Richard notes, it's a built-in extension, so should be fine from the "Vanilla" perspective.

I would want to include a NEWS entry to say that graphviz is now required to build the docs, though.

A

@AA-Turner AA-Turner added the docs Documentation in the Doc dir label Oct 25, 2024
@ZeroIntensity (Member)

My main concern with documenting nitty-gritty details of the lifecycle is that we're technically documenting implementation details, which are subject to change (and we've been bad at updating these kinds of things from version to version in the past). I suggest the SVG go into the InternalDocs folder instead.

It's also worth noting here that tp_finalize isn't 100% related to garbage collection, it's supposed to be used over tp_dealloc if complicated things are being done upon finalization, even for non-GC types. And while we're here, I think it would be a good idea to document the cases that tp_clear should exist for a tracked type.

@rhansen (Contributor Author) commented Oct 26, 2024

> My main concern with documenting nitty-gritty details of the lifecycle is that we're technically documenting implementation details, which are subject to change (and we've been bad at updating these kinds of things from version to version in the past). I suggest the SVG go into the InternalDocs folder instead.

I don't want to document any implementation details here, so I'm happy to remove what isn't necessary. It's hard to tell what is and isn't necessary because the end of an object's life is especially fraught with peril. I think that it is better to err on the side of over-documenting this topic than under-documenting.

I wrote this PR because there were several things that I needed to know that the existing documentation didn't make clear:

  • The order the tp_* functions might be called (to know what invariants are possible).
  • Which threads might execute the functions (for locking correctness).
  • Details about when a function is called (or not) that are necessary for locking correctness. (e.g., if multiple objects in the same cyclic isolate are never finalized concurrently then a lock-free design might be possible)
  • Approximately how often a tp_* function might be called: maybe never, exactly once, at most once, at least once, very frequently, etc.
  • Which other objects might be in an inconsistent state.
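
The "how often" question can be probed at the Python level. The sketch below assumes CPython 3.4 or later, where a finalizer runs at most once per object even if it resurrects the object; the `Phoenix` class and the `survivors` list are hypothetical names chosen for illustration.

```python
calls = 0        # how many times the finalizer has run
survivors = []   # holds resurrected objects


class Phoenix:
    """Hypothetical object whose finalizer resurrects it."""

    def __del__(self):
        global calls
        calls += 1
        survivors.append(self)  # new reference: the object is resurrected


obj = Phoenix()
del obj                       # refcount hits zero, __del__ runs and resurrects
calls_after_first_drop = calls

survivors.clear()             # last reference dropped a second time
calls_after_second_drop = calls  # the finalizer is not invoked again
```

Here the finalizer falls in the "at most once" category: the second drop destroys the object without calling `__del__` again.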

> It's also worth noting here that tp_finalize isn't 100% related to garbage collection, it's supposed to be used over tp_dealloc if complicated things are being done upon finalization, even for non-GC types.

If I understand correctly, tp_finalize is never called for non-GC types unless the class designer calls it from tp_dealloc. In that case tp_finalize is just like any other helper function that might be called from tp_dealloc. (Maybe this is only true for static types and not heap types? I don't fully understand the difference.)
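
As a rough illustration of the deallocation path for heap types: in CPython, classes created with a `class` statement get a deallocator that invokes the finalizer itself, so `__del__` runs when the reference count reaches zero even with the cyclic collector disabled. This is a sketch under that assumption; `Resource` is a hypothetical class, and its instances are still GC-enabled, so it does not demonstrate a non-GC static type.

```python
import gc

log = []


class Resource:
    """Hypothetical heap type; __del__ fills the tp_finalize slot."""

    def __del__(self):
        log.append("finalized")


gc.disable()   # rule out the cyclic garbage collector entirely
r = Resource()
del r          # refcount reaches zero, so the deallocator calls the finalizer
ran_without_collector = (log == ["finalized"])
gc.enable()
```

A static type written in C gets no such automatic call; its `tp_dealloc` would have to call `PyObject_CallFinalizerFromDealloc` explicitly, as discussed above.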

> And while we're here, I think it would be a good idea to document the cases that tp_clear should exist for a tracked type.

I thought that was already sufficiently explained, even before this PR. Can you explain what you think is lacking?

@ZeroIntensity (Member)

First, thanks for doing this!

> I think that it is better to err on the side of over-documenting this topic than under-documenting.

I don't, that limits our ability to modify the lifecycle in the future (especially because there's no good way to deprecate things here). I'll point this out when doing a more in-depth review though, I don't see anything particularly bad right now.

> If I understand correctly, tp_finalize is never called for non-GC types unless the class designer calls it from tp_dealloc.

You're right, it's not, but I don't think we should limit ourselves to that in the future. It might be possible someday to automatically do this for untracked types as well. We should just document that all types, even GC, require PyObject_CallFinalizerFromDealloc in the destructor if they want tp_finalize to get eventually called--we can note that it could happen automatically, though.

> I thought that was already sufficiently explained, even before this PR. Can you explain what you think is lacking?

Basically, it's not documented which types need to have a tp_clear, because not all GC types have it. I'm not even sure which cases require it. I think it's only needed if the type can have a direct reference cycle to itself? (As in, running its finalizer will try to Py_DECREF itself.)

Also, I don't think it should be documented that tp_clear is related to tp_dealloc by making an "optional call", that's sort of incidental. They tend to do the same thing, and the destructor can utilize the clear function for convenience, but they're for different purposes.


A few other notes:

  • It's fine to document what the specific allocators (e.g. PyObject_GC_New and PyObject_GC_Del) do, but we should point users to using tp_alloc and tp_free instead.
  • That said, I don't see the need to mention that functions like PyObject_New don't call tp_init. Those are strictly allocators, and currently documented as such.
  • This isn't exhaustive--static objects and some immortal objects don't follow this lifecycle (single-phase modules don't either, I think). Other objects might not follow this in the future either.
  • If I read "called from any thread" in the docs, I would be worried about holding the GIL. Maybe mention that while it can be called from any thread, it will still hold the GIL.
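
The "any thread" point in the last note can be demonstrated from Python: a finalizer runs in whichever thread triggers the collection, not necessarily the thread that created the object. A minimal sketch, assuming CPython; `Node` and `collect_in_worker` are hypothetical names.

```python
import gc
import threading

finalizer_threads = []   # thread ids in which a finalizer ran
collector_threads = []   # thread id that triggered the collection


class Node:
    """Hypothetical object that forms a cyclic isolate by itself."""

    def __init__(self):
        self.me = self   # self-reference keeps the refcount above zero

    def __del__(self):
        finalizer_threads.append(threading.get_ident())


def collect_in_worker():
    collector_threads.append(threading.get_ident())
    gc.collect()


gc.disable()             # make sure only the worker triggers a collection
node = Node()
del node                 # unreachable now, but kept alive by its own cycle

worker = threading.Thread(target=collect_in_worker)
worker.start()
worker.join()
gc.enable()

main_thread = threading.get_ident()
```

The object was created in the main thread, yet its finalizer ran in the worker, so a finalizer should not assume it runs in the thread that created the object.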

@rhansen (Contributor Author) commented Nov 7, 2024

I did a major rewrite to hopefully address all of the feedback (thanks for reviewing!). Please take another look.

@rhansen rhansen marked this pull request as ready for review December 1, 2024 10:20
@ZeroIntensity (Member)

I'll start the second pass today, but it will definitely take a few days for me to finish the whole review.

@encukou (Member) left a comment

Sorry for the delay.

I've not been able to go through the entire PR, but I'm sending the comments I already wrote. I hope they make sense.

I'll make another pass later.

Comment on lines 105 to 108
*resurrected*, preventing its pending destruction. (Only
:c:member:`!tp_finalize` is allowed to resurrect an object;
:c:member:`~PyTypeObject.tp_clear` and
:c:member:`~PyTypeObject.tp_dealloc` cannot.) Resurrecting an object may
Member

This would make the text contradictory: if tp_finalize may resurrect an object but tp_dealloc may not, then tp_dealloc may call PyObject_CallFinalizerFromDealloc (which calls tp_finalize).

AFAIK, any of these may resurrect the object.

Member

I don't think tp_clear can resurrect, can it? tp_clear and tp_dealloc should be resurrecting via the finalizer, but if tp_clear is called, then tp_finalize should have been called by the GC anyway.
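
For reference, resurrection through the finalizer during a collection can be reproduced at the Python level. The sketch below assumes CPython 3.4 or later; `Node` and `kept` are hypothetical names. One member of a cyclic isolate resurrects itself in `__del__`, and because its partner is reachable from it, neither object is cleared or destroyed.

```python
import gc

kept = []  # external container used by the finalizer to resurrect


class Node:
    """Hypothetical member of a two-object reference cycle."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        if self.name == "a":
            kept.append(self)  # resurrect: the object is reachable again


gc.disable()
a, b = Node("a"), Node("b")
a.partner, b.partner = b, a
del a, b          # the pair is now a cyclic isolate
gc.collect()      # finalizers run; "a" resurrects itself
gc.enable()

survivor = kept[0]
# The cycle is intact, which implies the references were not cleared.
cycle_intact = survivor.partner.partner is survivor
```

This only exercises the finalizer path; it says nothing about whether `tp_clear` or `tp_dealloc` could resurrect an object on their own.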

@ZeroIntensity (Member) left a comment

Sorry for dropping the ball on my part--we're getting to a point where I'm willing to put the first checkmark on it!

I've noticed that a lot of information is repeated. It's good to be clear, but redundancy can be a bit frustrating on our end, because we need to update more places if we want to change something.

@ZeroIntensity (Member)

@rhansen Are you still planning on working on this? If not, I'm happy to take over.

@encukou (Member) commented Mar 11, 2025

Updating to resolve a conflict with GH-129850.

IMO, the warnings about PyObject_New[Var] not being usable with Py_TPFLAGS_HAVE_GC should be at least as strong as the ones about not initializing memory, but I pointed that out before. I didn't want to start reorganizing the PR.

@rhansen, do you have any work in progress on this?
This is stuff that can burn a person out; don't feel bad if you want to delegate it.

@ZeroIntensity (Member)

I'm going to be taking this over. I've addressed my own comments and most of Petr's. Other reviews would be appreciated!

@encukou (Member) commented May 5, 2025

Thank you! I definitely want to take another look -- after Beta 1, next week or so.

@encukou (Member) left a comment

Overall, this is an improvement. Thank you for the work!
Let's do one more pass, merge, and leave the rest for a future PR?

@ZeroIntensity (Member)

> Let's do one more pass, merge, and leave the rest for a future PR?

Sounds good to me.

Labels
awaiting review docs Documentation in the Doc dir skip news
Projects
Status: Todo
6 participants