Skip to content

GH-62052: add defer_close logic to FileIO #105171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented May 31, 2023

If the FileIO object is wrapped by a write buffer, set the defer_close property to true. That ensures the underlying FileIO object is not closed until the buffer is closed (and can flush the unwritten data). This avoids data loss, as demoed by buffer_not_flushed.py. E.g. if both FileIO and the BufferedWriter are both part of the reference cycle, the finalize triggered by the GC can close the FileIO object before BufferedWriter has a chance to write the data out.

This is a proof-of-concept at this point, not ready to merge.

If the file is wrapped by a write buffer, set defer_close to true.  That
ensures the underlying FileIO object is not closed until the buffer is
closed (and can flush the unwritten data).  This avoids data loss, as
demoed by buffer_not_flushed.py.
@nascheme
Copy link
Member Author

nascheme commented Jun 1, 2023

Some further thoughts about this:

  • defer_close should be a counter, rather than a bool. We would need to keep track of the number of write buffers using the FileIO object. Only close it when the (unflushed?) buffer count reaches zero.
  • checking buffer count in the internal_close function is probably not the right thing. If someone calls FileIO.close() then we could just close and possibly lose data. If count > 0 can be checked on GC finalize, I think that would be sufficient to fix the Built-in module _io can lose data from buffered files in reference cycles #62052 bug.
  • perhaps if count > 0 and close is called, the FileIO should remember that. Then when count reaches zero the file is automatically closed at that time. Essentially, remember that close was called but don't do it until buffers have a chance to flush.
  • the buffer count is a bit like a special kind of reference count, keeping track of # of write buffers using the FileIO object

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants