-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
CI: test for file leakage #30162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: test for file leakage #30162
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find. I think worth adding a whatsnew as well
Sure. Also worth questioning if we want to just bump the dependency. I know 3.0.2 has caused some problems; if we're going to change it, might as well do it in one go. |
Looks like the sqlite failures are in test_memory_map and test_repr_html_ipython_config. the test_memory_map one is fixed by running gc.collect at the end of it, but i dont have a clear fix for the ipython one. Suggestions? |
pandas/io/excel/_base.py
Outdated
from distutils.version import LooseVersion | ||
|
||
if LooseVersion(openpyxl.__version__) <= LooseVersion("2.5.5"): | ||
import gc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really this works? this is very weird to do inside a __del__
.
reading the link I would rather so something like (and only in the openpyxl engine)
if LooseVersion(openpyxl.__version__) <= LooseVersion("2.5.5"):
for s in self.sheets.values:
s.xml_source.close()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe need a try/except around the close
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does work, agreed that calling gc is ugly, will try the xml_source.close thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no dice, no sheets
attribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right i am not sure of the correct attribute here
but shoulda be clear if step into it
just really want to avoid the gc call in del
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd any thoughts on how to make this work without gc call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this cleanup to a test fixture used for the relevant tests? And just ensure that it's called for the relevant openpyxl version before check_for_file_leaks
is run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess as a general comment can we move this to the _openpyxl
instead of checking the engine in the base class? Or does that not work?
I think you can just use the get_sheet_by_index
property to iterate over the sheets and close the related xml_source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm ignore comment before on moving to _openpyxl
- I see this is for the ExcelFile
class and not the Reader
I think you could do something like:
sheet_count = len(self._reader.sheet_names)
for i in range(sheet_count):
sheet = self._reader.get_sheet_by_index(i)
sheet.xml_source.close()
IIUC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sheet.xml_source.close()
still no dice. I'm sure there is some amount of digging around that can effectively re-implement the fix that openpyxl did, but im not wild about digging into openpyxl internals, especially for a known-fixed bug
@TomAugspurger not clear on what you're asking for. Are you looking to execute the fixture for fewer tests (to limit overhead)? or to make the gc call in the relevant tests instead of in the library code? |
That. IIUC, the primary motivator here is tracing down file leaks between tests, rather than patching the behavior of old versions of openpyxl. |
I like this line of reasoning. |
The min version we have listed for openpyxl is 2.4.8, which was released 2017-05-30. The earliest version that fixed this bug was 2.5.6, released 2018-08-30. I think implementing Tom's suggestion and bumping the version (separate PR) would be reasonable here |
huh, putting the gc.collect in the fixture doesn't seem to get the job done. putting it at the end of each test seems hit-and-miss. |
I would be +1 on just bumping and call it a day :-> |
thanks @jbrockmendel btw not sure if mention openpyxl version in install.rst, if so can you update it |
Continue troubleshooting followning #30096.
Locally I'm seeing some failures sqlite files are leaking, but im not seeing the xls files that are causing problems in the CI.