Skip to content

CI: test for file leakage #30162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 11, 2019
Merged

CI: test for file leakage #30162

merged 11 commits into from
Dec 11, 2019

Conversation

jbrockmendel
Copy link
Member

Continue troubleshooting followning #30096.

Locally I'm seeing some failures sqlite files are leaking, but im not seeing the xls files that are causing problems in the CI.

@jbrockmendel
Copy link
Member Author

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find. I think worth adding a whatsnew as well

@jbrockmendel
Copy link
Member Author

I think worth adding a whatsnew as well

Sure. Also worth questioning if we want to just bump the dependency. I know 3.0.2 has caused some problems; if we're going to change it, might as well do it in one go.

@jbrockmendel
Copy link
Member Author

Looks like the sqlite failures are in test_memory_map and test_repr_html_ipython_config. the test_memory_map one is fixed by running gc.collect at the end of it, but i dont have a clear fix for the ipython one. Suggestions?

@jreback jreback added the CI Continuous Integration label Dec 10, 2019
@jreback jreback added this to the 1.0 milestone Dec 10, 2019
from distutils.version import LooseVersion

if LooseVersion(openpyxl.__version__) <= LooseVersion("2.5.5"):
import gc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really this works? this is very weird to do inside a __del__.

reading the link I would rather so something like (and only in the openpyxl engine)

if LooseVersion(openpyxl.__version__) <= LooseVersion("2.5.5"):
     for s in self.sheets.values:
            s.xml_source.close()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe need a try/except around the close

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does work, agreed that calling gc is ugly, will try the xml_source.close thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no dice, no sheets attribute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right i am not sure of the correct attribute here
but shoulda be clear if step into it

just really want to avoid the gc call in del

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd any thoughts on how to make this work without gc call?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this cleanup to a test fixture used for the relevant tests? And just ensure that it's called for the relevant openpyxl version before check_for_file_leaks is run?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess as a general comment can we move this to the _openpyxl instead of checking the engine in the base class? Or does that not work?

I think you can just use the get_sheet_by_index property to iterate over the sheets and close the related xml_source

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ignore comment before on moving to _openpyxl - I see this is for the ExcelFile class and not the Reader

I think you could do something like:

sheet_count = len(self._reader.sheet_names)
for i in range(sheet_count):
    sheet = self._reader.get_sheet_by_index(i)
    sheet.xml_source.close()

IIUC

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sheet.xml_source.close()

still no dice. I'm sure there is some amount of digging around that can effectively re-implement the fix that openpyxl did, but im not wild about digging into openpyxl internals, especially for a known-fixed bug

@jbrockmendel
Copy link
Member Author

Can we move this cleanup to a test fixture used for the relevant tests? And just ensure that it's called for the relevant openpyxl version before check_for_file_leaks is run?

@TomAugspurger not clear on what you're asking for. Are you looking to execute the fixture for fewer tests (to limit overhead)? or to make the gc call in the relevant tests instead of in the library code?

@TomAugspurger
Copy link
Contributor

or to make the gc call in the relevant tests instead of in the library code?

That. IIUC, the primary motivator here is tracing down file leaks between tests, rather than patching the behavior of old versions of openpyxl.

@jbrockmendel
Copy link
Member Author

the primary motivator here is tracing down file leaks between tests, rather than patching the behavior of old versions of openpyxl.

I like this line of reasoning.

@jbrockmendel
Copy link
Member Author

The min version we have listed for openpyxl is 2.4.8, which was released 2017-05-30. The earliest version that fixed this bug was 2.5.6, released 2018-08-30. I think implementing Tom's suggestion and bumping the version (separate PR) would be reasonable here

@jbrockmendel
Copy link
Member Author

huh, putting the gc.collect in the fixture doesn't seem to get the job done. putting it at the end of each test seems hit-and-miss.

@jreback
Copy link
Contributor

jreback commented Dec 10, 2019

The min version we have listed for openpyxl is 2.4.8, which was released 2017-05-30. The earliest version that fixed this bug was 2.5.6, released 2018-08-30. I think implementing Tom's suggestion and bumping the version (separate PR) would be reasonable here

I would be +1 on just bumping and call it a day :->

@jreback jreback merged commit dafbee7 into pandas-dev:master Dec 11, 2019
@jreback
Copy link
Contributor

jreback commented Dec 11, 2019

thanks @jbrockmendel

btw not sure if mention openpyxl version in install.rst, if so can you update it

@jbrockmendel jbrockmendel deleted the cifix7 branch December 11, 2019 17:07
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants