Skip to content

Conversation

@adimascio
Copy link

@adimascio adimascio commented Sep 13, 2019

NOTE: I'm opening the PR mainly to open a discussion about the current behaviour. If this is intended, please tell me. If not and you need me to write some tests, I'll do it :)

In read_hdf, if the store.select() call throws either a
ValueError, a TypeError or a KeyError then the store is closed.

However, if any other exception is thrown (e.g. an AssertionError if
there are gaps in blocks ref_loc) , the store is not closed and some
client code could end up trying to reopen the store and hit an
error like: the file XXX is already opened. Please close it before reopening in write mode. Furthermore, the exception is re-raised in all
cases.

This commit just catches any exception.

Closes #28430

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@WillAyd WillAyd added the IO HDF5 read_hdf, HDFStore label Sep 13, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test and a whatsnew for v1.0.0 when you get the chance?

@adimascio
Copy link
Author

Can you add a test and a whatsnew for v1.0.0 when you get the chance?

Sure, I will do that in the next days

@adimascio adimascio force-pushed the fix/hdf-close-on-error branch 2 times, most recently from cc7bbd2 to f96d66b Compare September 13, 2019 21:33
@adimascio adimascio requested a review from WillAyd September 13, 2019 21:34
@adimascio
Copy link
Author

Can you add a test and a whatsnew for v1.0.0 when you get the chance?

Sure, I will do that in the next days

I've just re-requested your review for a first version of the PR.

@adimascio adimascio force-pushed the fix/hdf-close-on-error branch from f96d66b to 3160972 Compare September 15, 2019 10:11
@adimascio
Copy link
Author

CI is failing… It looks like some exceptions are exceptions are expected not to close the HDF store... Should we explicitly whitelist these exceptions or should we do the opposite, i.e. list all exceptions we should close the store on? In my specific case, I expected the store to be closed on AssertionErrors.

@TomAugspurger
Copy link
Contributor

Which tests are failing because of that? Something like test_read_hdf_open_store?

I also would have expected an exception to close an auto-opened store.

@adimascio
Copy link
Author

Which tests are failing because of that? Something like test_read_hdf_open_store?

The failing tests are:

  • test_select_iterator
  • test_read_hdf_iterator
  • test_read_hdf_open_store

I also would have expected an exception to close an auto-opened store.

I didn't have the time to check for now, probably something going on with the iteration protocol. I will try to do that soon.

@jreback
Copy link
Contributor

jreback commented Sep 18, 2019

In read_hdf, if the store.select() call throws either a
ValueError, a TypeError or a KeyError then the store is closed.

However, if any other exception is thrown (e.g. an AssertionError if
there are gaps in blocks ref_loc) , the store is not closed and some
client code could end up trying to reopen the store and hit an
error like: the file XXX is already opened. Please close it before reopening in write mode. Furthermore, the exception is re-raised in all
cases.

we would want to close the store on all exceptions, likely this is an oversite.

try:
HDFStore(path, mode="w")
except ValueError as exc:
pytest.fail(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to catch something that you are expecting to pass

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jreback for the feedback. I'm not exactly sure about what you meant but if you suggest to remove the except ValueError, that was done on purpose in order to fail (F) rather than to output an error (E).

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Sep 18, 2019
@adimascio
Copy link
Author

we would want to close the store on all exceptions, likely this is an oversite.

@jreback ok, thanks! Should I go and fix the other tests accordingly then?

@TomAugspurger
Copy link
Contributor

I think the tests can be updated where needed.

One question: if a user does something like

store = pd.HDFStore(...)
store.<something that causes an exception>

then will the store be closed? Or do we only close stores that we open? Ideally we would only ensure we close stores that we open.

@jreback
Copy link
Contributor

jreback commented Oct 3, 2019

this PR is slightly orthogonal to #28700 so if you want to rebase after we merge that would be great.

@WillAyd
Copy link
Member

WillAyd commented Oct 10, 2019

@adimascio can you merge master on this?

@adimascio adimascio force-pushed the fix/hdf-close-on-error branch from 3160972 to 2f2d8a9 Compare October 10, 2019 20:23
@adimascio
Copy link
Author

@adimascio can you merge master on this?

@WillAyd @jreback I've just rebased on master. I will try to fix the tests as discussed earlier this week-end!

@WillAyd
Copy link
Member

WillAyd commented Nov 7, 2019

@adimascio can you fix the merge conflict on the whatsnew and see if you can get this green?

1 similar comment
@jbrockmendel
Copy link
Member

@adimascio can you fix the merge conflict on the whatsnew and see if you can get this green?

@jbrockmendel
Copy link
Member

@adimascio can you fix the merge conflict and see if you can get the CI passing

In `read_hdf`, if the `store.select()` call throws either a
`ValueError`, a `TypeError` or a `KeyError` then the store is closed.

However, if any other exception is thrown (e.g. an `AssertionError` if
there are gaps in blocks ref_loc) , the store is not closed and some
client code could end up trying to reopen the store and hit an
error like: `the file XXX is already opened. Please close it before
reopening in write mode`. Furthermore, the exception is re-raised in all
cases.

This commit just closes store in all cases.

Closes pandas-dev#28430
@adimascio adimascio force-pushed the fix/hdf-close-on-error branch from 2f2d8a9 to 3dcf7f0 Compare December 4, 2019 08:18
@adimascio
Copy link
Author

@adimascio can you fix the merge conflict and see if you can get the CI passing

I've just push-forced to fix the conflicts. I'll try to fix the CI soon.

@jreback
Copy link
Contributor

jreback commented Jan 1, 2020

@adimascio pls merge master and see if you can get passing

@WillAyd
Copy link
Member

WillAyd commented Feb 2, 2020

Closing as CI is red and I think this has gone stale, but ping if you’d like to pick this back up and can reopen

@WillAyd WillAyd closed this Feb 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Error Reporting Incorrect or improved errors from pandas IO HDF5 read_hdf, HDFStore

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HDF store not closed on AssertionError

6 participants