BUG: GH29310 HDF file compression not working #29404
Does this only affect those two arguments? There are a few more documented on to_hdf:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_hdf.html
cc @TomAugspurger another one to maybe consider for io kwargs
@@ -784,7 +784,7 @@ def test_complibs(self, setup_path):
     gname = "foo"

     # Write and read file to see if data is consistent
-    df.to_hdf(tmpfile, gname, complib=lib, complevel=lvl)
+    df.to_hdf(tmpfile, gname, complib=lib, complevel=lvl, format="table")
Can you add a dedicated test for the change you are making instead of just modifying this?
pandas/io/pytables.py
Outdated
@@ -943,6 +956,17 @@ def put(self, key, value, format=None, append=False, **kwargs):
     append : bool, default False
         This will force Table format, append the input data to the
         existing.
+    complevel : int, 0-9, default None
Can you align the docstring with the signature? Looks like complib and complevel are swapped.
pandas/io/pytables.py
Outdated
     value,
     format=None,
     append=False,
+    complib=None,
Can you add annotations for new parameters?
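The two review requests above (annotate the new parameters, and keep the docstring in signature order) can be illustrated with a minimal sketch. The function below is a hypothetical stand-in, not the pandas source; the `Optional[str]` / `Optional[int]` annotations and the position of `complevel` after `complib` are assumptions based on the hunks shown in this thread.

```python
import inspect
from typing import Any, Optional


def put(
    key: str,
    value: Any,
    format: Optional[str] = None,
    append: bool = False,
    complib: Optional[str] = None,
    complevel: Optional[int] = None,
) -> None:
    """
    Hypothetical stand-in for the method under review (not the pandas source).

    Parameters
    ----------
    key : str
    value : Any
    format : str, optional
    append : bool, default False
    complib : str, optional
        Name of the compression library.
    complevel : int, 0-9, optional
        Compression level.
    """


# Parameter names in signature order, for comparison with the docstring order.
param_order = list(inspect.signature(put).parameters)
doc = put.__doc__
```

Comparing `param_order` with the order in which the names appear in `doc` is one cheap way to catch a swapped pair like the one flagged in review.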
Hello @yp1996! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-01-22 06:08:45 UTC
I think that, rather than silently refusing compression for the fixed format, it should raise an error instead.
pls rebase
rebased, fixing a couple of annotations
@yp1996 can you merge master
Hi @yp1996 - sorry to chase you up, just wanted to ask whether you're still working on this :)
Hey @jreback @MarcoGorell, sorry I'd gone MIA - I've merged with master but the build is still failing, please feel free to look into it!
For a start, there are some linting issues - did you run
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Closing as it looks like this has gone stale and @joybhallaa has expressed interest in working on it
Re #29310: the complib and complevel parameters were previously not passed all the way down to the writer, which is why HDF compression was not working.
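As a rough illustration of that failure mode, here is a minimal pure-Python sketch. All function names are hypothetical and only mimic the shape of a call chain in which an outer function accepts the compression options but does not forward them; it is not the pandas code path.

```python
from typing import Any, Dict, Optional


def write_node(
    key: str,
    value: Any,
    complib: Optional[str] = None,
    complevel: Optional[int] = None,
) -> Dict[str, Any]:
    """Innermost layer: report the options it actually received."""
    return {"key": key, "complib": complib, "complevel": complevel}


def put_dropping(key, value, complib=None, complevel=None):
    """Buggy pattern: accepts the options but never forwards them."""
    return write_node(key, value)


def put_forwarding(key, value, complib=None, complevel=None):
    """Fixed pattern: forwards the options explicitly."""
    return write_node(key, value, complib=complib, complevel=complevel)


dropped = put_dropping("foo", [1, 2, 3], complib="zlib", complevel=9)
forwarded = put_forwarding("foo", [1, 2, 3], complib="zlib", complevel=9)
```

In the buggy variant the caller sees no error, so the output is simply written uncompressed; that silence is what makes this kind of bug easy to miss.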
I noticed that the implementation of to_hdf() specifies that compression is not allowed for fixed formats:
if not s.is_table and complib:
    raise ValueError("Compression not supported on Fixed format stores")
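The guard quoted above can be isolated as a small sketch to show which argument combinations it rejects. The helper names and the use of a plain `format` string in place of `s.is_table` are assumptions made for illustration; only the condition and the error message come from the quoted line.

```python
from typing import Optional


def check_compression(format: str, complib: Optional[str]) -> None:
    """Raise if a compression library is requested for a non-table format."""
    is_table = format == "table"  # stand-in for s.is_table (assumption)
    if not is_table and complib:
        raise ValueError("Compression not supported on Fixed format stores")


def try_check(format: str, complib: Optional[str]) -> str:
    """Return "ok" or the error message, so the outcomes are easy to compare."""
    try:
        check_compression(format, complib)
    except ValueError as err:
        return str(err)
    return "ok"


results = {
    ("table", "zlib"): try_check("table", "zlib"),
    ("fixed", "zlib"): try_check("fixed", "zlib"),
    ("fixed", None): try_check("fixed", None),
}
```

Note that the condition only tests `complib`, so under this sketch a fixed-format call that sets `complevel` alone would pass the check.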
I'm guessing that means the performance comparison section for https://github.com/pandas-dev/pandas/pull/28890/files will also need to be updated to remove the test_fixed_compress test @WuraolaOyewusi?
Also, after the update, the following test is currently failing:

due to a ValueError for using compression with a fixed format. I'm not sure why the expected behaviour for this test is what it is: why should setting complib disable compression? I would appreciate any further info on that.