Skip to content

BUG: GH29310 HDF file compression not working #29404

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

yp1996
Copy link

@yp1996 yp1996 commented Nov 4, 2019

Re #29310, the complib and complevel parameters were not being passed down all the way previously, hence HDF compression not working.

I noticed that the implementation of to_hdf() specifies that compression is not allowed for fixed formats:
if not s.is_table and complib: raise ValueError("Compression not supported on Fixed format stores")

I'm guessing that means the performance comparison section for https://github.com/pandas-dev/pandas/pull/28890/files will also need to be updated to remove the test_fixed_compress test @WuraolaOyewusi?

Also, after the update, the following test is currently failing:
image

due to a ValueError for using compression with a fixed format, and I'm not sure as to why the expected behaviour for this test is what it is? Why should setting complib disable compression? I would appreciate any further info on that.

@alimcmaster1 alimcmaster1 added the IO HDF5 read_hdf, HDFStore label Nov 4, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this only affect those two arguments? There are a few more documented on to_hdf:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_hdf.html

cc @TomAugspurger another one to maybe consider for io kwargs

@@ -784,7 +784,7 @@ def test_complibs(self, setup_path):
gname = "foo"

# Write and read file to see if data is consistent
df.to_hdf(tmpfile, gname, complib=lib, complevel=lvl)
df.to_hdf(tmpfile, gname, complib=lib, complevel=lvl, format="table")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a dedicated test for the change you are making instead of just modifying this?

@@ -943,6 +956,17 @@ def put(self, key, value, format=None, append=False, **kwargs):
append : bool, default False
This will force Table format, append the input data to the
existing.
complevel : int, 0-9, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you align the docstring with the signature? Looks like compilb and complevel are swapped

value,
format=None,
append=False,
complib=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add annotations for new parameters?

@pep8speaks
Copy link

pep8speaks commented Nov 5, 2019

Hello @yp1996! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 663:24: E231 missing whitespace after ','
Line 668:24: E231 missing whitespace after ','

Line 20:39: E231 missing whitespace after ','
Line 23:39: E231 missing whitespace after ','
Line 93:42: E231 missing whitespace after ','

Comment last updated at 2020-01-22 06:08:45 UTC

@WuraolaOyewusi
Copy link
Contributor

Re #29310, the complib and complevel parameters were not being passed down all the way previously, hence HDF compression not working.

I noticed that the implementation of to_hdf() specifies that compression is not allowed for fixed formats:
if not s.is_table and complib: raise ValueError("Compression not supported on Fixed format stores")

I'm guessing that means the performance comparison section for https://github.com/pandas-dev/pandas/pull/28890/files will also need to be updated to remove the test_fixed_compress test @WuraolaOyewusi?

Also, after the update, the following test is currently failing:
image

due to a ValueError for using compression with a fixed format, and I'm not sure as to why the expected behaviour for this test is what it is? Why should setting complib disable compression? I would appreciate any further info on that.

I think rather than silently refuse compression of fixed format, It should raise an error instead.

@jbrockmendel
Copy link
Member

pls rebase

@jbrockmendel
Copy link
Member

rebased, fixing a couple of annotations foo: str = None --> foo: Optional[str] = None

@jreback
Copy link
Contributor

jreback commented Jan 1, 2020

@yp1996 can you merge master

@MarcoGorelli
Copy link
Member

HI @yp1996 - sorry to chase you up, just wanted to ask whether you're still working on this :)
Seems that there's some test failures, but I can't check them as it says "build not found"

@yp1996
Copy link
Author

yp1996 commented Jan 22, 2020

Hey @jreback @MarcoGorell, sorry i'd gone mia - i've merged with master but the build is still failing, please feel free to look into it!

@MarcoGorelli
Copy link
Member

the build is still failing, please feel free to look into it!

For a start, there's some linting issues - did you run black against the .py files you changed?

@MarcoGorelli
Copy link
Member

Closing as it looks like this has gone stale and @joybhallaa has expressed interest in working on it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HDF5 read_hdf, HDFStore
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HDF file compression not working
8 participants