COMPAT: reading generic PyTables Table format fails with sub-selection #26818

jgehrcke · 2019-06-12T20:05:26Z

closes COMPAT: reading generic PyTables Table format fails with sub-selection #11188
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

simonjayhawkins · 2019-06-12T20:32:23Z

pandas/tests/io/test_pytables.py

+    (not written by pandas).
+    """
+
+    def _create_simple_hdf5_file_with_pytables(self):


is this necessary or could you just add a file in pandas\tests\io\data

is this necessary or could you just add a file in pandas\tests\io\data

Oh, that's some early feedback. Thank you! Much appreciated.

I think a proper test for the fix for #11188 should confirm that the minimal working example for reproducing the problem does not fail anymore.

I also think this is more readable: the test itself shows how the data file was created, whereas with a committed binary file it is not 100 % clear or discoverable how it was created.

Also, I'd like to think of this (using the PyTables API to create an HDF5 file) as an important aspect of this test: it effectively confirms a common way to use PyTables and pandas in tandem, with PyTables writing an HDF5 file and pandas reading it. These tests are not so much about the contents of the file as about confirming that a specific way of creating the file works with a specific way of parsing it.

I'll try to make this a little more compact, though.

Well, let's see about the "fix", looking forward to your feedback there (commit upcoming). I think this will be more controversial.

pep8speaks · 2019-06-12T21:28:24Z

Hello @jgehrcke! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-21 13:33:33 UTC

codecov · 2019-06-12T21:44:25Z

Codecov Report

Merging #26818 into master will decrease coverage by 50.76%.
The diff coverage is 75%.

@@             Coverage Diff             @@
##           master   #26818       +/-   ##
===========================================
- Coverage   91.86%    41.1%   -50.77%     
===========================================
  Files         179      179               
  Lines       50707    50710        +3     
===========================================
- Hits        46583    20842    -25741     
- Misses       4124    29868    +25744

Flag	Coverage Δ
#multiple	`?`
#single	`41.1% <75%> (-0.1%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/pytables.py	`89.3% <75%> (-1%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/plotting/_matplotlib/__init__.py	`0% <0%> (-100%)`	⬇️
pandas/io/gcs.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/io/s3.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.37%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.16%)`	⬇️
... and 133 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 634577e...7f21dbe. Read the comment docs.

codecov · 2019-06-12T21:44:26Z

Codecov Report

Merging #26818 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26818      +/-   ##
==========================================
- Coverage   91.98%   91.98%   -0.01%     
==========================================
  Files         180      180              
  Lines       50772    50774       +2     
==========================================
- Hits        46704    46702       -2     
- Misses       4068     4072       +4

Flag	Coverage Δ
#multiple	`90.62% <100%> (+0.04%)`	⬆️
#single	`41.82% <100%> (-0.09%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/pytables.py	`90.3% <100%> (ø)`	⬆️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/frame.py	`96.89% <0%> (-0.12%)`	⬇️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2243629...1ce1a70. Read the comment docs.

jgehrcke · 2019-06-12T21:47:06Z

Funny codecov bot :-). Well, let's wait for this round of CI to report back.

jgehrcke · 2019-06-12T21:58:38Z

@gfyoung @simonjayhawkins @jreback I would appreciate your feedback, especially about the "fix" itself for starters. The workaround is not necessarily nice. All tests pass, but could this approach (introducing table._nrows_to_read = stop - start dynamically, and using it instead of table.nrows) have negative side effects? I didn't quite see a meaningful way to pass start and stop explicitly down to GenericIndexCol; by the time we reach GenericIndexCol these two parameters are more table parameters than index col parameters. WDYT?

gfyoung · 2019-06-12T22:24:53Z

pandas/io/pytables.py

+            # Piggy-back normalized `nrows` (considering start and stop) onto
+            # PyTables tables object so that the GenericIndexCol constructor
+            # knows how large the index should be.
+            self.s.table._nrows_to_read = stop - start


start and stop - why not make these instance attributes of self ?

Or even just compute nrows as an attribute?

Huhu!

start and stop - why not make these instance attributes of self ?

Hm, what do you have in mind for them once they are set as instance attributes of self? In the three lines below we're doing exactly that (we set self.start and self.stop; did you mean something else?)

Or even just compute nrows as an attribute?

Interesting. I thought about this a bit but I am not sure what you mean. Where; on the PyTables table object?

That would be quite invasive. A PyTables table object might use its nrows property internally.

Adding a new private attribute to an object (guaranteed to not mess with the object's intrinsics) is not beautiful, but a little less invasive than overwriting an object property which might mess with that object's internal consistency.

@jgehrcke : I'm talking about attaching a new attribute to the TableIterator instance itself.

Attaching attributes to the TableIterator's s.table is riskier.

@jgehrcke : I'm talking about attaching a new attribute to the TableIterator instance itself.

Gotcha! I'll look into that. Thanks for the feedback, much appreciated.

Attaching attributes to the TableIterator's s.table is riskier.

Ack!

There is however no obvious way for me to access the TableIterator instance from the GenericIndexCol instance. The latter inherits from IndexCol which gets its .table attribute set through a set_table() method, but as far as I can see there is no existing obvious relationship to reach back to the TableIterator... I think I had the problem before which is why I chose the table object to piggy-back the piece of information on.

@gfyoung so this is the context where we call into the convert() method of the GenericIndexCol:

pandas/pandas/io/pytables.py

Line 3437 in a60888c

a.convert(values, nan_rep=self.nan_rep, encoding=self.encoding,

There are certainly other ways to pass along the _nrows_to_read concept but I didn't see a cleaner way yet, specifically I didn't see a way that does not add additional scaffold, passing arguments around etc.

Did you have something very specific in mind that I simply didn't see?

Interesting...we might have to then propagate an nrows parameter throughout the IndexCol instantiations.

Attaching a parameter to an object we have no information about is definitely riskier and hackier (and is therefore a practice I would discourage) compared to passing arguments around.

@jreback : Thoughts?

Attaching a parameter to an object we have no information about is definitely riskier and hackier (and is therefore a practice I would discourage) compared to passing arguments around.

I do agree.
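To illustrate the concern raised above, here is a minimal sketch of one way piggy-backing can go wrong, next to the explicit-argument alternative. It is only an illustration: `OpaqueTable` and `index_size` are hypothetical names, and the use of `__slots__` is an assumption made to provoke a failure. Nothing in this thread says the PyTables table object behaves this way (the piggy-back approach passed the tests here).

```python
class OpaqueTable:
    """Hypothetical third-party object that forbids new attributes."""

    __slots__ = ("nrows",)

    def __init__(self, nrows):
        self.nrows = nrows


def index_size(start, stop):
    # Explicit arguments: no hidden state stored on a foreign object.
    return stop - start


table = OpaqueTable(nrows=4)

try:
    # Piggy-backing a private attribute onto an object we do not control.
    table._nrows_to_read = 2
    piggyback_ok = True
except AttributeError:
    # Objects defined with __slots__ reject attributes they did not declare.
    piggyback_ok = False

explicit_size = index_size(1, 3)
print(piggyback_ok, explicit_size)
```

Passing the values as arguments costs some extra plumbing, but it does not depend on how the foreign object is implemented.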

jgehrcke · 2019-06-14T09:48:20Z

@gfyoung @jreback after following the quite entangled code path I think I found a cleaner solution.

Key insight was the following:

In HDFStore.select() we define a callback func:

        # function to call on iteration
        def func(_start, _stop, _where):
            return s.read(start=_start, stop=_stop,
                          where=_where,
                          columns=columns)

        # create the iterator
        it = TableIterator(self, s, func, where=where, nrows=s.nrows,
                           start=start, stop=stop, iterator=iterator,
                           chunksize=chunksize, auto_close=auto_close)

        return it.get_result()

In the problematic code path s.read() actually translates to Table.read_axes() which is prepared to receive arbitrary keyword arguments:

    def read_axes(self, where, **kwargs):
        """create and return the axes sniffed from the table: return boolean
        for success
        """
        ...

Because the callback is s.read(start=_start, stop=_stop, ....), we know that the start and stop parameters are available as keyword arguments within Table.read_axes().

read_axes() calls into GenericIndexCol.convert() and so this is a decent opportunity to pass the start and stop parameters directly to convert(). I have added a corresponding commit. Please have another look, thanks!
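The call chain described above can be sketched in isolation. This is a simplified stand-in, not the pandas source: `SketchIndexCol` and `SketchTable` are hypothetical classes, the index is modeled as a plain list, and the sketch assumes both `start` and `stop` are always given. Only the way the two values travel from the callback through `**kwargs` into `convert()` mirrors the description.

```python
class SketchIndexCol:
    """Hypothetical stand-in for GenericIndexCol."""

    def __init__(self):
        self.values = None

    def convert(self, values, start=None, stop=None):
        # Size the index from the requested sub-selection only.
        self.values = list(range(stop - start))
        return self


class SketchTable:
    """Hypothetical stand-in for the table object behind `s`."""

    def __init__(self):
        self.index_col = SketchIndexCol()

    def read_axes(self, where, **kwargs):
        # start and stop are not named parameters here; they arrive in kwargs.
        self.index_col.convert(
            values=None, start=kwargs.get("start"), stop=kwargs.get("stop")
        )
        return True

    def read(self, where=None, columns=None, **kwargs):
        if not self.read_axes(where=where, **kwargs):
            return None
        return self.index_col.values


s = SketchTable()


def func(_start, _stop, _where):
    # Same shape as the callback defined in HDFStore.select().
    return s.read(start=_start, stop=_stop, where=_where, columns=None)


selected = func(1, 3, None)
print(selected)
```

With a sub-selection of rows 1 to 3, the sketched index has two entries, matching the two rows that would be read.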

jreback · 2019-06-14T12:27:59Z

doc/source/whatsnew/v0.25.0.rst

@@ -656,6 +656,7 @@ I/O
 - Bug in :func:`read_csv` not properly interpreting the UTF8 encoded filenames on Windows on Python 3.6+ (:issue:`15086`)
 - Improved performance in :meth:`pandas.read_stata` and :class:`pandas.io.stata.StataReader` when converting columns that have missing values (:issue:`25772`)
 - Bug in :meth:`DataFrame.to_html` where header numbers would ignore display options when rounding (:issue:`17280`)
+- Bug in :func:`read_hdf` where the `start` and `stop` arguments would raise a ``ValueError`` indicating an index size mismatch (:issue:`11188`)


this is not very informative of what you are actually addressing, can you re-word

commit upcoming

jreback · 2019-06-14T12:28:21Z

pandas/io/pytables.py

@@ -1627,7 +1627,7 @@ def infer(self, handler):
        new_self.read_metadata(handler)
        return new_self

-    def convert(self, values, nan_rep, encoding, errors):
+    def convert(self, values, nan_rep, encoding, errors, **kwargs):
        """ set the values from this selection: take = take ownership """


can you add a doc-string here

types are a bonus

I would just add start=None and stop=None instead of kwargs

I would just add start=None and stop=None instead of kwargs

done, commit upcoming

can you add a doc-string here

I would love to add one! I have difficulties understanding the general purpose of the convert() method though and documenting values, nan_rep, encoding, and errors is pretty challenging given my lack of understanding. As far as I see the start and stop arguments are not really meaningful arguments here (in IndexCol.convert()), so I am also a bit helpless with documenting them! :-) How would you document start and stop here?

jreback · 2019-06-14T12:28:52Z

pandas/io/pytables.py

@@ -1816,10 +1816,14 @@ class GenericIndexCol(IndexCol):
    def is_indexed(self):
        return False

-    def convert(self, values, nan_rep, encoding, errors):
+    def convert(self, values, nan_rep, encoding, errors, **kwargs):
        """ set the values from this selection: take = take ownership """


doc-string here

👍 adding a docstring explaining the start and stop parameters, commit upcoming

jreback · 2019-06-14T12:29:02Z

pandas/io/pytables.py

@@ -2162,7 +2166,7 @@ def validate_attr(self, append):
                raise ValueError("appended items dtype do not match existing "
                                 "items dtype in table!")

-    def convert(self, values, nan_rep, encoding, errors):
+    def convert(self, values, nan_rep, encoding, errors, **kwargs):
        """set the data from this selection (and convert to the correct dtype


jreback · 2019-06-14T12:30:16Z

pandas/tests/io/test_pytables.py

+        ]
+
+        # This returns a path and does not open the file.
+        tmpfilepath = create_tempfile(self.path)


so this should just be a fixture

👍

I now turned the entire thing into a module-scoped fixture, providing the path to the HDF5 file and details about its contents.

jreback · 2019-06-14T12:30:41Z

pandas/tests/io/test_pytables.py

+
+    def _compare(self, df, samples):
+        """Compare the reference `samples` with the contents in DataFrame `df`.
+        """


don't write a custom comparator, use assert_frame_equal

Thanks.

Interesting:

    E   AssertionError: Attributes are different
    E
    E   Attribute "dtype" are different
    E   [left]:  uint32
    E   [right]: int64

But that's probably expected, right? So, we expect the values in the UInt32Col in the PyTables file to end up being of type int64, right?

Okay, thanks for the review, I have committed and pushed changes and would appreciate another round of feedback.

🚀

jgehrcke · 2019-06-14T20:32:25Z

pandas/io/pytables.py

+            Table row number: the start of the sub-selection.
+        stop : int, optional
+            Table row number: the end of the sub-selection. Values larger than
+            the underlying table's row count are normalized to that.


At least here I understand the meaning of start and stop and have tried to document it.

right, can you add all the parameters. See if you can put something down for them, some doc-string is better than none.

👍

Adding statements for values, nan_rep, encoding, and errors.

The biggest challenge for me still is that I do not feel like I understand the general purpose of the convert() method and its arguments.

jgehrcke · 2019-06-15T18:51:04Z

Will polish (squash commits, address line length issues) after another round of comments.

Update: just force-pushed after rebase on current master:

removed the unrelated requirements-dev.txt patch
addressed linter errors (line length)

jreback · 2019-06-19T00:43:19Z

pandas/io/pytables.py

+            Table row number: the start of the sub-selection.
+        stop : int, optional
+            Table row number: the end of the sub-selection. Values larger than
+            the underlying table's row count are normalized to that.


right, can you add all the parameters. See if you can put something down for them, some doc-string is better than none.

jreback · 2019-06-19T00:43:38Z

pandas/io/pytables.py

+        """
+
+        start = start if start is not None else 0
+        stop = min(stop, self.table.nrows) \


use parens rather than a line-continuation

commit upcoming
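For reference, a parenthesized version of that normalization could look like the sketch below. The quoted hunk is cut off after the backslash, so the `else` branch here is an assumption based on the docstring ("Values larger than the underlying table's row count are normalized to that"). `normalize_bounds` is a hypothetical helper name, and `nrows` stands in for `self.table.nrows`.

```python
def normalize_bounds(start, stop, nrows):
    """Clamp an optional (start, stop) sub-selection to a table of nrows rows."""
    start = start if start is not None else 0
    # Parentheses allow the expression to span lines without a backslash.
    stop = (
        min(stop, nrows)
        if stop is not None
        else nrows
    )
    return start, stop


full = normalize_bounds(None, None, 4)
clamped = normalize_bounds(1, 10, 4)
print(full, clamped)
```

The index size then follows as `stop - start` on the normalized values.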

jreback · 2019-06-19T00:44:10Z

pandas/io/pytables.py

            a.convert(values, nan_rep=self.nan_rep, encoding=self.encoding,
-                      errors=self.errors)
+                      errors=self.errors, **kwargs)


where is this kwargs coming from? do you mean to pass in start/stop?

where is this kwargs coming from

I have tried to answer this question with the code comment added right above that line, and all I know is that kwargs may contain start and stop. I could not infer any guarantees.

Following the kwargs flow(s) is a challenging part of reading io/pytables.py. It's pretty non-obvious.

There are two calls into that function that pass **kwargs where it's non-obvious where it's coming from:

pandas/pandas/io/pytables.py

Line 3888 in d47947a

if not self.read_axes(where=where, **kwargs):

pandas/pandas/io/pytables.py

Line 4143 in d47947a

if not self.read_axes(where=where, **kwargs):

For following just the latter flow I am now looking for read() being called and this is where things get tricky...

$ grep -nr 'read(' pandas/io/pytables.py | grep -v 'def ' | grep -v validate_read
721: return s.read(start=_start, stop=_stop,
837: objs = [t.read(where=_where, columns=columns, start=_start,
1421: return s.read(**kwargs)
2934: sdict[c] = s.read()
4241: s = super().read(columns=columns, **kwargs)
4342: df = super().read(**kwargs)
4740: return self.table.table.read(start=self.start, stop=self.stop)

From here it looks like there is more kwargs to follow. I know one definite call path that comes along line 837 and there we know that start and stop are being passed explicitly (that's the code path coming from read_hdf()). But I am not sure about other possible code paths.

I am relying on the test suite not breaking, but I also do not want to rely on start and stop always being passed explicitly.

I am now doing this instead:

-                      errors=self.errors, **kwargs)
+                      errors=self.errors, start=kwargs.get('start'),
+                      stop=kwargs.get('stop'))

So that things still don't break when start or stop are not in kwargs. Sound good?
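The effect of forwarding with `kwargs.get(...)` can be checked in a small standalone sketch. The two functions below are simplified, hypothetical versions of `read_axes()` and `convert()` (the real signatures take more arguments); the point is only that a missing key becomes `None` and that unrelated keyword arguments are not forwarded.

```python
def convert(values, start=None, stop=None):
    # Simplified stand-in: just report what was received.
    return start, stop


def read_axes(where=None, **kwargs):
    # dict.get returns None for absent keys, so callers that never pass
    # start or stop still work.
    return convert(
        values=None, start=kwargs.get("start"), stop=kwargs.get("stop")
    )


neither = read_axes()
only_start = read_axes(start=2)
both_plus_extra = read_axes(start=1, stop=3, columns=["c0"])
print(neither, only_start, both_plus_extra)
```

Compared with passing `**kwargs` straight through, this keeps `convert()` from receiving keyword arguments it does not declare, such as `columns` in the last call.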

jreback · 2019-06-19T00:44:32Z

pandas/tests/io/test_pytables.py

@@ -5167,3 +5168,71 @@ def test_dst_transitions(self):
                store.append('df', df)
                result = store.select('df')
                assert_frame_equal(result, df)
+
+
+@pytest.fixture(scope='module')


this is not necessary, function scoped is fine

👍 commit upcoming

jreback · 2019-06-19T00:45:33Z

pandas/tests/io/test_pytables.py

+        {'c0': t0 + 2, 'c1': 'ccccc', 'c2': 10**5},
+        {'c0': t0 + 3, 'c1': 'ddddd', 'c2': 4294967295},
+    ]
+


use ensure_clean_path here

Done, commit upcoming.

Food for thought (I don't expect answers for this pull request here, it's probably a rabbit hole):

If a test fails, don't we want to retain the corresponding HDF5 file in the file system to enable later inspection? If the answer is "yes" then we should not use a context manager for (quite reliable) file removal but do it in the fixture teardown only upon test success.

What do we think about using a pytest-provided fixture for temporary file management?

    @pytest.fixture()
    def pytables_hdf5_file(tmp_path):
        [...]

In that case the file path would contain the test name, e.g. tmp_path = PosixPath('PYTEST_TMPDIR/test_create_file0') as documented here: https://docs.pytest.org/en/latest/tmpdir.html

ensure_clean_path already does this, not really necessary to do anything else

ensure_clean_path already does this

The way I read the code it will always remove the file, also upon test failure.

Anyway :).
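The behavior in question (cleanup that runs regardless of the test outcome) can be sketched with a generic context manager. `clean_path_sketch` is a hypothetical helper written for this illustration; it is not the pandas `ensure_clean_path` implementation, only the general try/finally pattern being discussed.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def clean_path_sketch(filename):
    """Hypothetical helper: yield a temporary file path, always remove it."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, filename)
    try:
        yield path
    finally:
        # The finally block runs on success and on failure alike.
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(directory)


seen = []
try:
    with clean_path_sketch("written_with_pytables.h5") as path:
        with open(path, "wb") as fh:
            fh.write(b"placeholder")
        seen.append(path)
        raise AssertionError("simulated test failure")
except AssertionError:
    pass

file_survived = os.path.exists(seen[0])
print(file_survived)
```

With this pattern the file is gone after a failing test, which is the trade-off raised above: reliable cleanup, but nothing left to inspect.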

jreback · 2019-06-19T00:45:57Z

pandas/tests/io/test_pytables.py

+    return tmpfilepath, objectname, pd.DataFrame(testsamples)
+
+
+class TestReadPyTablesHDF5(Base):


do you need to inherit from Base?

do you need to inherit from Base?

I do not really know. I am not sure whether tm.reset_testing_mode() matters at all in my context here, although the comment "# Pytables 3.0.0 deprecates lots of things" sounds kind of important:

    @classmethod
    def setup_class(cls):
        # Pytables 3.0.0 deprecates lots of things
        tm.reset_testing_mode()

    @classmethod
    def teardown_class(cls):
        # Pytables 3.0.0 deprecates lots of things
        tm.set_testing_mode()

Given that you doubt that I need it I have added a commit and don't inherit from Base anymore. The tests still pass on my system.

jreback · 2019-06-19T00:46:28Z

pandas/tests/io/test_pytables.py

+
+    def test_read_complete(self, pytables_hdf5_file):
+        path, objname, expected_df = pytables_hdf5_file
+        assert_frame_equal(pd.read_hdf(path, key=objname), expected_df)


write the asserts like

result =
expected =
assert_frame_equal(result, expected)

done, thanks

jgehrcke · 2019-06-19T15:31:05Z

@jreback I tried to address all points in your last round of feedback. I feel like we're getting closer. Would very much appreciate another look, thanks for your time 🕙 !

jreback · 2019-06-21T02:14:26Z

pandas/io/pytables.py

+        ----------
+
+        values :
+            Expected to be passed but ignored in this implementation.


can you write that these are, e.g.

values: np.ndarray
nan_rep : str
encoding: str
errors : str (I think)

jreback · 2019-06-21T02:14:36Z

pandas/io/pytables.py

+        values :
+            Expected to be passed but ignored in this implementation.
+        nan_rep :
+            Expected to be passed but ignored in this implementation.


don't need these comments

jreback · 2019-06-21T02:15:20Z

pandas/tests/io/test_pytables.py

@@ -5,6 +5,7 @@
 from io import BytesIO
 import os
 import tempfile
+import time


what are you using this for?

Used this for dynamically generating unix time stamps (time.time()) before writing them to the HDF5 file, but made this static now.

jreback · 2019-06-21T02:15:56Z

pandas/tests/io/test_pytables.py

+        {'c0': t0 + 2, 'c1': 'ccccc', 'c2': 10**5},
+        {'c0': t0 + 3, 'c1': 'ddddd', 'c2': 4294967295},
+    ]
+


ensure_clean_path already does this, not really necessary to do anything else

jreback · 2019-06-21T02:16:12Z

pandas/tests/io/test_pytables.py

@@ -5167,3 +5168,70 @@ def test_dst_transitions(self):
                store.append('df', df)
                result = store.select('df')
                assert_frame_equal(result, df)
+
+
+@pytest.fixture()


you don't need the ()

jreback · 2019-06-21T02:17:24Z

pandas/tests/io/test_pytables.py

@@ -5167,3 +5168,70 @@ def test_dst_transitions(self):
                store.append('df', df)
                result = store.select('df')
                assert_frame_equal(result, df)
+


can you move these to a separate (new) file: test_pytables_compat.py (even better to move to pandas/tests/io/pytables/test_compat.py; you will need to add an __init__.py in the subdir)

we are starting to split the test_pytables up a bit

(even better to move to pandas/tests/io/pytables/test_compat.py )

Doing so!

(you will need to add an __init__.py in the subdir)

Why?

It does not need to be a Python package for pytest to discover tests in the directory:

$ pytest --collect-only pandas/tests
[...]
<Module pandas/tests/io/pytables/test_compat.py>
  <Class TestReadPyTablesHDF5>
      <Function test_read_complete>
      <Function test_read_with_start>
      <Function test_read_with_stop>
      <Function test_read_with_startstop>
[...]

Any other reason?

jgehrcke · 2019-06-21T09:58:04Z

@jreback thanks for your feedback once again.

I have

rebased on upstream/master (clean)
added four new commits to address your last comments

In particular, I have now moved the new tests to pandas/tests/io/pytables/test_compat.py as suggested. While doing so I have not created a pandas/tests/io/pytables/__init__.py file because I don't see the reason yet (it's not for test discovery, but is it for linters?).

I have also not used ensure_clean_path in the new test module (because then I would have needed to refactor it out of where it is right now). Instead I have used the pytest tmp_path fixture which I think provides everything we need with less code: it guarantees file system isolation between tests, it retains files for later inspection (as opposed to the ensure_clean_path context manager) and removes them in a rolling fashion so that storage space requirement stays predictable (does not grow indefinitely with the number of invocations). For instance, this is after invoking pytest 11 times (files from the last three invocations are retained, the rest got rotated away):

$ tree
.
├── pytest-10
│   ├── test_read_complete0
│   │   └── written_with_pytables.h5
│   ├── test_read_completecurrent -> /tmp/pytest-of-jp/pytest-10/test_read_complete0
│   ├── test_read_with_start0
│   │   └── written_with_pytables.h5
│   ├── test_read_with_startcurrent -> /tmp/pytest-of-jp/pytest-10/test_read_with_start0
│   ├── test_read_with_startstop0
│   │   └── written_with_pytables.h5
│   ├── test_read_with_startstopcurrent -> /tmp/pytest-of-jp/pytest-10/test_read_with_startstop0
│   ├── test_read_with_stop0
│   │   └── written_with_pytables.h5
│   └── test_read_with_stopcurrent -> /tmp/pytest-of-jp/pytest-10/test_read_with_stop0
├── pytest-8
│   ├── test_read_complete0
│   │   └── written_with_pytables.h5
│   ├── test_read_completecurrent -> /tmp/pytest-of-jp/pytest-8/test_read_complete0
│   ├── test_read_with_start0
│   │   └── written_with_pytables.h5
│   ├── test_read_with_startcurrent -> /tmp/pytest-of-jp/pytest-8/test_read_with_start0
│   ├── test_read_with_startstop0
│   │   └── written_with_pytables.h5
│   ├── test_read_with_startstopcurrent -> /tmp/pytest-of-jp/pytest-8/test_read_with_startstop0
│   ├── test_read_with_stop0
│   │   └── written_with_pytables.h5
│   └── test_read_with_stopcurrent -> /tmp/pytest-of-jp/pytest-8/test_read_with_stop0
├── pytest-9
│   ├── test_read_complete0
│   │   └── written_with_pytables.h5
│   ├── test_read_completecurrent -> /tmp/pytest-of-jp/pytest-9/test_read_complete0
│   ├── test_read_with_start0
│   │   └── written_with_pytables.h5
│   ├── test_read_with_startcurrent -> /tmp/pytest-of-jp/pytest-9/test_read_with_start0
│   ├── test_read_with_startstop0
│   │   └── written_with_pytables.h5
│   ├── test_read_with_startstopcurrent -> /tmp/pytest-of-jp/pytest-9/test_read_with_startstop0
│   ├── test_read_with_stop0
│   │   └── written_with_pytables.h5
│   └── test_read_with_stopcurrent -> /tmp/pytest-of-jp/pytest-9/test_read_with_stop0
└── pytest-current -> /tmp/pytest-of-jp/pytest-10

jreback · 2019-06-21T12:13:21Z

pandas/tests/io/pytables/test_compat.py

+
+    objname = 'pandas_test_timeseries'
+
+    # The `tmp_path` fixture provides a temporary directory unique to the


can you use ensure_clean_path; we don't use the pytest fixture at this time, though you can open an issue to change (many occurrences of this I suppose).

jreback

lgtm, but need to clean up the test files, ping on green.

jreback · 2019-06-21T12:14:53Z

pandas/tests/io/pytables/test_compat.py

+def pytables_hdf5_file(tmp_path):
+    """Use PyTables to create a simple HDF5 file.
+
+    There is no need for temporary file cleanup: pytest's `tmp_path` fixture


remove this comment as well. see my comment below. we don't want to introduce a different idiom for testing (but as I said you can open an issue for discussion).

you can open an issue for discussion

#26984 :)

jreback · 2019-06-21T12:50:15Z

pandas/tests/io/pytables/test_compat.py

+
+
+@pytest.fixture
+def pytables_hdf5_file(tmp_path):


I don't think you need this fixture anymore

I don't think you need this fixture anymore

Thanks man. I hope one of the linters would have told me! :-)

yeah i don’t think we have that check in linting :)

jreback · 2019-06-21T12:50:30Z

pandas/tests/io/pytables/test_compat.py

+
+class TestReadPyTablesHDF5:
+    """
+    A group of tests which covers reading HDF5 files written by plain PyTables


can you add the issue number here in the comment

jgehrcke · 2019-06-21T14:12:45Z

@jreback it's green now. Wanna have another look before I squash some commits?

jreback · 2019-06-21T14:19:58Z

thanks @jgehrcke nice patch. We squash on merge anyhow, so no need!

jreback · 2019-06-21T14:20:37Z

If interested would like to move the existing test_pytables test files (test_pytables.py and test_pytables_missing.py) to pandas/tests/io/pytables (just a simple move). We want to split the main file, but that will be separate.

and I think we usually add an __init__.py file in the pytables/ dir as well

jgehrcke · 2019-06-21T15:08:51Z

thanks @jgehrcke nice patch

Thank you for the nice collaboration!

jgehrcke · 2019-06-21T16:06:16Z

If interested would like to move the existing test_pytables test files (test_pytables.py and test_pytables_missing.py) to pandas/tests/io/pytables (just a simple move).

Sure, PR upcoming.

Update: #26986

simonjayhawkins reviewed Jun 12, 2019

gfyoung added Bug IO HDF5 read_hdf, HDFStore labels Jun 12, 2019

jgehrcke force-pushed the jp/fix-11188 branch from 8070d9a to cce0e57 Compare June 12, 2019 21:28

jgehrcke force-pushed the jp/fix-11188 branch from cce0e57 to 7f21dbe Compare June 12, 2019 21:44

jgehrcke changed the title ~~Work in progress: fix issue #11188~~ Fix issue #11188 Jun 12, 2019

gfyoung reviewed Jun 12, 2019

jreback changed the title ~~Fix issue #11188~~ COMPAT: reading generic PyTables Table format fails with sub-selection Jun 13, 2019

jgehrcke mentioned this pull request Jun 13, 2019

requirements-dev.txt: "Could not find a version that satisfies the requirement pyqt" #26838

Closed

jreback requested changes Jun 14, 2019

jgehrcke commented Jun 14, 2019

jgehrcke force-pushed the jp/fix-11188 branch from 5cd2ab1 to 5d54674 Compare June 18, 2019 11:01

jgehrcke added a commit to jgehrcke/goeffel that referenced this pull request Jun 18, 2019

messer-analysis: start implementing --first and --last support

1ab7184

--start and --stop upon read_hdf() do not yet work as of pandas-dev/pandas#26818

jreback requested changes Jun 19, 2019

jreback requested changes Jun 21, 2019

jgehrcke added 7 commits June 21, 2019 11:42

TST: add TestReadPyTablesHDF5 test scaffold

19dc304

BUG: this fixes pandas-dev#11188

e9c7c39

CLN: do not piggyback idx size on table obj

b7a082a

DOC: update whatsnew for pandas-dev#11188

964cba1

DOC: improve changelog (squash later)

04b8423

squash: remove kwargs, add a bit of docstring

7d200a5

squash: rework tests (use fixture, assert_frame_equal)

70e78c9

jgehrcke added 6 commits June 21, 2019 11:42

DOC: add some docstring

cd69c0b

squash: flake8 error

a9c6f15

squash: change docstrings one more time

8adf459

squash: tests: cleanup

aed78ff

TST: move tests to io/pytables/test_compat.py

53dba1a

TST: use pytest fixture instead of ensure_clean_path

dfec26e

jgehrcke force-pushed the jp/fix-11188 branch from 466453c to dfec26e Compare June 21, 2019 09:43

jgehrcke added 3 commits June 21, 2019 13:27

squash: address linter errors

b9421af

squash: pass path as text

9c0e96b

squash: tests: improve code comment

79bed6a

jreback reviewed Jun 21, 2019

jreback requested changes Jun 21, 2019

jreback reviewed Jun 21, 2019

squash: use ensure_clean_path again

bca6ee6

jreback reviewed Jun 21, 2019

jgehrcke mentioned this pull request Jun 21, 2019

Use pytest tmp file management fixtures instead of ensure_clean[_path|_dir] #26984

Closed

jgehrcke added 2 commits June 21, 2019 14:55

squash: tests: cleanup

60d37e0

squash: tests: fix isort error

1ce1a70

jreback added this to the 0.25.0 milestone Jun 21, 2019

jreback approved these changes Jun 21, 2019

jreback merged commit dda4c1a into pandas-dev:master Jun 21, 2019

jgehrcke mentioned this pull request Jun 21, 2019

CLN: move pytables tests to tests/io/pytables dir #26986

Merged

st-bender mentioned this pull request May 23, 2020

Cope with missing HDF keys dask/dask#6204

Merged
