Read hdf returns unexpected values for categorical #39420

Merged
33 commits
bb2d803
BUG: fix case of a category value which isn't exists (#39189)
nofarmish Jan 22, 2021
f9be625
BUG: add UT to conver_value for this use case (#39189)
nofarmish Jan 23, 2021
aa90441
BUG: change style with pre-commit (#39189)
nofarmish Jan 23, 2021
e8ca3fc
BUG: add a whatsnew record (#39189)
nofarmish Jan 23, 2021
b5ded49
Trigger Build
nofarmish Jan 23, 2021
0cb8ad7
BUG: check for tests (#39189)
nofarmish Jan 23, 2021
8284e0b
BUG: remove spaces (#39189)
nofarmish Jan 23, 2021
9773aaa
BUG: remove whatsnew (#39189)
nofarmish Jan 23, 2021
4281ef0
BUG: remove tests(#39189)
nofarmish Jan 23, 2021
7178757
BUG: add whats new (#39189)
nofarmish Jan 23, 2021
ca9420e
BUG: check tests (#39189)
nofarmish Jan 23, 2021
f61b7c5
BUG: update tests (#39189)
nofarmish Jan 26, 2021
8c3b3b6
BUG: update after precommit (#39189)
nofarmish Jan 26, 2021
4e3bce2
BUG: update after precommit (#39189)
nofarmish Jan 26, 2021
558a585
BUG: fix case of a category value which isn't exists (#39189)
nofarmish Jan 22, 2021
adfe600
BUG: add UT to conver_value for this use case (#39189)
nofarmish Jan 23, 2021
63815c7
BUG: change style with pre-commit (#39189)
nofarmish Jan 23, 2021
74c687a
BUG: add a whatsnew record (#39189)
nofarmish Jan 23, 2021
3023fc0
BUG: check for tests (#39189)
nofarmish Jan 23, 2021
f917ba9
BUG: remove spaces (#39189)
nofarmish Jan 23, 2021
0abe192
BUG: remove whatsnew (#39189)
nofarmish Jan 23, 2021
1b959ee
BUG: remove tests(#39189)
nofarmish Jan 23, 2021
4de349f
BUG: add whats new (#39189)
nofarmish Jan 23, 2021
d7a3ef6
BUG: check tests (#39189)
nofarmish Jan 23, 2021
eb8cd5a
BUG: update tests (#39189)
nofarmish Jan 26, 2021
235d05e
BUG: update after precommit (#39189)
nofarmish Jan 26, 2021
73541ff
BUG: update after precommit (#39189)
nofarmish Jan 26, 2021
877ae9e
Merge remote-tracking branch 'origin/read-hdf-returns-unexpected-valu…
nofarmish Jan 26, 2021
017e47a
BUG: change test location (#39189)
nofarmish Jan 30, 2021
d67ff95
BUG: remove import (#39189)
nofarmish Jan 30, 2021
37eef60
BUG: remove import (#39189)
nofarmish Jan 30, 2021
b3565af
BUG: remove list() before sorted() (#39189)
nofarmish Jan 30, 2021
5af7c04
BUG: remove list() in sorted() (#39189)
nofarmish Jan 30, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -330,6 +330,7 @@ I/O
- Bug in :func:`read_csv` not switching ``true_values`` and ``false_values`` for nullable ``boolean`` dtype (:issue:`34655`)
- Bug in :func:`read_json` when ``orient="split"`` does not maintain numeric string index (:issue:`28556`)
- :meth:`read_sql` returned an empty generator if ``chunksize`` was no-zero and the query returned no results. Now returns a generator with a single empty dataframe (:issue:`34411`)
- Bug in :func:`read_hdf` returning unexpected records when filtering on categorical string columns using ``where`` parameter (:issue:`39189`)

Period
^^^^^^
20 changes: 12 additions & 8 deletions pandas/core/computation/pytables.py
@@ -19,6 +19,7 @@
from pandas.core.computation.ops import UndefinedVariableError, is_term
from pandas.core.construction import extract_array
from pandas.core.indexes.base import Index
from pandas.core.series import Series

from pandas.io.formats.printing import pprint_thing, pprint_thing_encoded

@@ -209,14 +210,8 @@ def stringify(value):
             v = Timedelta(v, unit="s").value
             return TermValue(int(v), v, kind)
         elif meta == "category":
-            metadata = extract_array(self.metadata, extract_numpy=True)
-            result = metadata.searchsorted(v, side="left")
-
-            # result returns 0 if v is first element or if v is not in metadata
-            # check that metadata contains v
-            if not result and v not in metadata:
-                result = -1
-            return TermValue(result, result, "integer")
+            term_value = self._convert_category_value(self.metadata, v)
+            return term_value
         elif kind == "integer":
             v = int(float(v))
             return TermValue(v, v, kind)
@@ -245,6 +240,15 @@ def stringify(value):
        else:
            raise TypeError(f"Cannot compare {v} of type {type(v)} to {kind} column")

    @staticmethod
    def _convert_category_value(metadata: Series, value: Any) -> TermValue:
        metadata = extract_array(metadata, extract_numpy=True)
        if value not in metadata:
            result = -1
        else:
            result = metadata.searchsorted(value, side="left")
        return TermValue(result, result, "integer")

    def convert_values(self):
        pass
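
To see why the membership check has to come before the position lookup, here is a minimal standalone sketch of the old and the new logic. It is not the pandas implementation: `old_lookup` and `new_lookup` are hypothetical names, and `bisect.bisect_left` on a sorted Python list stands in for `metadata.searchsorted(value, side="left")`, on the assumption that the category metadata is sorted.

```python
from bisect import bisect_left


def old_lookup(categories, value):
    """Pre-fix logic: a miss was only detected when the insertion index was 0."""
    result = bisect_left(categories, value)
    if not result and value not in categories:
        result = -1
    return result


def new_lookup(categories, value):
    """Post-fix logic: test membership first, then locate the position."""
    if value not in categories:
        return -1
    return bisect_left(categories, value)


categories = ["a", "b", "s"]  # sorted, same values as in the PR's unit test

# "q" is absent but sorts between "b" and "s", so its insertion index is 2,
# which is also the position of the existing category "s".
print(old_lookup(categories, "q"), new_lookup(categories, "q"))

# "a" is present at position 0; both versions agree on it.
print(old_lookup(categories, "a"), new_lookup(categories, "a"))
```

With these inputs the old logic maps the absent value "q" to 2, the position of "s", while the new logic returns the sentinel -1. Both return 0 for "a".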

23 changes: 23 additions & 0 deletions pandas/tests/computation/test_pytables.py
@@ -0,0 +1,23 @@
from typing import Any

import numpy as np
import pytest

from pandas import Series
from pandas.core.computation.pytables import BinOp, TermValue


@pytest.mark.parametrize(
    "value, expected_results",
    [("q", TermValue(-1, -1, "integer")), ("a", TermValue(0, 0, "integer"))],
)
def test__convert_value(value: Any, expected_results: TermValue):
    metadata = Series(np.array(["a", "b", "s"]))

    result = BinOp._convert_category_value(metadata, value)

    assert (
        result.kind == expected_results.kind
        and result.value == expected_results.value
        and result.converted == expected_results.converted
    )
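
The "q" case in the test above matters because of how the returned integer is used afterwards. The sketch below assumes, as the "integer" kind of the returned TermValue suggests, that the `where` filter ends up comparing that integer against stored category codes. `select_rows` and the sample data are hypothetical, and plain Python lists replace the HDF5 table.

```python
categories = ["a", "b", "s"]
column = ["a", "b", "s", "a", "s"]

# Each row stores the position of its value in `categories` (its code).
codes = [categories.index(v) for v in column]


def select_rows(code):
    """Return the values of all rows whose stored code equals `code`."""
    return [v for v, c in zip(column, codes) if c == code]


wrong = select_rows(2)   # code the old logic produced for the absent value "q"
fixed = select_rows(-1)  # code the new logic produces for "q"
print(wrong, fixed)
```

No stored code is ever -1 in this sketch, so the sentinel selects nothing, whereas code 2 selects the "s" rows even though the query asked for "q". That matches the "unexpected records" described in the whatsnew entry.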