Skip to content

BUG: is_string_dtype(pd.Index([], dtype='O')) returns False #54997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Oct 4, 2023
Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -169,6 +169,7 @@ Bug fixes
~~~~~~~~~
- Bug in :class:`AbstractHolidayCalendar` where timezone data was not propagated when computing holiday observances (:issue:`54580`)
- Bug in :class:`pandas.core.window.Rolling` where duplicate datetimelike indexes are treated as consecutive rather than equal with ``closed='left'`` and ``closed='neither'`` (:issue:`20712`)
- Bug in :func:`is_all_strings` while checking object array with no elements is of the string dtype (:issue:`54661`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the bug in is_all_strings or is_string_dtype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, the bug is in is_all_strings, but we don’ t have this function in api and users see the bug while calling is_string_dtype. That is why I replaced Bug in :func:is_all_strings with: Bug in :func:is_string_dtype.

Looks like we don’t have a test for is_all_strings, do we need one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i meant the user-facing bug. in the docs we generally only reference public functions/methods

Copy link
Contributor Author

@natmokval natmokval Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, thank you. I removed from doc/source/whatsnew/v2.2.0.rst my line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel, I updated the PR. Could you please take a look at it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we still want a note, it just needs to refer to is_string_dtype, which I think is public

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, is_string_dtype is a public function, I added the note to v2.1.1.rst. I think my change is a minor and we can put it in the version 2.1.1


Categorical
^^^^^^^^^^^
Expand Down
9 changes: 6 additions & 3 deletions pandas/core/dtypes/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -1664,9 +1664,12 @@ def is_all_strings(value: ArrayLike) -> bool:
dtype = value.dtype

if isinstance(dtype, np.dtype):
return dtype == np.dtype("object") and lib.is_string_array(
np.asarray(value), skipna=False
)
if len(value) == 0:
return dtype == np.dtype("object")
else:
return dtype == np.dtype("object") and lib.is_string_array(
np.asarray(value), skipna=False
)
elif isinstance(dtype, CategoricalDtype):
return dtype.categories.inferred_type == "string"
return dtype == "string"
Expand Down
5 changes: 5 additions & 0 deletions pandas/tests/dtypes/test_dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -1212,3 +1212,8 @@ def test_multi_column_dtype_assignment():

df["b"] = 0
tm.assert_frame_equal(df, expected)


def test_empty_object_array_is_string_dtype():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm looks like we have tests for is_string_dtype in a couple of places. one of these days might be worth checking if there is a reason for that and if not, refactoring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment. I think, I found a better place for my test: pandas/tests/dtypes/test_common.py. I combined my test with another one and added a parameterization.

# GH #54661
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick can you remove the space after GH. i like the pattern GH#12345 bc it is something i grep for

Copy link
Contributor Author

@natmokval natmokval Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, done. I agree, it's better to have all GH refers look alike

assert is_string_dtype(pd.Index([], dtype="O"))