-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: allow describe() for DataFrames with only boolean columns #13898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thx for the PR! We appreciate including the tests from first to see the detail. |
@@ -5138,9 +5139,11 @@ def describe_1d(data): | |||
if self.ndim == 1: | |||
return describe_1d(self) | |||
elif (include is None) and (exclude is None): | |||
if len(self._get_numeric_data()._info_axis) > 0: | |||
if len(self._get_numeric_data(exclude_bool=True)._info_axis) > 0: | |||
# when some numerics are found, keep only numerics | |||
data = self.select_dtypes(include=[np.number]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any reason to change _get_numeric_data
rather than changing this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated PR with a more minimal approach.
Current coverage is 85.30% (diff: 100%)@@ master #13898 diff @@
==========================================
Files 139 139
Lines 50143 50143
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 42777 42776 -1
- Misses 7366 7367 +1
Partials 0 0
|
if len(self._get_numeric_data()._info_axis) > 0: | ||
ncols_numeric = len(self._get_numeric_data()._info_axis) | ||
ncols_bool = len(self._get_bool_data()._info_axis) | ||
if ncols_numeric > ncols_bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't look correct. we want bool
describe only when there is no number column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it is correct! BoolBlock
inherits from NumericBlock
, so ncols_numeric
is always greater than or equal to ncols_bool
. If they differ, we do have non-boolean numeric columns, which we select with np.number
. If they coincide and ncols_bool > 0
, the only numeric columns are boolean and we get them with np.bool
. The remaining case has no numeric columns — not even boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thus u don't have to change the line, when self.select_dtypes(include=[np.number])
is empty use self
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree use .select_dtype
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now I understand. Indeed, much cleaner.
@sinhrks Changes made. |
lgtm. ping on green. |
? |
when all of the checks have passed it will turn grin. ping then. |
thanks @agraboso |
git diff upstream/master | flake8 --diff
Closes #13891. Existing tests pass.
No new tests added yet.