BUG: allow describe() for DataFrames with only boolean columns #13898

Closed
wants to merge 1 commit into from

Conversation

agraboso
Contributor

@agraboso agraboso commented Aug 4, 2016

Closes #13891. Existing tests pass. No new tests added yet.

@sinhrks
Member

sinhrks commented Aug 4, 2016

Thanks for the PR! We'd appreciate it if tests were included from the start, so we can see the details.

@sinhrks sinhrks added the Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations), and Dtype Conversions (Unexpected or buggy dtype conversions) labels Aug 4, 2016
@@ -5138,9 +5139,11 @@ def describe_1d(data):
         if self.ndim == 1:
             return describe_1d(self)
         elif (include is None) and (exclude is None):
-            if len(self._get_numeric_data()._info_axis) > 0:
+            if len(self._get_numeric_data(exclude_bool=True)._info_axis) > 0:
                 # when some numerics are found, keep only numerics
                 data = self.select_dtypes(include=[np.number])
Member

Is there any reason to change _get_numeric_data rather than changing this line?

Contributor Author

Updated PR with a more minimal approach.

@codecov-io

codecov-io commented Aug 4, 2016

Current coverage is 85.30% (diff: 100%)

Merging #13898 into master will decrease coverage by <.01%

@@             master     #13898   diff @@
==========================================
  Files           139        139          
  Lines         50143      50143          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42777      42776     -1   
- Misses         7366       7367     +1   
  Partials          0          0          

Powered by Codecov. Last update 7e15923...26201aa

-            if len(self._get_numeric_data()._info_axis) > 0:
+            ncols_numeric = len(self._get_numeric_data()._info_axis)
+            ncols_bool = len(self._get_bool_data()._info_axis)
+            if ncols_numeric > ncols_bool:
Member

@sinhrks sinhrks Aug 4, 2016

This doesn't look correct. Don't we want to describe bool columns only when there are no numeric columns?

Contributor Author

@agraboso agraboso Aug 4, 2016

But it is correct! BoolBlock inherits from NumericBlock, so ncols_numeric is always greater than or equal to ncols_bool. If they differ, we do have non-boolean numeric columns, which we select with np.number. If they coincide and ncols_bool > 0, the only numeric columns are boolean and we get them with np.bool. The remaining case has no numeric columns — not even boolean.
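
The three cases in this comment can be sketched in plain Python. This is only an illustration: the helper name `pick_describe_selection` and its string return values are hypothetical and not part of pandas, and the two counts stand in for what `_get_numeric_data()` and `_get_bool_data()` would report in the diff above.

```python
def pick_describe_selection(ncols_numeric: int, ncols_bool: int) -> str:
    """Hypothetical helper mirroring the three cases in the comment.

    Assumes ncols_numeric already counts boolean columns, so
    ncols_numeric >= ncols_bool always holds.
    """
    if ncols_bool > ncols_numeric:
        raise ValueError("boolean columns are expected to be counted as numeric")
    if ncols_numeric > ncols_bool:
        # At least one non-boolean numeric column: select with np.number.
        return "number"
    if ncols_bool > 0:
        # Counts coincide and are non-zero: every numeric column is boolean.
        return "bool"
    # No numeric columns at all, not even boolean ones.
    return "none"


print(pick_describe_selection(3, 1))  # number
print(pick_describe_selection(2, 2))  # bool
print(pick_describe_selection(0, 0))  # none
```

The comparison works only because boolean columns are included in the numeric count; if they were counted separately, the first branch would need a different test.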

Member

So you don't have to change this line: when self.select_dtypes(include=[np.number]) is empty, use self.
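
A minimal sketch of this suggestion, using only the public `select_dtypes` API. The helper name `select_for_describe` and the sample frames are hypothetical, and this is not the code that was merged; it only illustrates the fallback being proposed.

```python
import numpy as np
import pandas as pd


def select_for_describe(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: numeric columns if any exist, else the whole frame."""
    numeric = df.select_dtypes(include=[np.number])
    # np.number does not match bool columns, so a boolean-only
    # frame yields zero columns here and we fall back to df itself.
    if len(numeric.columns) == 0:
        return df
    return numeric


bool_only = pd.DataFrame({"a": [True, False, True], "b": [False, False, True]})
mixed = pd.DataFrame({"x": [1.0, 2.0, 3.0], "flag": [True, False, True]})

print(list(select_for_describe(bool_only).columns))  # ['a', 'b']
print(list(select_for_describe(mixed).columns))      # ['x']
```

With this shape, no separate count of boolean columns is needed: the emptiness of the numeric selection is the only condition checked.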

Contributor

I agree, use .select_dtypes here.

Contributor Author

Ah, now I understand. Indeed, much cleaner.

@agraboso
Contributor Author

agraboso commented Aug 5, 2016

@sinhrks Changes made.

@jreback
Contributor

jreback commented Aug 6, 2016

lgtm. ping on green.

@agraboso
Contributor Author

agraboso commented Aug 6, 2016

ping on green.

?

@jreback
Contributor

jreback commented Aug 6, 2016

When all of the checks have passed, it will turn green. Ping then.

@agraboso
Contributor Author

agraboso commented Aug 8, 2016

@jreback

@jreback jreback added this to the 0.19.0 milestone Aug 8, 2016
@jreback
Contributor

jreback commented Aug 8, 2016

thanks @agraboso

@jreback jreback closed this in 72be37b Aug 8, 2016
@agraboso agraboso deleted the fix-13891 branch August 9, 2016 19:37
Labels
Bug; Dtype Conversions (Unexpected or buggy dtype conversions); Numeric Operations (Arithmetic, Comparison, and Logical operations)
Development

Successfully merging this pull request may close these issues.

BUG: allow describe() for on boolean-only columns
4 participants