BUG: allow describe() for on boolean-only columns #13891

Closed
lia-simeone opened this issue Aug 3, 2016 · 2 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@lia-simeone

I know I can obtain the expected output by using include=['bool'], but that feels bad to me as a user. I want describe() to know that I'm only asking for boolean columns and not freak out.

Thank you!

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> test_df = pd.DataFrame({'test_ind_1': [False, False, True, True, False, True, True, False],
...                         'test_ind_2': [False, True, True, False, False, True, True, True]})
>>> test_df.describe()

Expected Output

       test_ind_1 test_ind_2
count           8          8
unique          2          2
top          True       True
freq            4          5

Actual output

ValueError: No objects to concatenate
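
For anyone hitting the same error, the workaround mentioned at the top of this report can be sketched as follows. This is only a minimal illustration using the same frame as the code sample; the variable name `summary` is made up for the example.

```python
import pandas as pd

test_df = pd.DataFrame({
    "test_ind_1": [False, False, True, True, False, True, True, False],
    "test_ind_2": [False, True, True, False, False, True, True, True],
})

# Explicitly ask describe() to summarise the boolean columns instead of
# relying on its default dtype selection.
summary = test_df.describe(include=["bool"])
print(summary)
```

With `include=['bool']` the boolean columns are summarised with `count`, `unique`, `top` and `freq` rows, which is the shape shown under "Expected Output".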

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.18.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.22.1
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.5.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None
@sinhrks sinhrks added Bug Numeric Operations Arithmetic, Comparison, and Logical operations Dtype Conversions Unexpected or buggy dtype conversions labels Aug 3, 2016
@jreback jreback changed the title describe() breaks on boolean-only columns BUG: allow describe() for on boolean-only columns Aug 3, 2016
@jreback jreback added this to the Next Major Release milestone Aug 3, 2016
@jreback
Contributor

jreback commented Aug 3, 2016

Must be something in the exclusion logic, as an all-object-dtype frame works. PRs are welcome.

@agraboso
Contributor

agraboso commented Aug 3, 2016

The problem is in generic.py#L5141-L5143:

if len(self._get_numeric_data()._info_axis) > 0:
    # when some numerics are found, keep only numerics
    data = self.select_dtypes(include=[np.number])

_get_numeric_data() keeps boolean columns (BoolBlock inherits from NumericBlock), but select_dtypes(include=[np.number]) does not.

>>> test_df._get_numeric_data()
  test_ind_1 test_ind_2
0      False      False
1      False       True
2       True       True
3       True      False
4      False      False
5       True       True
6       True       True
7      False       True
>>> test_df.select_dtypes(include=[np.number])
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7]
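
The same mismatch can be reproduced with public API only. The sketch below is not the fix that was merged, just an illustration: it assumes nothing beyond `select_dtypes` and `np.issubdtype`, and the names `numeric_only` and `numeric_or_bool` are hypothetical.

```python
import numpy as np
import pandas as pd

test_df = pd.DataFrame({
    "test_ind_1": [False, False, True, True, False, True, True, False],
    "test_ind_2": [False, True, True, False, False, True, True, True],
})

# NumPy does not treat bool as a subtype of np.number, so a selection on
# np.number alone drops every column of a boolean-only frame.
print(np.issubdtype(np.bool_, np.number))
numeric_only = test_df.select_dtypes(include=[np.number])
print(numeric_only.shape)

# Listing "bool" next to np.number keeps the boolean columns.
numeric_or_bool = test_df.select_dtypes(include=[np.number, "bool"])
print(list(numeric_or_bool.columns))
```

So any check that counts boolean columns as numeric and then filters with `include=[np.number]` ends up with an empty frame, which matches the "No objects to concatenate" error above.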

I'm working on a fix, but I'm worried I may be getting too deep into the internals of pandas...

@jreback jreback modified the milestones: 0.19.0, Next Major Release Aug 8, 2016
@jreback jreback closed this as completed in 72be37b Aug 8, 2016