
Calling qcut with too many duplicates now gives an informative error #9030

Closed
wants to merge 3 commits into from
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.15.2.txt
@@ -164,7 +164,7 @@ Bug Fixes
- Bug in ``merge`` where ``how='left'`` and ``sort=False`` would not preserve left frame order (:issue:`7331`)
- Fix: The font size was only set on x axis if vertical or the y axis if horizontal. (:issue:`8765`)
- Fixed division by 0 when reading big csv files in python 3 (:issue:`8621`)

- Fixed an unclear error message in ``qcut`` when repeated values result in duplicate bin edges.
Contributor


pls move this to 0.16.0




2 changes: 1 addition & 1 deletion pandas/tools/tests/test_tile.py
@@ -153,7 +153,7 @@ def test_qcut_specify_quantiles(self):
self.assertTrue(factor.equals(expected))

def test_qcut_all_bins_same(self):
- assertRaisesRegexp(ValueError, "edges.*unique", qcut, [0,0,0,0,0,0,0,0,0,0], 3)
+ assertRaisesRegexp(ValueError, "quantiles.*repeated", qcut, [0,0,0,0,0,0,0,0,0,0], 3)

def test_cut_out_of_bounds(self):
arr = np.random.randn(100)
4 changes: 4 additions & 0 deletions pandas/tools/tile.py
@@ -165,6 +165,10 @@ def qcut(x, q, labels=None, retbins=False, precision=3):
else:
quantiles = q
bins = algos.quantile(x, quantiles)
if len(algos.unique(bins)) < len(bins):
bins_sorted = np.sort(bins, axis=None)
Contributor


you can do this as a set operation (set(bins) - set(unique(bins))) once you know that you have too many bins

Author


That doesn't work. Say bins = [0,0,0,1,1,1]. We want [0,1], but set(bins) - set(unique(bins)) gives set([]).

Something like it should work, but I can't think of it at the moment.

bins_dup = algos.unique(bins_sorted[bins_sorted[1:] == bins_sorted[:-1]])
raise ValueError('One or more quantiles consists entirely of a repeated value: %s' % repr(bins_dup))
return _bins_to_cuts(x, bins, labels=labels, retbins=retbins,precision=precision,
include_lowest=True)
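
The review thread above asks how to list the bin edges that occur more than once. The following is a minimal standalone sketch of two ways to do that with plain NumPy. It is not the code in this PR: the helper names `duplicated_edges` and `duplicated_edges_adjacent` are hypothetical, and `np.unique` stands in for `algos.unique`. The second helper follows the same sort-and-compare-neighbours idea as the diff, but applies the mask to `bins_sorted[1:]` so that the mask and the indexed array have the same length, which recent NumPy versions require.

```python
import numpy as np


def duplicated_edges(bins):
    """Return the sorted distinct values that occur more than once in `bins`."""
    # return_counts gives, for each distinct value, how often it appears.
    values, counts = np.unique(np.asarray(bins).ravel(), return_counts=True)
    return values[counts > 1]


def duplicated_edges_adjacent(bins):
    """Same result via sorting and comparing each element with its predecessor."""
    bins_sorted = np.sort(np.asarray(bins), axis=None)
    tail = bins_sorted[1:]
    # The mask has len(bins) - 1 entries, matching `tail`, not `bins_sorted`.
    repeated = tail == bins_sorted[:-1]
    return np.unique(tail[repeated])


bins = [0, 0, 0, 1, 1, 1]

# The set difference from the review comment is empty: both sides hold the
# same distinct values, so nothing is left after subtraction.
assert set(bins) - set(np.unique(bins).tolist()) == set()

# Both helpers report the repeated edges instead.
assert duplicated_edges(bins).tolist() == [0, 1]
assert duplicated_edges_adjacent(bins).tolist() == [0, 1]
```

The counting version avoids the explicit sort and the off-by-one slicing, at the cost of relying on the `return_counts` option of `np.unique`.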
