BUG: IntervalIndex.astype("category") doesn't preserve exact interval dtype in categories #38316

Closed
jorisvandenbossche opened this issue Dec 5, 2020 · 2 comments · Fixed by #48226
Labels
Astype, Bug, Categorical (Categorical Data Type), Dtype Conversions (Unexpected or buggy dtype conversions), Interval (Interval data type)
Milestone

Comments

@jorisvandenbossche
Member

Somewhere in the conversion, before factorizing, we convert the interval array/index to an object-dtype NumPy array of Interval objects, so the IntervalDtype has to be inferred again when the categories are created.

One consequence is that uint64 intervals are inferred as int64 afterwards:

In [29]: index = pd.IntervalIndex.from_breaks(np.arange(5, dtype="uint64"))

In [30]: index
Out[30]: 
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4]],
              closed='right',
              dtype='interval[uint64]')  # <---- unsigned ints

In [31]: pd.CategoricalIndex(index)
Out[31]: CategoricalIndex([(0, 1], (1, 2], (2, 3], (3, 4]], categories=[(0, 1], (1, 2], (2, 3], (3, 4]], ordered=False, dtype='category')

In [33]: pd.CategoricalIndex(index).dtype.categories
Out[33]: 
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4]],
              closed='right',
              dtype='interval[int64]')  # <---- no longer uint64
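
The object-dtype round trip described above can be imitated outside of the categorical code path. This is only a rough sketch of the suspected mechanism, not the actual pandas internals: `roundtrip_via_object` is a hypothetical helper name, and whether the rebuilt dtype matches the original depends on the pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd


def roundtrip_via_object(index: pd.IntervalIndex) -> pd.IntervalIndex:
    """Hypothetical helper: rebuild an IntervalIndex from its object-dtype form."""
    # Materialize the index as an object-dtype NumPy array of pd.Interval
    # scalars; the array itself no longer carries the interval subtype.
    as_objects = index.to_numpy(dtype=object)
    # Re-creating the index forces pandas to infer the subtype from the scalars.
    return pd.IntervalIndex(as_objects)


index = pd.IntervalIndex.from_breaks(np.arange(5, dtype="uint64"))
rebuilt = roundtrip_via_object(index)

print(index.dtype)    # the original uint64-based interval dtype
print(rebuilt.dtype)  # may or may not match, depending on the pandas version
```

If the two printed dtypes differ, the subtype was lost during inference, which is the same symptom as in the report.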
@jorisvandenbossche jorisvandenbossche added Bug Categorical Categorical Data Type Interval Interval data type labels Dec 5, 2020
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Dec 5, 2020
@Alvinwuzw

Hi! I can reproduce the issue. Could I look into it?

@mroeschke mroeschke added the Dtype Conversions Unexpected or buggy dtype conversions label Aug 14, 2021
@Kyrpel

Kyrpel commented Apr 5, 2022

I have confirmed on the latest main branch that this is no longer an issue: both outputs from the example above are now the same:
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4]], dtype='interval[uint64, right]')
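
A check along these lines can be scripted so it is easy to rerun on other pandas versions. This is a minimal sketch that only assumes the public `pd.CategoricalIndex` constructor and `astype("category")`; `categories_keep_interval_dtype` is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd


def categories_keep_interval_dtype(index: pd.IntervalIndex) -> bool:
    """Hypothetical check: do the categories keep the exact interval dtype?"""
    # Path from the original example: wrap the index in a CategoricalIndex.
    via_constructor = pd.CategoricalIndex(index).categories.dtype
    # Path from the issue title: cast with astype("category").
    via_astype = index.astype("category").categories.dtype
    return bool(via_constructor == index.dtype and via_astype == index.dtype)


index = pd.IntervalIndex.from_breaks(np.arange(5, dtype="uint64"))
# True means the uint64 subtype survived; the result depends on the pandas version.
print(categories_keep_interval_dtype(index))
```

An int64-based index should pass on any version, since int64 is what inference falls back to in the report, so it can serve as a sanity check for the helper itself.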

@phofl phofl modified the milestones: Contributions Welcome, 1.5, 1.6 Aug 29, 2022
@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
6 participants