str.get_dummies uses astype(str) #6634

Closed
hayd opened this issue Mar 14, 2014 · 11 comments
Labels: API Design, Bug, Strings (String extension data type and string data)


@hayd
Contributor

hayd commented Mar 14, 2014

I wrote this, so it's my fault. This probably means some things will break when passing data containing unicode.

Will add example.

The reason for the astype(str) is to treat integers as strings before getting dummies; maybe we should just drop that functionality?

@hayd hayd added this to the 0.14.0 milestone Mar 14, 2014
@hayd hayd added Bug labels Mar 14, 2014
@hayd hayd self-assigned this Mar 14, 2014
@jreback
Contributor

jreback commented Mar 14, 2014

Maybe only operate on object dtypes, and raise an error if passed nothing to work on?

@jreback
Contributor

jreback commented Apr 21, 2014

closing in favor of #6885 (good examples there)

@jreback jreback closed this as completed Apr 21, 2014
@hayd hayd changed the title from "get_dummies uses astype(str)" to "str.get_dummies uses astype(str)" Apr 21, 2014
@hayd
Contributor Author

hayd commented Apr 21, 2014

IIRC this one was about str.get_dummies rather than pd.get_dummies. So slightly different but related/can use same test case.

@hayd hayd reopened this Apr 21, 2014
@jreback
Contributor

jreback commented Apr 21, 2014

oh...ok....suspect same issue though

@jreback
Contributor

jreback commented Apr 30, 2014

I merged the fix for #6885; is this still open?

@hayd
Contributor Author

hayd commented Apr 30, 2014

This is also a "decision" type issue, atm we apply .astype(str) to the Series before doing get dummies.

This means you can do stuff like:

In [11]: s = pd.Series(['a', 1, '1', np.nan])

In [12]: s.str.get_dummies()
Out[12]:
   1  a
0  0  1
1  1  0
2  1  0
3  0  0
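
To make the effect of the coercion concrete, here is a dependency-free sketch of "coerce to str, then one-hot encode". It is not the pandas implementation: the function name `dummies_after_str_coercion` and the missing-value handling are assumptions chosen to mirror the output above, where the integer 1 and the string '1' land in the same column and NaN gives an all-zero row.

```python
import math


def dummies_after_str_coercion(values):
    """One-hot encode values after coercing each non-missing one with str().

    Hypothetical helper for illustration only; not part of pandas.
    """
    def is_missing(v):
        # Assumption: None and float NaN count as missing.
        return v is None or (isinstance(v, float) and math.isnan(v))

    # str() makes the integer 1 and the string '1' indistinguishable.
    labels = [None if is_missing(v) else str(v) for v in values]
    columns = sorted({lab for lab in labels if lab is not None})
    # A missing label matches no column, so its row is all zeros.
    rows = [[int(lab == col) for col in columns] for lab in labels]
    return columns, rows


columns, rows = dummies_after_str_coercion(["a", 1, "1", float("nan")])
print(columns)
for row in rows:
    print(row)
```

Dropping the coercion would mean the integer 1 either gets its own column or is rejected, which is the "decision" being discussed here.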

@jreback
Contributor

jreback commented Apr 30, 2014

ahh...makes sense, that is fine, does this blow up on unicode?

@hayd
Contributor Author

hayd commented Apr 30, 2014

That's the claim; I suspect on py3 only?

@hayd
Contributor Author

hayd commented Apr 30, 2014

So we could just kill this functionality, I had thought this was what all the string methods did... :s (it isn't)

@jreback
Contributor

jreback commented Apr 30, 2014

well you can do:

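# data is the array being converted
try:
    data = data.astype('S')  # byte strings; fails on non-ASCII text
except UnicodeEncodeError:
    data = data.astype('U')  # fall back to unicode strings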
(or just do the 'U' on py3)
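
The same bytes-first, unicode-fallback idea can be sketched in plain Python, without NumPy. This is only an analogy to `astype('S')` / `astype('U')`, and the helper name `to_bytes_or_text` is hypothetical: it tries an ASCII encode and keeps the values as text if any of them cannot be encoded.

```python
def to_bytes_or_text(values):
    """Try an ASCII byte-string conversion first; fall back to text.

    Hypothetical helper for illustration; it only mimics the try/except
    pattern above and is not how pandas or NumPy convert dtypes.
    """
    texts = [str(v) for v in values]
    try:
        # Analogous to astype('S'): raises on any non-ASCII character.
        return [t.encode("ascii") for t in texts]
    except UnicodeEncodeError:
        # Analogous to astype('U'): keep everything as unicode text.
        return texts


ascii_only = to_bytes_or_text(["a", 1])
with_unicode = to_bytes_or_text(["a", "\u00e9"])
print(ascii_only)
print(with_unicode)
```

On py3 the fallback branch alone is usually enough, since `str` is already unicode there.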

@hayd
Contributor Author

hayd commented Apr 30, 2014

I can't repro this on py3, though I'm sure this came up from an actual bug.

@hayd hayd closed this as completed Apr 30, 2014