DataFrame.to_dict not using python types #19381

Closed
affanshahid opened this issue Jan 24, 2018 · 4 comments
Labels: Duplicate (duplicate issue or pull request)

Comments

@affanshahid

I have a DataFrame that I need to convert to a dict and add to another dict, which is serialized to JSON at the end. Unfortunately, since .to_dict() does not always return standard Python types, I get the following error:

TypeError: Object of type 'int32' is not JSON serializable

My code is something like the following:

output = {
    'foo': 1,
    'bar': 2
}
...
df = ...
output['baz'] = df.to_dict()

send(json.dumps(output))

Calling json.dumps results in the above error. This is related to #13258, where an 'easy way' was mentioned, but I'm not sure what it is.

I would appreciate any help.

@chris-b1
Contributor

Please provide a reproducible example. In at least some cases we do return Python types (I agree we should in all cases):

In [210]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

In [211]: df.to_dict()
Out[211]: {'a': {0: 1, 1: 2, 2: 3}, 'b': {0: 4, 1: 5, 2: 6}}

In [213]: type(df.to_dict()['a'][0])
Out[213]: int

@affanshahid
Author

affanshahid commented Jan 26, 2018

I am loading data from a database. I then call .describe() on a column (a Series) of my DataFrame, call .to_dict() on the result, and attempt the JSON conversion. Below is a simplified example that raises the following error:
TypeError: Object of type 'int32' is not JSON serializable

import pandas as pd
import json
import datetime

data = [
    datetime.date(1987, 2, 12),
    datetime.date(1987, 2, 12),
    datetime.date(1987, 2, 12),
    None,
    None,
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15)
]

df = pd.DataFrame(columns=['foo'])
df['foo'] = data
ds = df['foo'].describe()
d = ds.to_dict()
j = json.dumps(d)
print(j)
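
To narrow down which values trigger the error, it can help to list the type of each value in the dict before serializing. The following is only a sketch built on the example above. The name `value_types` is introduced here for illustration, and the reported types may differ between pandas versions and platforms, so no expected output is shown.

```python
import datetime

import pandas as pd

# Same shape of data as the example above: dates with a few missing values.
data = (
    [datetime.date(1987, 2, 12)] * 3
    + [None] * 2
    + [datetime.date(1989, 6, 15)] * 10
)

# describe() on an object-dtype Series returns count, unique, top and freq.
d = pd.Series(data, name="foo").describe().to_dict()

# Map each key to the type name of its value, to spot NumPy scalars
# (for example int32 or int64) or other types that json cannot encode.
value_types = {key: type(value).__name__ for key, value in d.items()}
print(value_types)
```

Any entry that is not a built-in type such as `int`, `float` or `str` is a candidate for the `TypeError`.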

@jreback
Contributor

jreback commented Jan 26, 2018

Please show your versions. This is fixed in 0.21, IIRC.
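
For reference, one way to collect the version information requested above is shown below. It is a small sketch and assumes nothing beyond a working pandas installation.

```python
import pandas as pd

# Short form: just the pandas version string.
print(pd.__version__)

# Long form: pandas, Python, OS and dependency versions, as usually
# requested in pandas bug reports.
pd.show_versions()
```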

@chris-b1
Contributor

Thanks for the example. This is actually a symptom / dupe of #15385: some of the aggregations used in describe return NumPy scalars instead of Python ones, and these get passed along.
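
Until the underlying behavior is fixed, one possible workaround (not necessarily the 'easy way' referred to in #13258) is to pass a `default` function to `json.dumps` that converts NumPy scalars to built-in Python types. The sketch below makes some assumptions: `to_builtin` is a hypothetical helper name, and the `summary` dict is a hand-built stand-in that mimics what `describe().to_dict()` might return in the example above. Dates are handled as well, since the `top` entry would otherwise fail next.

```python
import datetime
import json

import numpy as np


def to_builtin(obj):
    """Fallback for json.dumps: called only for objects json cannot encode."""
    if isinstance(obj, np.generic):
        # np.generic is the base class of all NumPy scalars;
        # .item() returns the equivalent built-in Python value.
        return obj.item()
    if isinstance(obj, datetime.date):
        # Also covers datetime.datetime, which subclasses datetime.date.
        return obj.isoformat()
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


# Hand-built stand-in for the describe() result (assumed values).
summary = {
    "count": np.int32(13),
    "unique": np.int32(2),
    "top": datetime.date(1989, 6, 15),
    "freq": np.int32(10),
}

encoded = json.dumps(summary, default=to_builtin)
print(encoded)
# {"count": 13, "unique": 2, "top": "1989-06-15", "freq": 10}
```

The `default` hook only runs for values the encoder cannot handle itself, so dicts that already contain plain Python types are unaffected.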
