make DataFrame.to_dict(orient='list') output native python elements #9108

adamgreenhall · 2014-12-19T00:24:30Z

No description provided.

shoyer · 2014-12-19T06:38:57Z

@adamgreenhall Could you please explain the motivation behind this change in a little more detail?

This will also need tests.

adamgreenhall · 2014-12-19T17:21:43Z

This change is about enabling the export of a DataFrame into part of a JSON document (I know to_json exists, but I want to add other things to the document as well). Currently, the lists of data created by DataFrame.to_dict(orient='list') are made up of numpy elements. This causes json.dumps to raise an error if the DataFrame has a bool column, because its elements are numpy.bool_ values (np.float64 and np.int64 work) -- see the example of the issue below.

This change fixes the immediate symptom by converting all list element values to native Python types. Perhaps the underlying issue is really that numpy.bool_ values are not JSON compatible - but I wasn't sure how to address that.

In [1]: import pandas as pd

In [2]: import json

In [3]: df = pd.DataFrame({'a': [1.1, 1.2, 1.3], 'b': [2, 3, 4], 'c': [True, False, True]})

In [4]: print df.dtypes
a    float64
b      int64
c       bool
dtype: object

In [5]: blob = dict(data=df.to_dict(orient='list'), description='this is some data')

In [6]: print type(blob['data']['c'][0])
<type 'numpy.bool_'>

In [7]: print json.dumps(blob)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-3bd1feb8b602> in <module>()
----> 1 print json.dumps(blob)

./python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
    241         cls is None and indent is None and separators is None and
    242         encoding == 'utf-8' and default is None and not sort_keys and not kw):
--> 243         return _default_encoder.encode(obj)
    244     if cls is None:
    245         cls = JSONEncoder

./python2.7/json/encoder.pyc in encode(self, o)
    205         # exceptions aren't as detailed.  The list call should be roughly
    206         # equivalent to the PySequence_Fast that ''.join() would do.
--> 207         chunks = self.iterencode(o, _one_shot=True)
    208         if not isinstance(chunks, (list, tuple)):
    209             chunks = list(chunks)

./python2.7/json/encoder.pyc in iterencode(self, o, _one_shot)
    268                 self.key_separator, self.item_separator, self.sort_keys,
    269                 self.skipkeys, _one_shot)
--> 270         return _iterencode(o, 0)
    271
    272 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

./python2.7/json/encoder.pyc in default(self, o)
    182
    183         """
--> 184         raise TypeError(repr(o) + " is not JSON serializable")
    185
    186     def encode(self, o):

TypeError: True is not JSON serializable
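
For reference, one way to work around the error above without changing to_dict is to give json.dumps a `default` fallback. The following is only a minimal sketch, not what this PR implements: the helper name `to_native` is hypothetical, and it relies on the assumption that the offending objects expose a `tolist()` method, as NumPy scalars and arrays do.

```python
import json


def to_native(o):
    """Fallback for json.dumps, called only for objects it cannot serialize.

    Hypothetical helper: duck-types on ``tolist()``, which NumPy scalars
    and arrays provide, and which returns built-in Python values.
    """
    tolist = getattr(o, "tolist", None)
    if callable(tolist):
        return tolist()
    # Mirror the standard encoder's behaviour for anything else.
    raise TypeError(repr(o) + " is not JSON serializable")


# Usage with the blob from the example above (requires numpy/pandas):
#   json.dumps(blob, default=to_native)
```

Because `default` is only consulted for values the encoder rejects, floats and ints that already serialize are left untouched, and the orient='list' layout of the dict is preserved.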

cpcloud · 2014-12-19T17:30:45Z

if you don't need to do anything special with your frames you can use pandas.io.json.dumps:

In [8]: from pandas.io.json import dumps

In [9]: dumps(['a', 1, ['a', 2, [3]], {'frame': pd.DataFrame(np.random.rand(2, 2))}])
Out[9]: '["a",1,["a",2,[3]],{"frame":{"0":{"0":0.369918913,"1":0.4624219221},"1":{"0":0.9272272068,"1":0.7450566582}}}]'

cpcloud · 2014-12-19T17:31:58Z

to_dict will be much less efficient than to_json, as to_json loops in C whereas to_dict loops in Python.

cpcloud · 2014-12-19T17:32:53Z

dumps could be exposed at the toplevel API, though I haven't thought about what additional work that might require

adamgreenhall · 2014-12-19T17:53:44Z

I like the idea of using pandas.io.json.dumps, but would also like to keep the orient='list' styling. Sounds like that would require altering it.

cpcloud · 2014-12-19T18:08:49Z

how comfortable are you with C?

cpcloud · 2014-12-19T18:11:15Z

you can also do this with dumps right now:

In [16]: import pandas.util.testing as tm

In [17]: from pandas.io.json import dumps

In [18]: df = tm.makeTimeDataFrame().reset_index().rename(columns={'index': 'date'}).head(5)

In [19]: df
Out[19]:
        date         A         B         C         D
0 2000-01-03  0.229303 -1.394965  0.156741  1.233180
1 2000-01-04 -0.611819  0.616925 -0.063782  0.455711
2 2000-01-05  2.387436  0.552139 -0.000982 -0.478749
3 2000-01-06 -0.694529  0.472475  0.924082 -1.544734
4 2000-01-07 -0.794539 -0.597034 -1.734419 -0.104073

In [21]: dumps({k: v for k, v in df.iteritems()},orient='values')
Out[21]: '{"date":[946857600000,946944000000,947030400000,947116800000,947203200000],"A":[0.2293025925,-0.611819198,2.387435883,-0.6945293878,-0.7945391792],"C":[0.1567408883,-0.0637816997,-0.0009824659,0.9240820459,-1.7344187482],"B":[-1.3949645486,0.6169250907,0.5521388533,0.4724746145,-0.5970341471],"D":[1.2331802175,0.4557113376,-0.4787493278,-1.5447336653,-0.1040725235]}'

adamgreenhall · 2014-12-19T18:27:01Z

@cpcloud - that works for me - thanks!

make DataFrame.to_dict(orient='list') output native python elements

15969c1

adamgreenhall closed this Dec 19, 2014

TomAugspurger mentioned this pull request Jul 28, 2016

Inconsistent types in output of series.to_dict() and DataFrame([series]).loc[0].to_dict() #13830

Closed
