ValueError when reading JSON lines file #30716

danijar · 2020-01-05T22:21:20Z

Overview

Using pandas==0.25.1 with Python 3.7.1 on Debian, loading the following JSON lines file with pandas.read_json() fails, although reading it manually works.

After looking into this a bit, I think it might be related to the NaN values in the JSON file, which are not allowed by the JSON spec but are accepted by json.loads(). If that turns out to be the case, it would be good to have an option to ignore those entries, or at least a more detailed error message.
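To check the NaN hypothesis, here is a minimal stdlib sketch that reports which lines of a JSON lines file contain the non-standard constants `NaN`, `Infinity` or `-Infinity`. It relies on the `parse_constant` hook of `json.loads`, which Python calls only for those three tokens. The helper name `find_nonstandard_constants` and the sample records are hypothetical and not part of pandas.

```python
import json


def find_nonstandard_constants(lines):
    """Return (line_number, constant) pairs for NaN/Infinity tokens.

    Hypothetical helper: json.loads calls parse_constant only for
    'NaN', 'Infinity' and '-Infinity', none of which are valid JSON.
    """
    found = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, as JSON lines readers usually do
        hits = []
        # Record each constant the parser encounters; the hook returns None,
        # so parsing continues past it with None as the value.
        json.loads(line, parse_constant=lambda name: hits.append(name))
        found.extend((number, name) for name in hits)
    return found


sample = [
    '{"step": 1000, "fps": NaN}',
    '{"step": 2000, "fps": 19.5}',
    '',
    '{"step": 3000, "value_loss": -Infinity}',
]
print(find_nonstandard_constants(sample))
# [(1, 'NaN'), (4, '-Infinity')]
```

Running this over the gist's lines would show whether the failing records are exactly the ones carrying NaN.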

Data file: https://gist.github.com/danijar/37ba75a6991d61de9e77755329bb5ef4

Manual

Reading the file manually with json.loads() and passing the records to pd.DataFrame works fine:

import json
import pandas as pd

with open(filename) as f:
    # Parse each non-empty line as a separate JSON record
    records = [json.loads(line) for line in f if line.strip()]
df = pd.DataFrame(records)
print(df)  # Shows the data frame as expected
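If the NaN tokens are indeed the problem, one possible workaround on pandas 0.25.1 is to rewrite each line as strict JSON before handing it to a spec-compliant reader, turning `NaN` and infinities into `null`. This is a sketch under that assumption; `to_strict_json_line` and `_replace_non_finite` are hypothetical helper names. Passing `allow_nan=False` makes `json.dumps` raise instead of silently emitting non-standard tokens.

```python
import json
import math


def _replace_non_finite(value):
    """Recursively replace NaN/inf floats with None (JSON null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: _replace_non_finite(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_replace_non_finite(item) for item in value]
    return value


def to_strict_json_line(line):
    """Re-serialize one JSON lines record without NaN/Infinity tokens."""
    record = json.loads(line)  # json.loads accepts NaN and Infinity by default
    # allow_nan=False guarantees the output contains only spec-compliant JSON
    return json.dumps(_replace_non_finite(record), allow_nan=False)


print(to_strict_json_line('{"step": 799, "fps": NaN, "loss": [Infinity, 0.5]}'))
# {"step": 799, "fps": null, "loss": [null, 0.5]}
```

Writing the converted lines to a new file should then let pd.read_json(..., lines=True) load it without depending on NaN support, since null is valid JSON and ends up as a missing value.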

Terminal output

       step  train/return  train/length  episodes  ...  value_loss  action_loss  action_ent        fps
0      1000           1.0         500.0       1.0  ...         NaN          NaN         NaN        NaN
1      2000           0.0         500.0       2.0  ...         NaN          NaN         NaN        NaN
2      3000         163.0         500.0       3.0  ...         NaN          NaN         NaN        NaN
3      4000           0.0         500.0       4.0  ...         NaN          NaN         NaN        NaN
4      5000           0.0         500.0       5.0  ...         NaN          NaN         NaN        NaN
..      ...           ...           ...       ...  ...         ...          ...         ...        ...
798  383000           0.0         500.0     383.0  ...         NaN          NaN         NaN        NaN
799  383000           NaN           NaN       NaN  ...         NaN          NaN         NaN  19.500059
800  384000           0.0         500.0     384.0  ...         NaN          NaN         NaN        NaN
801  384000           NaN           NaN       NaN  ...         NaN          NaN         NaN  19.608651
802  385000        1000.0         500.0     385.0  ...         NaN          NaN         NaN        NaN

[803 rows x 19 columns]

Pandas

But reading the same file with pandas.read_json() fails with an internal pandas error:

import pandas as pd
df = pd.read_json(filename, lines=True)  # ValueError: Expected object or value

Terminal output

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
    590         return json_reader
    591
--> 592     result = json_reader.read()
    593     if should_close:
    594         try:

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in read(self)
    713         elif self.lines:
    714             data = ensure_str(self.data)
--> 715             obj = self._get_object_parser(self._combine_lines(data.split("\n")))
    716         else:
    717             obj = self._get_object_parser(self.data)

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
    737         obj = None
    738         if typ == "frame":
--> 739             obj = FrameParser(json, **kwargs).parse()
    740
    741         if typ == "series" or obj is None:

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in parse(self)
    847
    848         else:
--> 849             self._parse_no_numpy()
    850
    851         if self.obj is None:

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1091         if orient == "columns":
   1092             self.obj = DataFrame(
-> 1093                 loads(json, precise_float=self.precise_float), dtype=None
   1094             )
   1095         elif orient == "split":

ValueError: Expected object or value
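The traceback shows that with lines=True the reader splits the text on "\n", joins the lines into a single JSON array via _combine_lines, and parses that array in one loads call, so one unparseable token anywhere makes the whole file fail with a generic message. The stdlib sketch below imitates that flow, assuming _combine_lines strips each line, drops empty ones and wraps the rest in `[...]`. It uses `json.loads` with a strict `parse_constant` as a stand-in for the parser that rejected NaN; it is not the pandas parser itself, and `combine_lines` and `reject_constant` are hypothetical names.

```python
import json


def combine_lines(lines):
    """Join JSON lines records into one JSON array string.

    Assumed to mirror the traceback's _combine_lines step.
    """
    stripped = (line.strip() for line in lines)
    return "[" + ",".join(line for line in stripped if line) + "]"


def reject_constant(name):
    """Strict stand-in: refuse the non-standard NaN/Infinity tokens."""
    raise ValueError(f"Non-standard JSON constant: {name}")


text = '{"step": 1000, "fps": 19.5}\n{"step": 2000, "fps": NaN}\n'
combined = combine_lines(text.split("\n"))
print(combined)
# [{"step": 1000, "fps": 19.5},{"step": 2000, "fps": NaN}]

try:
    # The whole file is parsed as one document, so a single NaN fails it all.
    json.loads(combined, parse_constant=reject_constant)
except ValueError as error:
    print(error)
# Non-standard JSON constant: NaN
```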


simongibbons · 2020-01-06T09:13:11Z

The latest master parses and loads the given file without errors, since a fix that allows parsing NaN values from JSON was recently merged (cc #30295).

danijar · 2020-01-06T17:58:04Z

Thanks!

danijar closed this as completed Jan 6, 2020
