
ValueError when reading JSON lines file #30716


Closed
danijar opened this issue Jan 5, 2020 · 2 comments

Comments


danijar commented Jan 5, 2020

Overview

Using pandas==0.25.1 with Python 3.7.1 on Debian, loading the JSON lines file linked below with pandas.read_json() fails, although reading the same file manually succeeds.

After looking into this a bit, I think it might be related to the NaN values in the file, which the JSON spec does not allow but json.loads() accepts. If that turns out to be the case, it would be good to have an option to ignore those entries, or at least a more detailed error message.
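The leniency of the standard library is easy to check without pandas. The sketch below shows that json.loads() accepts the bare token NaN and returns a float nan, and how its parse_constant hook can be used to make parsing strict and report which line is at fault, roughly the "detailed error message" suggested above. The helper name find_nonstandard_lines and the sample records are hypothetical, for illustration only.

```python
import json
import math


def _reject_constant(name):
    # json.loads calls this hook only for the non-standard tokens
    # "NaN", "Infinity" and "-Infinity"; raising here makes parsing strict.
    raise ValueError(f"non-standard JSON constant {name!r}")


def find_nonstandard_lines(lines):
    """Hypothetical helper: return (line_number, message) pairs for every
    non-blank line that fails strict parsing, e.g. because it contains NaN."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        try:
            json.loads(line, parse_constant=_reject_constant)
        except ValueError as exc:  # also catches json.JSONDecodeError
            problems.append((lineno, str(exc)))
    return problems


# By default, json.loads accepts the bare token NaN and returns a float nan.
record = json.loads('{"step": 1000, "value_loss": NaN}')
print(math.isnan(record["value_loss"]))  # True

sample = [
    '{"step": 1000, "train/return": 1.0}',
    '{"step": 1000, "value_loss": NaN}',
]
print(find_nonstandard_lines(sample))  # [(2, "non-standard JSON constant 'NaN'")]
```

Running find_nonstandard_lines over the lines of the data file would list every record that only parses thanks to this leniency.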

Data file: https://gist.github.com/danijar/37ba75a6991d61de9e77755329bb5ef4

Manual

Reading the file manually with json.loads() and passing the parsed records to pd.DataFrame works fine:

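```python
import json
import pandas as pd

filename = "data.jsonl"  # hypothetical local path to the downloaded gist file
# Parse each non-empty line with the lenient standard-library parser.
with open(filename) as f:
    df = pd.DataFrame([json.loads(line) for line in f if line.strip()])
print(df)  # Shows the data frame as expected
```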
Terminal output

```
       step  train/return  train/length  episodes  ...  value_loss  action_loss  action_ent        fps
0      1000           1.0         500.0       1.0  ...         NaN          NaN         NaN        NaN
1      2000           0.0         500.0       2.0  ...         NaN          NaN         NaN        NaN
2      3000         163.0         500.0       3.0  ...         NaN          NaN         NaN        NaN
3      4000           0.0         500.0       4.0  ...         NaN          NaN         NaN        NaN
4      5000           0.0         500.0       5.0  ...         NaN          NaN         NaN        NaN
..      ...           ...           ...       ...  ...         ...          ...         ...        ...
798  383000           0.0         500.0     383.0  ...         NaN          NaN         NaN        NaN
799  383000           NaN           NaN       NaN  ...         NaN          NaN         NaN  19.500059
800  384000           0.0         500.0     384.0  ...         NaN          NaN         NaN        NaN
801  384000           NaN           NaN       NaN  ...         NaN          NaN         NaN  19.608651
802  385000        1000.0         500.0     385.0  ...         NaN          NaN         NaN        NaN

[803 rows x 19 columns]
```

Pandas

But reading the same file with pandas.read_json() fails with an error raised inside pandas:

```python
import pandas as pd

filename = "data.jsonl"  # hypothetical local path to the downloaded gist file
df = pd.read_json(filename, lines=True)  # ValueError: Expected object or value
```
Terminal output

```
<path-to-python3.7>/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
    590         return json_reader
    591
--> 592     result = json_reader.read()
    593     if should_close:
    594         try:

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in read(self)
    713         elif self.lines:
    714             data = ensure_str(self.data)
--> 715             obj = self._get_object_parser(self._combine_lines(data.split("\n")))
    716         else:
    717             obj = self._get_object_parser(self.data)

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
    737         obj = None
    738         if typ == "frame":
--> 739             obj = FrameParser(json, **kwargs).parse()
    740
    741         if typ == "series" or obj is None:

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in parse(self)
    847
    848         else:
--> 849             self._parse_no_numpy()
    850
    851         if self.obj is None:

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1091         if orient == "columns":
   1092             self.obj = DataFrame(
-> 1093                 loads(json, precise_float=self.precise_float), dtype=None
   1094             )
   1095         elif orient == "split":

ValueError: Expected object or value
```
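The traceback shows the path taken when lines=True: the text is split on newlines, joined by _combine_lines into a single JSON array, and the whole array is passed to pandas' own loads. The sketch below approximates that path with the standard library only, to show why a single NaN token makes the entire file fail under a spec-compliant parser. combine_lines is a simplified, hypothetical stand-in for the private pandas method, and strict_loads merely emulates a parser that rejects NaN; neither is the actual pandas code.

```python
import json


def combine_lines(lines):
    # Simplified stand-in for pandas' private _combine_lines:
    # strip each line, drop empty ones, and wrap the rest in a JSON array.
    stripped = [line.strip() for line in lines]
    return "[" + ",".join(line for line in stripped if line) + "]"


def strict_loads(text):
    # Emulate a spec-compliant parser by refusing NaN/Infinity tokens.
    def reject(name):
        raise ValueError(f"non-standard constant {name!r} rejected")

    return json.loads(text, parse_constant=reject)


data = '{"step": 1000, "fps": 19.5}\n{"step": 2000, "fps": NaN}\n'
combined = combine_lines(data.split("\n"))
print(combined)  # [{"step": 1000, "fps": 19.5},{"step": 2000, "fps": NaN}]

# The lenient standard-library parser accepts the combined array ...
print(len(json.loads(combined)))  # 2

# ... but a strict parser rejects the whole array because of one token.
try:
    strict_loads(combined)
except ValueError as exc:
    print("ValueError:", exc)
```

Because every line ends up in one document, the resulting error carries no hint of which record caused it.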
simongibbons (Contributor) commented Jan 6, 2020

The latest master parses and loads the given file correctly, because a fix that allows parsing NaN values from JSON was recently merged (cc #30295).
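For pandas releases that predate the fix from #30295, one possible workaround (a sketch, not something proposed in this thread) is to rewrite the file as spec-compliant JSON Lines, replacing NaN and Infinity with null, before calling read_json. The names sanitize_lines and sanitize_jsonl are hypothetical.

```python
import json
import math


def _clean(value):
    # Recursively replace non-finite floats (NaN, Infinity) with None,
    # which json.dumps writes as the standard JSON literal null.
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: _clean(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_clean(item) for item in value]
    return value


def sanitize_lines(lines):
    """Hypothetical helper: yield spec-compliant JSON Lines records."""
    for line in lines:
        if line.strip():
            record = json.loads(line)  # lenient: accepts NaN/Infinity
            # allow_nan=False raises if a non-standard token slips through.
            yield json.dumps(_clean(record), allow_nan=False)


def sanitize_jsonl(src_path, dst_path):
    # Hypothetical helper: write a cleaned copy of src_path to dst_path.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out in sanitize_lines(src):
            dst.write(out + "\n")


print(list(sanitize_lines(['{"step": 2000, "fps": NaN, "loss": [1.5, Infinity]}'])))
# ['{"step": 2000, "fps": null, "loss": [1.5, null]}']
```

Usage might look like sanitize_jsonl("metrics.jsonl", "metrics_clean.jsonl") followed by pd.read_json("metrics_clean.jsonl", lines=True) (file names hypothetical); the null entries should then come back as missing values. Skipping such records entirely, as suggested in the issue, would only require changing sanitize_lines.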

danijar (Author) commented Jan 6, 2020

Thanks!

danijar closed this as completed Jan 6, 2020