Using pandas==0.25.1 with Python 3.7.1 on Debian, loading the following JSON lines file fails using pandas.read_json() but succeeds when read manually.
After looking into this a bit, I think it might be related to NaN in the JSON file which is not supported by the spec but accepted by json.loads(). If that turns out to be the case, it would be good to have an option to ignore those entries or at least provide a detailed error message.
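To illustrate the difference mentioned above, here is a minimal sketch showing that `json.loads()` accepts a bare `NaN` by default and can be made strict through its `parse_constant` argument. The sample record and the helper name `reject_constant` are made up for illustration and are not taken from the data file; the sketch does not confirm that NaN is what trips up `pandas.read_json()`.

```python
import json
import math

# Hypothetical record, not taken from the data file.
line = '{"step": 1000, "fps": NaN}'

# The standard library parser accepts the non-standard NaN literal by default.
record = json.loads(line)
print(math.isnan(record["fps"]))


def reject_constant(name):
    """Raise instead of converting NaN, Infinity, or -Infinity."""
    raise ValueError(f"non-standard JSON constant: {name}")


# With parse_constant set, the same line is rejected, as a strict parser would.
try:
    json.loads(line, parse_constant=reject_constant)
    strict_ok = True
except ValueError as exc:
    strict_ok = False
    print(exc)
```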
Reading the file manually using json.loads() and passing it to a pd.DataFrame works fine:
import json
import pandas as pd

with open(filename) as f:
    df = pd.DataFrame([json.loads(l) for l in f.readlines()])
print(df)  # Shows data frame as expected
Terminal output
step train/return train/length episodes ... value_loss action_loss action_ent fps
0 1000 1.0 500.0 1.0 ... NaN NaN NaN NaN
1 2000 0.0 500.0 2.0 ... NaN NaN NaN NaN
2 3000 163.0 500.0 3.0 ... NaN NaN NaN NaN
3 4000 0.0 500.0 4.0 ... NaN NaN NaN NaN
4 5000 0.0 500.0 5.0 ... NaN NaN NaN NaN
.. ... ... ... ... ... ... ... ... ...
798 383000 0.0 500.0 383.0 ... NaN NaN NaN NaN
799 383000 NaN NaN NaN ... NaN NaN NaN 19.500059
800 384000 0.0 500.0 384.0 ... NaN NaN NaN NaN
801 384000 NaN NaN NaN ... NaN NaN NaN 19.608651
802 385000 1000.0 500.0 385.0 ... NaN NaN NaN NaN
[803 rows x 19 columns]
Pandas
But reading the same file with pandas.read_json() fails with an internal pandas error:
import pandas as pd
df = pd.read_json(filename, lines=True)  # ValueError: Expected object or value
Terminal output
<path-to-python3.7>/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
590 return json_reader
591
--> 592 result = json_reader.read()
593 if should_close:
594 try:
<path-to-python3.7>/site-packages/pandas/io/json/_json.py in read(self)
713 elif self.lines:
714 data = ensure_str(self.data)
--> 715 obj = self._get_object_parser(self._combine_lines(data.split("\n")))
716 else:
717 obj = self._get_object_parser(self.data)
<path-to-python3.7>/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
737 obj = None
738 if typ == "frame":
--> 739 obj = FrameParser(json, **kwargs).parse()
740
741 if typ == "series" or obj is None:
<path-to-python3.7>/site-packages/pandas/io/json/_json.py in parse(self)
847
848 else:
--> 849 self._parse_no_numpy()
850
851 if self.obj is None:
<path-to-python3.7>/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
1091 if orient == "columns":
1092 self.obj = DataFrame(
-> 1093 loads(json, precise_float=self.precise_float), dtype=None
1094 )
1095 elif orient == "split":
ValueError: Expected object or value
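Since the error message does not say which line is at fault, a small diagnostic along these lines might help narrow it down. This is only a sketch under the assumption that non-standard constants such as NaN are involved; `find_nonstandard_lines` is a hypothetical helper, not part of pandas, and the sample data is invented.

```python
import io
import json


def find_nonstandard_lines(lines):
    """Return (line_number, reason) pairs for lines that are not strict JSON."""

    def reject_constant(name):
        # Called by the json module for NaN, Infinity, and -Infinity.
        raise ValueError(f"non-standard constant {name}")

    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line, parse_constant=reject_constant)
        except ValueError as exc:  # JSONDecodeError is a ValueError subclass
            problems.append((number, str(exc)))
    return problems


# Invented sample standing in for the real file.
sample = io.StringIO('{"step": 1000, "fps": 19.5}\n{"step": 2000, "fps": NaN}\n')
print(find_nonstandard_lines(sample))
```

With the real file, passing the open file object (for example `find_nonstandard_lines(open(filename))`) would list every line that a strict parser rejects, which gives a more detailed picture than "Expected object or value".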
Data file: https://gist.github.com/danijar/37ba75a6991d61de9e77755329bb5ef4