[Data] Add read_json() fallback to json.load() when pyarrow read fails #42558

Merged
c21 merged 10 commits into ray-project:master from scottjlee:0119-json-fallback
Jan 23, 2024

Conversation

@scottjlee
Contributor

@scottjlee scottjlee commented Jan 21, 2024

Why are these changes needed?

When pyarrow.json fails to read a JSON file for any reason, read_json() now falls back to reading the file's raw bytes and parsing them with json.load(), so that a wider range of JSON files is supported.
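
The fallback pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `read_json_with_fallback` and `strict_line_reader` are hypothetical names, and `strict_line_reader` is only a stdlib stand-in for the pyarrow reader, assumed here to accept one JSON object per line and to reject anything else.

```python
import io
import json
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def read_json_with_fallback(
    data: bytes,
    primary_reader: Callable[[io.BytesIO], Rows],
) -> Rows:
    """Try the primary reader first; on failure, parse the bytes with json.load()."""
    try:
        return primary_reader(io.BytesIO(data))
    except Exception:
        # Assumption: catch broadly for illustration. Real code would likely
        # catch only the specific error raised by the primary reader.
        parsed = json.load(io.BytesIO(data))
        # A top-level array yields one row per element; a single top-level
        # object is treated as one row.
        return parsed if isinstance(parsed, list) else [parsed]


def strict_line_reader(buf: io.BytesIO) -> Rows:
    """Hypothetical stand-in for the pyarrow reader: one JSON object per line."""
    rows = []
    for line in buf.read().decode("utf-8").splitlines():
        if not line.strip():
            continue
        row = json.loads(line)  # raises on lines that are not complete JSON
        if not isinstance(row, dict):
            raise ValueError("expected one JSON object per line")
        rows.append(row)
    return rows


# Newline-delimited input is handled by the primary reader.
ndjson = b'{"a": 1}\n{"a": 2}\n'
print(read_json_with_fallback(ndjson, strict_line_reader))

# A pretty-printed top-level array makes the primary reader fail,
# so the bytes are parsed with json.load() instead.
pretty = b'[\n  {"a": 1},\n  {"a": 2}\n]'
print(read_json_with_fallback(pretty, strict_line_reader))
```

Passing the primary reader in as a parameter keeps the sketch independent of pyarrow; in practice the `try` branch would call the pyarrow JSON reader directly.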

Related issue number

Closes #42516

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Scott Lee and others added 8 commits January 20, 2024 23:17
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@scottjlee scottjlee marked this pull request as ready for review January 23, 2024 00:31
Contributor

@omatthew98 omatthew98 left a comment

LGTM!

Scott Lee added 2 commits January 23, 2024 10:13
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@c21 c21 merged commit f2dfae9 into ray-project:master Jan 23, 2024
khluu pushed a commit to khluu/ray that referenced this pull request Jan 24, 2024
…fails (ray-project#42558)

When `pyarrow.json` fails to read in a JSON file for whatever reason, we add a fallback to reading in the bytes of the file, then using `json.load()` to parse the JSON, so that we can better support JSON files.

Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: khluu <khluu000@gmail.com>

Development

Successfully merging this pull request may close these issues.

[Data] Fallback to read JSON file as binary file, and convert it with Python json in memory

3 participants