Skip to content

[Data] - Download expr - cast to arrow before checking column#57146

Merged
richardliaw merged 2 commits intoray-project:masterfrom
goutamvenkat-anyscale:goutam/fix_download_op_arrow
Oct 3, 2025
Merged

[Data] - Download expr - cast to arrow before checking column#57146
richardliaw merged 2 commits intoray-project:masterfrom
goutamvenkat-anyscale:goutam/fix_download_op_arrow

Conversation

@goutamvenkat-anyscale
Copy link
Contributor

Why are these changes needed?

Call to_arrow() before invoking .column_names

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run pre-commit jobs to lint the changes in this PR. (pre-commit setup)
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Goutam V. <goutam@anyscale.com>
@goutamvenkat-anyscale goutamvenkat-anyscale requested a review from a team as a code owner October 2, 2025 23:45
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes a bug in the download expression logic where an AttributeError would occur if the input block was not a PyArrow Table. The change moves the block-to-Arrow conversion to before the column name check, ensuring that the .column_names attribute is always available. The fix is correct and improves the robustness of the download functionality. I've added a suggestion to include a unit test that specifically covers the fixed scenario with non-Arrow blocks to prevent future regressions.

return file_sizes

def __call__(self, block: pa.Table) -> Iterator[pa.Table]:
if not isinstance(block, pa.Table):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to have a unit test that can catch this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added UT

@goutamvenkat-anyscale goutamvenkat-anyscale added data Ray Data-related issues go add ONLY when ready to merge, run all tests labels Oct 3, 2025
Copy link
Contributor

@iamjustinhsu iamjustinhsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Signed-off-by: Goutam V. <goutam@anyscale.com>
@richardliaw richardliaw merged commit cca4e4c into ray-project:master Oct 3, 2025
6 checks passed
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
joshkodi pushed a commit to joshkodi/ray that referenced this pull request Oct 13, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants