[Data] Remove deprecated read_parquet_bulk API#58970
[Data] Remove deprecated read_parquet_bulk API#58970bveeramani merged 4 commits intoray-project:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request effectively removes the deprecated read_parquet_bulk API, which helps reduce maintenance and prevent user confusion. The changes are comprehensive, covering the function's implementation, tests, documentation, and internal references. The code removal is clean and I only have one minor suggestion to improve the clarity of an updated docstring.
There was a problem hiding this comment.
Bug: Orphaned parametrize decorators stacked on unrelated test function
The @pytest.mark.parametrize decorators for the deleted test_parquet_read_bulk and test_parquet_read_bulk_meta_provider functions were not removed along with the functions. These orphaned decorators (lines 234-249 and 250-265) are now stacked on top of test_parquet_read_partitioned, causing that test to have three parametrize decorators instead of one. This results in the test running with a Cartesian product of parameters from all three decorators, dramatically increasing test execution time and potentially causing failures from duplicate parameter name conflicts.
python/ray/data/tests/test_parquet.py#L233-L265
ray/python/ray/data/tests/test_parquet.py
Lines 233 to 265 in bbfae94
bveeramani
left a comment
There was a problem hiding this comment.
ty for the contribution! Overall LGTM, just left a couple comments
Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com>
bbfae94 to
5619e3f
Compare
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
|
@rushikeshadhav as a follow up, would you be interested in removing |
Yes, I would love to. |
## Description >This PR removes the deprecated read_parquet_bulk API from Ray Data, along with its implementation and documentation. This function was deprecated in favor of read_parquet, which now covers all equivalent use cases. The deprecation warning stated removal after May 2025, and that deadline has passed — so this cleanup reduces maintenance burden and prevents user confusion. Summary of changes - Removed read_parquet_bulk from read_api.py and __init__.py - Deleted ParquetBulkDatasource + its file - Removed related tests and documentation - Updated references and docstrings mentioning the deprecated API ## Related issues > Fixes ray-project#58969 --------- Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
## Description >This PR removes the deprecated read_parquet_bulk API from Ray Data, along with its implementation and documentation. This function was deprecated in favor of read_parquet, which now covers all equivalent use cases. The deprecation warning stated removal after May 2025, and that deadline has passed — so this cleanup reduces maintenance burden and prevents user confusion. Summary of changes - Removed read_parquet_bulk from read_api.py and __init__.py - Deleted ParquetBulkDatasource + its file - Removed related tests and documentation - Updated references and docstrings mentioning the deprecated API ## Related issues > Fixes ray-project#58969 --------- Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: peterxcli <peterxcli@gmail.com>
Description
Summary of changes
Related issues