[Data] - Update Pyarrow version to 23.0 for release tests + Update moto to 5.x.x #59489

Merged
bveeramani merged 21 commits into ray-project:master from goutamvenkat-anyscale:pa-rel-tst-up on Jan 30, 2026

Conversation

@goutamvenkat-anyscale
Contributor

Description

PyArrow 22 uses a newer AWS SDK that sends S3 requests with HTTP chunked transfer encoding and trailer checksums (x-amz-checksum-crc64nvme). Our old moto version (4.2.12) doesn't properly parse this protocol, causing raw HTTP wire format to leak into test responses:

Expected: b'spam'
Got: b'4\r\nspam\r\n0\r\nx-amz-checksum-crc64nvme:...\r\n\r\n'

Related issue from moto: getmoto/moto#7198
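
To make the leaked bytes above easier to read, here is a minimal sketch of how that wire format breaks down under HTTP/1.1 chunked framing: a hex chunk size, the chunk data, a zero-size terminator, then trailer headers. The function name `decode_chunked` is hypothetical and is not part of moto or PyArrow, and the checksum value below is a placeholder because the original output elides it.

```python
from typing import Dict, Tuple


def decode_chunked(raw: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Decode a chunked body and return (payload, trailers). Illustrative only."""
    payload = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # Chunk-size line: hex size, optionally followed by ';extensions'.
        size_field = raw[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # zero-size chunk ends the data; trailers follow
        payload += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF

    trailers: Dict[str, str] = {}
    for line in raw[pos:].split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii").strip().lower()] = value.decode("ascii").strip()
    return bytes(payload), trailers


# Placeholder checksum: the real value is elided in the PR description.
wire = b"4\r\nspam\r\n0\r\nx-amz-checksum-crc64nvme:PLACEHOLDER\r\n\r\n"
body, trailers = decode_chunked(wire)
assert body == b"spam"
assert "x-amz-checksum-crc64nvme" in trailers
```

A server that stores `wire` verbatim instead of `body` would return exactly the kind of response shown in the "Got:" line.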

alexeykudinkin and others added 2 commits December 15, 2025 17:44
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
@goutamvenkat-anyscale marked this pull request as ready for review December 17, 2025 00:45
@goutamvenkat-anyscale added the data (Ray Data-related issues) and go (add ONLY when ready to merge, run all tests) labels Dec 17, 2025
  llvmlite==0.42.0
  lxml>=6.0.2
- moto[s3,server]==4.2.12
+ moto[s3,server]==5.1.18
Collaborator

please update the related dependency set.

cc @elliot-barn

Contributor

Yeah, you don't need to update requirements_compiled.txt manually. You can run these commands in a Python 3.11 environment: ci/ci.sh compile_pip_dependencies && bazel run //ci/raydepsets:raydepsets -- build --all-configs

Contributor

Hold on, @goutamvenkat-anyscale would new moto be compatible w/ PA < 22?

Contributor Author

It's compatible.
Tested with this script:

import boto3
from moto import mock_aws

@mock_aws
def test_s3_put_get():
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="test-bucket")
    
    # Test basic put/get
    s3.put_object(Bucket="test-bucket", Key="test.txt", Body=b"spam")
    resp = s3.get_object(Bucket="test-bucket", Key="test.txt")
    body = resp["Body"].read()
    
    print("Expected: b'spam'")
    print(f"Got:      {body!r}")
    assert body == b"spam", f"Mismatch! Got {body!r}"
    print("✓ Basic test passed!")
    
    # Test with larger payload (more likely to trigger chunked encoding)
    large_body = b"x" * 10_000
    s3.put_object(Bucket="test-bucket", Key="large.bin", Body=large_body)
    resp = s3.get_object(Bucket="test-bucket", Key="large.bin")
    result = resp["Body"].read()
    
    assert result == large_body, f"Large payload mismatch! Got {len(result)} bytes"
    print("✓ Large payload test passed!")

if __name__ == "__main__":
    import pyarrow
    import moto
    print(f"pyarrow: {pyarrow.__version__}")
    print(f"moto:    {moto.__version__}")
    print()
    test_s3_put_get()

Output:

pyarrow: 21.0.0
moto:    5.1.20

Expected: b'spam'
Got:      b'spam'

@aslonnie requested a review from elliot-barn December 17, 2025 00:56
Signed-off-by: Goutam <goutam@anyscale.com>
Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request updates PyArrow to version 22 and moto to version 5.x to resolve an incompatibility issue with S3 requests using HTTP chunked transfer encoding. The changes include updating CI configuration labels, Docker build arguments, and Python dependency versions. The mock server setup scripts are also adjusted to accommodate breaking changes in moto v5. The changes are logical and directly address the issue described. I've added a couple of suggestions to improve code maintainability by removing now-unused function parameters.

- args = [moto_svr_path, service_name, "-H", host, "-p", str(port)]
+ # moto 5.x no longer accepts a service name argument - all services
+ # are served on a single endpoint
+ args = [moto_svr_path, "-H", host, "-p", str(port)]
Contributor

medium

With this change, the service_name parameter is no longer used to configure the moto_server. It's only used in error messages on lines 81 and 91. Since moto v5 serves all services on a single endpoint, the concept of starting a single service is obsolete. Consider removing the service_name parameter from the function signature and updating the error messages to something more generic like "Can not start moto server". This would require updating the call site on line 124 as well.
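
One way to picture the change under review is a small helper that builds the `moto_server` command line. This is only a sketch: the helper name `build_moto_server_args` and the `moto_major` parameter are assumptions for illustration and do not appear in this PR. The only behavior taken from the diff is that the old invocation passed a service name and the moto 5.x invocation does not.

```python
from typing import List, Optional


def build_moto_server_args(
    moto_svr_path: str,
    host: str,
    port: int,
    moto_major: int = 5,
    service_name: Optional[str] = None,
) -> List[str]:
    """Build the moto_server argument list (hypothetical helper)."""
    args = [moto_svr_path]
    # The pre-5.x command line in the diff passed the service name positionally.
    if moto_major < 5 and service_name is not None:
        args.append(service_name)
    # With 5.x all services share one endpoint, so only host and port remain.
    args += ["-H", host, "-p", str(port)]
    return args


print(build_moto_server_args("moto_server", "localhost", 5000))
```

If only moto 5.x needs to be supported, the `service_name` branch can be dropped entirely, which matches the reviewer's suggestion to remove the parameter and make the error message generic.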

- args = [moto_svr_path, service_name, "-H", host, "-p", str(port)]
+ # moto 5.x no longer accepts a service name argument - all services
+ # are served on a single endpoint
+ args = [moto_svr_path, "-H", host, "-p", str(port)]
Contributor

medium

With this change, the service_name parameter is no longer used to configure moto_server. It is only used in an error message on line 36. Since moto v5 serves all services on a single endpoint, the parameter is misleading. It would be best to remove service_name from the function signature and update the error message to be more generic (e.g., "Can not start moto server"). This would also require updating the call sites on lines 70, 80, and 90.

Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions bot added the stale (The issue is stale. It will be closed within 7 days unless there are further conversation) label Jan 24, 2026
Signed-off-by: Goutam <goutam@anyscale.com>
@goutamvenkat-anyscale changed the title from "[Data] - Update Pyarrow version to 22.0 for release tests + Update moto to 5.x.x" to "[Data] - Update Pyarrow version to 23.0 for release tests + Update moto to 5.x.x" Jan 28, 2026
@goutamvenkat-anyscale removed the stale (The issue is stale. It will be closed within 7 days unless there are further conversation) label Jan 28, 2026
Signed-off-by: Goutam <goutam@anyscale.com>
@bveeramani merged commit b3f91fc into ray-project:master Jan 30, 2026
6 checks passed
  mosaicml==0.3.1 ; python_version < "3.12"
      # via -r python/requirements/ml/train-test-requirements.txt
- moto==4.2.12
+ moto==5.1.18
Collaborator

@elliot-barn, could you help update the Python 3.13 requirements to be consistent with the changes in this PR too?

aslonnie pushed a commit that referenced this pull request Feb 4, 2026
## Description
These 2 files were left out of this PR:
#59489

Signed-off-by: Goutam <goutam@anyscale.com>
@goutamvenkat-anyscale deleted the pa-rel-tst-up branch February 4, 2026 17:38