[Data] - Update Pyarrow version to 23.0 for release tests + Update moto to 5.x.x #59489
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
      llvmlite==0.42.0
      lxml>=6.0.2
    - moto[s3,server]==4.2.12
    + moto[s3,server]==5.1.18
please update the related dependency set.
cc @elliot-barn
Yeah, you don't need to update requirements_compiled.txt manually. You can run these scripts in a Python 3.11 environment: `ci/ci.sh compile_pip_dependencies && bazel run //ci/raydepsets:raydepsets -- build --all-configs`
Hold on, @goutamvenkat-anyscale, would the new moto be compatible with PyArrow < 22?
It's compatible.
Tested with this script:
    import boto3
    from moto import mock_aws


    @mock_aws
    def test_s3_put_get():
        s3 = boto3.client("s3", region_name="us-east-1")
        s3.create_bucket(Bucket="test-bucket")

        # Test basic put/get
        s3.put_object(Bucket="test-bucket", Key="test.txt", Body=b"spam")
        resp = s3.get_object(Bucket="test-bucket", Key="test.txt")
        body = resp["Body"].read()
        print(f"Expected: b'spam'")
        print(f"Got: {body!r}")
        assert body == b"spam", f"Mismatch! Got {body!r}"
        print("✓ Basic test passed!")

        # Test with larger payload (more likely to trigger chunked encoding)
        large_body = b"x" * 10_000
        s3.put_object(Bucket="test-bucket", Key="large.bin", Body=large_body)
        resp = s3.get_object(Bucket="test-bucket", Key="large.bin")
        result = resp["Body"].read()
        assert result == large_body, f"Large payload mismatch! Got {len(result)} bytes"
        print("✓ Large payload test passed!")


    if __name__ == "__main__":
        import pyarrow
        import moto

        print(f"pyarrow: {pyarrow.__version__}")
        print(f"moto: {moto.__version__}")
        print()
        test_s3_put_get()
Output:
pyarrow: 21.0.0
moto: 5.1.20
Expected: b'spam'
Got: b'spam'
Code Review
This pull request updates PyArrow to version 22 and moto to version 5.x to resolve an incompatibility issue with S3 requests using HTTP chunked transfer encoding. The changes include updating CI configuration labels, Docker build arguments, and Python dependency versions. The mock server setup scripts are also adjusted to accommodate breaking changes in moto v5. The changes are logical and directly address the issue described. I've added a couple of suggestions to improve code maintainability by removing now-unused function parameters.
    - args = [moto_svr_path, service_name, "-H", host, "-p", str(port)]
    + # moto 5.x no longer accepts a service name argument - all services
    + # are served on a single endpoint
    + args = [moto_svr_path, "-H", host, "-p", str(port)]
With this change, the service_name parameter is no longer used to configure the moto_server. It's only used in error messages on lines 81 and 91. Since moto v5 serves all services on a single endpoint, the concept of starting a single service is obsolete. Consider removing the service_name parameter from the function signature and updating the error messages to something more generic like "Can not start moto server". This would require updating the call site on line 124 as well.
    - args = [moto_svr_path, service_name, "-H", host, "-p", str(port)]
    + # moto 5.x no longer accepts a service name argument - all services
    + # are served on a single endpoint
    + args = [moto_svr_path, "-H", host, "-p", str(port)]
With this change, the service_name parameter is no longer used to configure moto_server. It is only used in an error message on line 36. Since moto v5 serves all services on a single endpoint, the parameter is misleading. It would be best to remove service_name from the function signature and update the error message to be more generic (e.g., "Can not start moto server"). This would also require updating the call sites on lines 70, 80, and 90.
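To make the two review suggestions above concrete, here is a minimal sketch of what a helper without `service_name` could look like. The function names `build_moto_server_args` and `start_moto_server` are hypothetical and are not taken from the PR; the only parts grounded in the diff are the argument list (`-H`, `-p`) and the generic error message proposed in the review.

```python
import subprocess
from typing import List


def build_moto_server_args(moto_svr_path: str, host: str, port: int) -> List[str]:
    """Build the moto_server command line without a service name.

    Per the diff comment, moto 5.x serves all services on a single endpoint,
    so only the host and port are passed.
    """
    return [moto_svr_path, "-H", host, "-p", str(port)]


def start_moto_server(moto_svr_path: str, host: str, port: int) -> subprocess.Popen:
    """Spawn moto_server (hypothetical helper, not the PR's actual function).

    Raises a generic error, as suggested in the review, if the process
    cannot be spawned at all (for example, the executable is missing).
    """
    args = build_moto_server_args(moto_svr_path, host, port)
    try:
        return subprocess.Popen(args)
    except OSError as e:
        raise RuntimeError("Can not start moto server") from e
```

Keeping the argument construction in its own function makes the change easy to unit test without launching a server.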
      mosaicml==0.3.1 ; python_version < "3.12"
          # via -r python/requirements/ml/train-test-requirements.txt
    - moto==4.2.12
    + moto==5.1.18
@elliot-barn, could you help update the Python 3.13 requirements to be consistent with the changes in this PR too?
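One way to catch this kind of drift between requirement files is a small consistency check. The sketch below is an illustration only, not part of Ray's tooling: the helper names and the file names in the usage example are hypothetical, and it only recognizes exact `moto==X` pins (with or without extras).

```python
import re
from typing import Dict, Optional

# Matches lines such as "moto==5.1.18" or "moto[s3,server]==5.1.18".
_MOTO_PIN = re.compile(r"^\s*moto(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)", re.IGNORECASE)


def find_moto_pin(requirements_text: str) -> Optional[str]:
    """Return the first pinned moto version in the text, or None if absent."""
    for line in requirements_text.splitlines():
        match = _MOTO_PIN.match(line)
        if match:
            return match.group(1)
    return None


def inconsistent_pins(files: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each file name to its moto pin if the pins disagree, else return {}."""
    pins = {name: find_moto_pin(text) for name, text in files.items()}
    return pins if len(set(pins.values())) > 1 else {}


if __name__ == "__main__":
    # Hypothetical file names and contents, for illustration only.
    files = {
        "requirements_a.txt": "lxml>=6.0.2\nmoto[s3,server]==5.1.18\n",
        "requirements_b.txt": "moto==4.2.12\n",
    }
    print(inconsistent_pins(files))
```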
…to to 5.x.x (ray-project#59489)

## Description

PyArrow 22 uses a newer AWS SDK that sends S3 requests with HTTP chunked transfer encoding and trailer checksums (x-amz-checksum-crc64nvme). Our old moto version (4.2.12) doesn't properly parse this protocol, causing raw HTTP wire format to leak into test responses:

```
Expected: b'spam'
Got: b'4\r\nspam\r\n0\r\nx-amz-checksum-crc64nvme:...\r\n\r\n'
```

Related issue from moto: getmoto/moto#7198

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Co-authored-by: Alexey Kudinkin <ak@anyscale.com>
## Description

These 2 files were left out of this PR: #59489

Signed-off-by: Goutam <goutam@anyscale.com>
Description
PyArrow 22 uses a newer AWS SDK that sends S3 requests with HTTP chunked transfer encoding and trailer checksums (x-amz-checksum-crc64nvme). Our old moto version (4.2.12) doesn't properly parse this protocol, causing raw HTTP wire format to leak into test responses:

    Expected: b'spam'
    Got: b'4\r\nspam\r\n0\r\nx-amz-checksum-crc64nvme:...\r\n\r\n'
Related issue from moto: getmoto/moto#7198
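To illustrate what the leaked bytes reported for this PR (`b'4\r\nspam\r\n0\r\n...'`) contain, here is a minimal sketch of a decoder for a chunked body with trailers. It is not moto's implementation: `decode_chunked` is a hypothetical helper, it assumes well-formed input, and the checksum value in the usage example is a placeholder, since the real value is elided in the report.

```python
from typing import Dict, Tuple


def decode_chunked(raw: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Split a chunked body into (payload, trailers).

    Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a zero-size chunk
    ends the payload, and any "name:value" lines after it are trailers.
    """
    payload = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # Ignore optional chunk extensions after ";" when reading the size.
        size_field = raw[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break
        payload += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF

    trailers: Dict[str, str] = {}
    for line in raw[pos:].split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii").strip().lower()] = value.decode("ascii").strip()
    return bytes(payload), trailers


if __name__ == "__main__":
    # "PLACEHOLDER" stands in for the checksum value elided in the report.
    wire = b"4\r\nspam\r\n0\r\nx-amz-checksum-crc64nvme:PLACEHOLDER\r\n\r\n"
    body, trailers = decode_chunked(wire)
    assert body == b"spam"
    assert trailers == {"x-amz-checksum-crc64nvme": "PLACEHOLDER"}
```

A mock server that stores the wire bytes without this decoding step would return the framing and trailer along with the payload, which matches the `Got:` line in the report.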