Remote Code Execution via Parquet Arrow Extension Type Deserialization
Summary
Ray Data registers custom Arrow extension types (ray.data.arrow_tensor, ray.data.arrow_tensor_v2, ray.data.arrow_variable_shaped_tensor) globally in PyArrow. When PyArrow reads a Parquet file containing one of these extension types, it calls __arrow_ext_deserialize__ on the field's metadata bytes. Ray's implementation passes these bytes directly to cloudpickle.loads(), so an attacker who controls the file gets arbitrary code execution during schema parsing, before any row data is read.
In May 2024, Ray fixed a related vulnerability in PyExtensionType-based extension types (issue #41314, PR #45084). In July 2025, PR #54831 introduced cloudpickle.loads() into the replacement extension types' deserialization path, reintroducing the same class of vulnerability.
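The core problem can be illustrated without Ray or PyArrow. The following is a minimal sketch, assuming the standard library pickle module as a stand-in for cloudpickle (both run the callable a payload names in __reduce__ when loading). The Payload class and both helper functions are hypothetical names used only for illustration, and the payload is deliberately harmless.

```python
import json
import pickle


class Payload:
    """Hypothetical attacker object; unpickling it calls eval() on a chosen string."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of the object's state.
        # A real attacker would name a harmful callable here.
        return (eval, ("6 * 7",))


# Stand-in for the metadata bytes embedded in a crafted Parquet schema.
untrusted_metadata = pickle.dumps(Payload())


def deserialize_metadata_unsafely(serialized: bytes):
    # Mirrors the vulnerable pattern: untrusted bytes go straight to loads().
    return pickle.loads(serialized)


def deserialize_metadata_as_json(serialized: bytes):
    # A data-only format cannot name callables, so parsing has no side effects.
    try:
        return json.loads(serialized)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None


result = deserialize_metadata_unsafely(untrusted_metadata)
rejected = deserialize_metadata_as_json(untrusted_metadata)
```

Here result is the value returned by eval, not a Payload instance: the attacker's expression ran as a side effect of loading. The JSON helper rejects the same bytes. It is shown only as a contrast and is not a description of how Ray addresses the issue.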
Impact
- Affected versions: Ray 2.49.0 through 2.54.0 (latest release as of March 2026). The vulnerable _deserialize_with_fallback function, which calls cloudpickle.loads(), was introduced in commit f6d21db1a4 (PR #54831, July 2025) and first released in Ray 2.49.0.
- Affected configurations: Any process that uses Ray Data and reads Parquet files. The extension types are registered globally in PyArrow, so all Parquet reads in the process are affected, including ray.data.read_parquet(), pyarrow.parquet.read_table(), pandas.read_parquet(), etc.
- Attacker prerequisites: The attacker must place a crafted Parquet file where a Ray Data pipeline reads it. No authentication or cluster access is required. The Parquet file must contain a column with a ray.data.arrow_tensor (or v2, or variable-shaped) extension type name, which makes this a targeted attack against Ray Data users.
- CIA impact: Arbitrary command execution as the Ray worker process user, resulting in full server compromise.
- Severity: Critical
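To triage an environment against the version range listed above, a check along the following lines may help. This is a rough sketch: the function name in_advisory_range is hypothetical, the bounds are taken from this advisory, and a version above the range is only outside what the advisory covers, not confirmed fixed.

```python
import re

# Bounds taken from the "Affected versions" entry in this advisory.
FIRST_AFFECTED = (2, 49, 0)
LAST_LISTED = (2, 54, 0)


def in_advisory_range(version: str) -> bool:
    """Return True if a Ray version string falls within 2.49.0 through 2.54.0."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parsed = tuple(int(part) for part in match.groups())
    return FIRST_AFFECTED <= parsed <= LAST_LISTED
```

With Ray installed, the check could be called as in_advisory_range(ray.__version__).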