bugfix: correct offsets when serializing a list of fixed sized list and non-zero start offset #7318

timsaucer · 2025-03-21T11:58:48Z · edited by alamb

Which issue does this PR close?

Closes "Round trip encoding of list of fixed list fails when offset is not zero" (#7315).

Rationale for this change

This bug occurs when these conditions are all met:

We have a generic list of fixed-size lists
The offset of the outer generic list is non-zero
We are writing out (serializing) the arrays
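To make the three conditions concrete, here is a minimal std-only sketch of a list of fixed-size lists whose outer list has been sliced. It is a simplified model, not the arrow-rs types: `ListOfFixedList`, `slice`, `row`, and `two_rows` are hypothetical names introduced only for illustration, and the values mirror the ones in the failing test output from this PR.

```rust
/// Simplified stand-in for a `List<FixedSizeList<Float32, N>>` column.
/// Hypothetical type for illustration; this is not the arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
pub struct ListOfFixedList {
    /// Outer list offsets, counted in fixed-size-list entries.
    pub offsets: Vec<usize>,
    /// Width of every fixed-size list (3 for an x/y/z point).
    pub width: usize,
    /// Flattened leaf values; entry `e` occupies
    /// `values[e * width..(e + 1) * width]`.
    pub values: Vec<f32>,
}

impl ListOfFixedList {
    /// Mimics a zero-copy slice: only the offsets are narrowed, while the
    /// leaf values are carried over whole (cloned here for simplicity).
    /// The first offset of the result is therefore usually non-zero.
    pub fn slice(&self, start_row: usize, len: usize) -> Self {
        Self {
            offsets: self.offsets[start_row..=start_row + len].to_vec(),
            width: self.width,
            values: self.values.clone(),
        }
    }

    /// Logical contents of outer row `i`, one slice per fixed-size entry.
    pub fn row(&self, i: usize) -> Vec<&[f32]> {
        (self.offsets[i]..self.offsets[i + 1])
            .map(|e| &self.values[e * self.width..(e + 1) * self.width])
            .collect()
    }
}

/// Two outer rows, each holding a single 3-wide point.
pub fn two_rows() -> ListOfFixedList {
    ListOfFixedList {
        offsets: vec![0, 1, 2],
        width: 3,
        values: vec![1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    }
}
```

In this model, `two_rows().slice(1, 1)` has offsets `[1, 2]`: its only row is the point `10.0, 11.0, 12.0`, but the leaf values still begin with `1.0, 2.0, 3.0`. A writer that has to emit this sliced value is in the situation the conditions above describe.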

What changes are included in this PR?

This PR checks for the case of a fixed-size list and adjusts the offsets into the child arrays during the write.
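The adjustment can be sketched with plain slices. This is an illustration under stated assumptions, not the code in `arrow-ipc/src/writer.rs`: `Encoded`, `rebase`, `encode_ignoring_start`, `encode_adjusted`, and `decode_row` are hypothetical helpers, and the "ignoring" variant is only one plausible way to end up with the mismatch reported in the linked issue. Both functions assume `offsets` is non-empty and non-decreasing.

```rust
/// Hypothetical wire form: offsets rebased to start at zero, plus leaf values.
#[derive(Debug, PartialEq)]
pub struct Encoded {
    pub offsets: Vec<usize>,
    pub values: Vec<f32>,
}

/// Rebase offsets so that the first one becomes zero.
pub fn rebase(offsets: &[usize]) -> Vec<usize> {
    let start = offsets[0];
    offsets.iter().map(|o| o - start).collect()
}

/// Illustrative buggy variant: the offsets are rebased, but the leaf values
/// are taken from the front of the buffer, ignoring the start offset.
pub fn encode_ignoring_start(offsets: &[usize], width: usize, values: &[f32]) -> Encoded {
    let rebased = rebase(offsets);
    let n = rebased[rebased.len() - 1] * width;
    Encoded { offsets: rebased, values: values[..n].to_vec() }
}

/// Adjusted variant: the child range is scaled by the fixed-size-list width,
/// so the leaf values are sliced to `start * width..end * width`.
pub fn encode_adjusted(offsets: &[usize], width: usize, values: &[f32]) -> Encoded {
    let start = offsets[0];
    let end = offsets[offsets.len() - 1];
    Encoded {
        offsets: rebase(offsets),
        values: values[start * width..end * width].to_vec(),
    }
}

/// Read outer row `i` back from the encoded form as flat leaf values.
pub fn decode_row(enc: &Encoded, width: usize, i: usize) -> Vec<f32> {
    enc.values[enc.offsets[i] * width..enc.offsets[i + 1] * width].to_vec()
}
```

With offsets `[1, 2]`, width 3, and values `1.0, 2.0, 3.0, 10.0, 11.0, 12.0`, the first variant decodes row 0 as `1.0, 2.0, 3.0`, while the adjusted variant decodes it as `10.0, 11.0, 12.0`. The key point of the sketch is that the outer offsets count fixed-size entries, so the start offset has to be multiplied by the list width before it is applied to the leaf values.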

Unit tests added.

Are there any user-facing changes?

None

alamb

Thank you @timsaucer -- this looks great to me

I think it would be better with a test for nullable data as well (to make sure the nulls are sliced correctly), but in my opinion this PR is better than what is on main, so it could also be merged as is.

I had some other small comments, but nothing required in my opinion.

alamb · 2025-03-21T19:29:33Z

arrow-ipc/src/writer.rs

@@ -3075,4 +3098,111 @@ mod tests {
        assert_eq!(stream_bytes_written_on_flush, expected_stream_flushed_bytes);
        assert_eq!(file_bytes_written_on_flush, expected_file_flushed_bytes);
    }
+
+    #[test]
+    fn test_roundtrip_list_of_fixed_list() -> Result<(), ArrowError> {


I verified that this test fails without the code in this PR

thread 'writer::tests::test_roundtrip_list_of_fixed_list' panicked at arrow-ipc/src/writer.rs:3150:9:
assertion `left == right` failed
  left: RecordBatch { schema: Schema { fields: [Field { name: "points", data_type: List(Field { name: "item", data_type: FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [ListArray [ FixedSizeListArray<3> [ PrimitiveArray<Float32> [ 10.0, 11.0, 12.0, ], ], ]], row_count: 1 }
 right: RecordBatch { schema: Schema { fields: [Field { name: "points", data_type: List(Field { name: "item", data_type: FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [ListArray [ FixedSizeListArray<3> [ PrimitiveArray<Float32> [ 1.0, 2.0, 3.0, ], ], ]], row_count: 1 }
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:692:5
   1: core::panicking::panic_fmt
             at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panicking.rs:75:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/panicking.rs:364:5
   4: arrow_ipc::writer::tests::test_subarray
             at ./src/writer.rs:3150:9
   5: arrow_ipc::writer::tests::test_roundtrip_list_of_fixed_list
             at ./src/writer.rs:3127:9
   6: arrow_ipc::writer::tests::test_roundtrip_list_of_fixed_list::{{closure}}
             at ./src/writer.rs:3083:47
   7: core::ops::function::FnOnce::call_once
             at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
   8: core::ops::function::FnOnce::call_once
             at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

timsaucer · 2025-03-21T19:52:40Z

Thanks @alamb. I've added the requested changes. I appreciate the rapid response.

timsaucer · 2025-03-21T20:15:25Z

Ah, I don't have merge approval for this repository since I'm not a committer on Arrow.

viirya · 2025-03-23T19:29:22Z

arrow-ipc/src/writer.rs

+        let point = [Some(10.), None, None];
+        for p in point {
+            match p {
+                Some(p) => l2_builder.values().values().append_value(p),
+                None => l2_builder.values().values().append_null(),
+            }
+        }
+
+        l2_builder.values().append(true);
+        l2_builder.append(true);


Hmm, I wonder why this point [Some(10.), None, None] needs to be appended separately. Can't it be in the loop above too?

timsaucer · 2025-03-24

To make this entry a different row of the outer list.
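The row boundary is what matters here, and it can be shown with a tiny std-only model of a nested builder. This is a sketch, not arrow-rs's `ListBuilder`: `MiniListBuilder`, `push_point`, `finish_row`, and `build_two_rows` are hypothetical names, and the points in the first row are made-up values, since the loop from the test is not shown in this thread.

```rust
/// Hypothetical miniature of a list-of-fixed-size-list builder.
/// It only tracks outer offsets and the 3-wide points appended so far.
pub struct MiniListBuilder {
    offsets: Vec<usize>,
    points: Vec<[Option<f32>; 3]>,
}

impl MiniListBuilder {
    /// A fresh builder starts with the single offset 0 and no rows.
    pub fn new() -> Self {
        Self { offsets: vec![0], points: Vec::new() }
    }

    /// Append one fixed-size entry to the row currently being built.
    pub fn push_point(&mut self, p: [Option<f32>; 3]) {
        self.points.push(p);
    }

    /// Close the current outer row by recording where it ends.
    pub fn finish_row(&mut self) {
        self.offsets.push(self.points.len());
    }

    pub fn offsets(&self) -> &[usize] {
        &self.offsets
    }

    pub fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }
}

/// Mirrors the shape of the test: a loop fills the first row, then one more
/// point is appended after the row is closed so that it starts a second row.
pub fn build_two_rows() -> MiniListBuilder {
    let mut b = MiniListBuilder::new();
    // Illustrative first-row data (assumed, not taken from the PR).
    for p in [[Some(1.0), Some(2.0), None], [Some(4.0), Some(5.0), Some(6.0)]] {
        b.push_point(p);
    }
    b.finish_row();
    b.push_point([Some(10.0), None, None]);
    b.finish_row();
    b
}
```

In this model, `build_two_rows()` ends with offsets `[0, 2, 3]`. Had the last point been pushed inside the loop, there would be a single row with offsets `[0, 3]`, and no slice of the outer list could start at a non-zero offset, which is presumably the case the test needs to exercise.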

viirya · 2025-03-23T19:31:15Z

arrow-ipc/src/writer.rs

+    }
+
+    #[test]
+    fn test_roundtrip_fixed_list() -> Result<(), ArrowError> {


Can we add another test with null value too?

viirya · 2025-03-24T16:54:17Z

Thanks @timsaucer @alamb

emilk · 2025-03-25T16:57:24Z

I would very much like to see this in a bugfix release 🙏

alamb · 2025-03-25T17:26:42Z

> I would very much like to see this in a bugfix release 🙏

Thanks, I filed a ticket to track the idea. I think @timsaucer was also interested.

Release arrow-rs / parquet bug fix version 54.3.1 (Mar 2025) #7330

…nd non-zero start offset (apache#7318)
* When serializing fixed length arrays, adjust the offsets for writing out
* Add unit test
* clippy warnings
* Add unit test for nulls
* Update unit test to account for which schema had nulls

* bugfix: correct offsets when serializing a list of fixed sized list and non-zero start offset (#7318)
* When serializing fixed length arrays, adjust the offsets for writing out
* Add unit test
* clippy warnings
* Add unit test for nulls
* Update unit test to account for which schema had nulls
* Add missing type annotation (#7326)
* Update version
* Create changelog
---------
Co-authored-by: Tim Saucer <[email protected]>

* bugfix: correct offsets when serializing a list of fixed sized list and non-zero start offset (#7318)
* When serializing fixed length arrays, adjust the offsets for writing out
* Add unit test
* clippy warnings
* Add unit test for nulls
* Update unit test to account for which schema had nulls
* Add missing type annotation (#7326)
* Update version
* Create changelog
---------
Co-authored-by: Matthijs Brobbel <[email protected]>

timsaucer added 2 commits March 21, 2025 07:34

When serializing fixed length arrays, adjust the offsets for writing out

90a1016

Add unit test

1014c9f

github-actions bot added the arrow Changes to the arrow crate label Mar 21, 2025

timsaucer mentioned this pull request Mar 21, 2025

Rerun PRs rerun-io/opensource#2

Open

clippy warnings

1eb654e

alamb approved these changes Mar 21, 2025

Add unit test for nulls

a98f3bc

viirya reviewed Mar 23, 2025

arrow-ipc/src/writer.rs (outdated, resolved)

viirya reviewed Mar 23, 2025

arrow-ipc/src/writer.rs (resolved)

viirya approved these changes Mar 23, 2025

Update unit test to account for which schema had nulls

34b204a

viirya approved these changes Mar 24, 2025

viirya merged commit 3b90fc9 into apache:main Mar 24, 2025
26 checks passed

alamb mentioned this pull request Mar 25, 2025

Release arrow-rs / parquet bug fix version 54.3.1 (Mar 2025) #7330

Closed

4 tasks

mbrobbel mentioned this pull request Mar 25, 2025

Backports, version bump and changelog for 54.3.1 #7331

Merged

timsaucer mentioned this pull request Mar 26, 2025

Merge changelog and version from 54.3.1 into main #7340

Merged
