-
Notifications
You must be signed in to change notification settings - Fork 933
bugfix: correct offsets when serializing a list of fixed sized list and non-zero start offset #7318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugfix: correct offsets when serializing a list of fixed sized list and non-zero start offset #7318
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @timsaucer -- this looks great to me
I think it would be better with a test for nullable data as well (to make sure the nulls are sliced correctly) but in my opinion this PR is better than what is on main
so it could be merged in as is as well
I had some small other comments, but nothing required in my opinion
@@ -3075,4 +3098,111 @@ mod tests { | |||
assert_eq!(stream_bytes_written_on_flush, expected_stream_flushed_bytes); | |||
assert_eq!(file_bytes_written_on_flush, expected_file_flushed_bytes); | |||
} | |||
|
|||
#[test] | |||
fn test_roundtrip_list_of_fixed_list() -> Result<(), ArrowError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified that this test fails without the code in this PR
assertion `left == right` failed
left: RecordBatch { schema: Schema { fields: [Field { name: "points", data_type: List(Field { name: "item", data_type: FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [ListArray
[
FixedSizeListArray<3>
[
PrimitiveArray<Float32>
[
10.0,
11.0,
12.0,
],
],
]], row_count: 1 }
right: RecordBatch { schema: Schema { fields: [Field { name: "points", data_type: List(Field { name: "item", data_type: FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [ListArray
[
FixedSizeListArray<3>
[
PrimitiveArray<Float32>
[
1.0,
2.0,
3.0,
],
],
]], row_count: 1 }
<Click to see difference>
thread 'writer::tests::test_roundtrip_list_of_fixed_list' panicked at arrow-ipc/src/writer.rs:3150:9:
assertion `left == right` failed
left: RecordBatch { schema: Schema { fields: [Field { name: "points", data_type: List(Field { name: "item", data_type: FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [ListArray
[
FixedSizeListArray<3>
[
PrimitiveArray<Float32>
[
10.0,
11.0,
12.0,
],
],
]], row_count: 1 }
right: RecordBatch { schema: Schema { fields: [Field { name: "points", data_type: List(Field { name: "item", data_type: FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [ListArray
[
FixedSizeListArray<3>
[
PrimitiveArray<Float32>
[
1.0,
2.0,
3.0,
],
],
]], row_count: 1 }
stack backtrace:
0: rust_begin_unwind
at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:692:5
1: core::panicking::panic_fmt
at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panicking.rs:75:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/panicking.rs:364:5
4: arrow_ipc::writer::tests::test_subarray
at ./src/writer.rs:3150:9
5: arrow_ipc::writer::tests::test_roundtrip_list_of_fixed_list
at ./src/writer.rs:3127:9
6: arrow_ipc::writer::tests::test_roundtrip_list_of_fixed_list::{{closure}}
at ./src/writer.rs:3083:47
7: core::ops::function::FnOnce::call_once
at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
8: core::ops::function::FnOnce::call_once
at /rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Thanks @alamb . I've added the requested changes. I appreciate the rapid response. |
Ah, I don't have merge approval for this repository since I'm not a committer on Arrow. |
let point = [Some(10.), None, None]; | ||
for p in point { | ||
match p { | ||
Some(p) => l2_builder.values().values().append_value(p), | ||
None => l2_builder.values().values().append_null(), | ||
} | ||
} | ||
|
||
l2_builder.values().append(true); | ||
l2_builder.append(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I wonder why this point [Some(10.), None, None]
needs to be separately appended? Cannot it be in the above loop too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this entry a different row of the outer list.
} | ||
|
||
#[test] | ||
fn test_roundtrip_fixed_list() -> Result<(), ArrowError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add another test with null value too?
Thanks @timsaucer @alamb |
I would very much like to see this in a bugfix release 🙏 |
Thanks, i filed a ticket to track the idea. @timsaucer also was interested I think |
…nd non-zero start offset (apache#7318) * When serializing fixed length arrays, adjust the offsets for writing out * Add unit test * clippy warnings * Add unit test for nulls * Update unit test to account for which schema had nulls
…nd non-zero start offset (apache#7318) * When serializing fixed length arrays, adjust the offsets for writing out * Add unit test * clippy warnings * Add unit test for nulls * Update unit test to account for which schema had nulls
…nd non-zero start offset (apache#7318) * When serializing fixed length arrays, adjust the offsets for writing out * Add unit test * clippy warnings * Add unit test for nulls * Update unit test to account for which schema had nulls
* bugfix: correct offsets when serializing a list of fixed sized list and non-zero start offset (#7318) * When serializing fixed length arrays, adjust the offsets for writing out * Add unit test * clippy warnings * Add unit test for nulls * Update unit test to account for which schema had nulls * Add missing type annotation (#7326) * Update version * Create changelog --------- Co-authored-by: Tim Saucer <[email protected]>
* bugfix: correct offsets when serializing a list of fixed sized list and non-zero start offset (#7318) * When serializing fixed length arrays, adjust the offsets for writing out * Add unit test * clippy warnings * Add unit test for nulls * Update unit test to account for which schema had nulls * Add missing type annotation (#7326) * Update version * Create changelog --------- Co-authored-by: Matthijs Brobbel <[email protected]>
Which issue does this PR close?
Rationale for this change
This bug occurs when these conditions are all met:
What changes are included in this PR?
This PR checks for the case of a fixed sized list and adjusts the offsets to the child arrays during write.
Unit tests added.
Are there any user-facing changes?
None