-
Notifications
You must be signed in to change notification settings - Fork 965
refactor!: do not default the struct array length to 0 in Struct::try_new #7247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor!: do not default the struct array length to 0 in Struct::try_new #7247
Conversation
I think as this is a breaking change regardless, we should probably just add the length as an explicit argument and avoid potential errors/panics. I don't really feel strongly though, I personally think empty StructArrays are something the arrow spec probably shouldn't permit, but that ship has sailed... Perhaps @alamb has thoughts on the matter |
Ah, as you were typing that I saw the suggestions from #6732 which suggested adding |
I'm personally less a fan of this, as it isn't materially different from the current state of play - where there are multiple alternative methods. I guess I was thinking something along the lines of
That being said, this would be potentially quite a disruptive change, and I am not sure how common empty StructArray actually are, I can't help feeling most of the time they'd arise from a deficiency in a projection system, rather than someone actually creating them... For an additional data point, RecordBatch::try_new is consistent with StructArray, and so not making this change might be more consistent. I don't really feel strongly on this, which I guess would be an argument towards not making a breaking change here... I'll let others weigh in. |
I would push for the breaking change honestly.. Here's another example: #7447 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR makes sense to me and i think is consistent with RecordBatch -- thank you @westonpace and @tustvold and @gatesn
For an additional data point, RecordBatch::try_new is consistent with StructArray, and so not making this change might be more consistent. I don't really feel strongly on this, which I guess would be an argument towards not making a breaking change here... I'll let others weigh in.
I think RecordBatch::try_new errors if there are no columns:
arrow-rs/arrow-array/src/record_batch.rs
Lines 314 to 321 in 7f3907e
let row_count = options | |
.row_count | |
.or_else(|| columns.first().map(|col| col.len())) | |
.ok_or_else(|| { | |
ArrowError::InvalidArgumentError( | |
"must either specify a row count or at least one column".to_string(), | |
) | |
})?; |
I would push for the breaking change honestly..
One thing we could do is deprecate StructArray::try_new
which would give a bunch of warnings and encourage people to upgrade, but that would be inconsistent with RecordBatch::try_new() / RecordBatch::try_new_with_options
)
5a5f41d
to
907c293
Compare
I've rebased. It seems the consensus is leaning towards making this change now. If there are no other suggestions by Friday I will merge. |
🚀 -- thanks @westonpace and everyone else |
I am adding an API change label to this PR to make sure it is highlighted in the release notes - I think it is fine to release this in a minor release (and not have to wait for the next major release) |
IMO I'd have considered this to be a breaking change, as the In pyo3-arrow I have a test that checks I can import an empty Ideally I'd like to declare my dependency at I figure I'll just bump my requirement to |
We can also create a 55.0.1 release and backport the |
Given how much work it looks like every time you make a release, I really don't want to make you make a release if you don't need to 😅 In kylebarron/arro3#328 I bumped to require |
This also seems to have caused a panic when filtering struct arrays: |
…_new (apache#7247) * Do not default the struct array length to 0 in Struct::try_new if there are no child arrays. * Extend testing for new_empty_fields * Add try_new_with_length
This had been broken by apache/arrow-rs#7247 in Arrow 55.1.0
* Fix ScalarStructBuilder::build() when the struct is empty This had been broken by apache/arrow-rs#7247 in Arrow 55.1.0 * fix test for error
Which issue does this PR close?
Rationale for this change
See PR
What changes are included in this PR?
StructArray::try_new
will now return an error if there are no arrays providedStructArray::new
will panic if there are no arrays providedStructArray::from(vec![])
will panicAre there any user-facing changes?
BREAKING CHANGE: StructArray::try_new will now return an error if no child arrays are provided.