Skip to content

refactor!: do not default the struct array length to 0 in Struct::try_new #7247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

westonpace
Copy link
Member

@westonpace westonpace commented Mar 6, 2025

Which issue does this PR close?

Rationale for this change

See PR

What changes are included in this PR?

  • StructArray::try_new will now return an error if there are no arrays provided
  • StructArray::new will panic if there are no arrays provided
  • StructArray::from(vec![]) will panic

Are there any user-facing changes?

BREAKING CHANGE: StructArray::try_new will now return an error if no child arrays are provided.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 6, 2025
@tustvold
Copy link
Contributor

tustvold commented Mar 6, 2025

I think as this is a breaking change regardless, we should probably just add the length as an explicit argument and avoid potential errors/panics. I don't really feel strongly though, I personally think empty StructArrays are something the arrow spec probably shouldn't permit, but that ship has sailed...

Perhaps @alamb has thoughts on the matter

@westonpace
Copy link
Member Author

Ah, as you were typing that I saw the suggestions from #6732 which suggested adding try_new_with_length (yet another alternative). I've implemented that and changed the docs for try_new to encourage try_new_with_length but I agree it might be more straightforward to force users to do the inference themselves.

@tustvold
Copy link
Contributor

tustvold commented Mar 6, 2025

which suggested adding try_new_with_length (yet another alternative)

I'm personally less a fan of this, as it isn't materially different from the current state of play - where there are multiple alternative methods.

I guess I was thinking something along the lines of

pub fn try_new(
        len: usize,
        fields: Fields,
        arrays: Vec<ArrayRef>,
        nulls: Option<NullBuffer>,
    ) -> Result<Self, ArrowError>

That being said, this would be potentially quite a disruptive change, and I am not sure how common empty StructArray actually are, I can't help feeling most of the time they'd arise from a deficiency in a projection system, rather than someone actually creating them...

For an additional data point, RecordBatch::try_new is consistent with StructArray, and so not making this change might be more consistent. I don't really feel strongly on this, which I guess would be an argument towards not making a breaking change here... I'll let others weigh in.

@gatesn
Copy link

gatesn commented Apr 28, 2025

I would push for the breaking change honestly..

Here's another example: #7447

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR makes sense to me and i think is consistent with RecordBatch -- thank you @westonpace and @tustvold and @gatesn

For an additional data point, RecordBatch::try_new is consistent with StructArray, and so not making this change might be more consistent. I don't really feel strongly on this, which I guess would be an argument towards not making a breaking change here... I'll let others weigh in.

I think RecordBatch::try_new errors if there are no columns:

let row_count = options
.row_count
.or_else(|| columns.first().map(|col| col.len()))
.ok_or_else(|| {
ArrowError::InvalidArgumentError(
"must either specify a row count or at least one column".to_string(),
)
})?;

I would push for the breaking change honestly..

One thing we could do is deprecate StructArray::try_new which would give a bunch of warnings and encourage people to upgrade, but that would be inconsistent with RecordBatch::try_new() / RecordBatch::try_new_with_options)

@westonpace westonpace force-pushed the refactor/do-not-default-struct-arr-len branch from 5a5f41d to 907c293 Compare April 30, 2025 13:55
@westonpace
Copy link
Member Author

I've rebased. It seems the consensus is leaning towards making this change now. If there are no other suggestions by Friday I will merge.

@westonpace westonpace merged commit 4491b17 into apache:main May 2, 2025
26 checks passed
@alamb
Copy link
Contributor

alamb commented May 3, 2025

🚀 -- thanks @westonpace and everyone else

@alamb alamb added the api-change Changes to the arrow API label May 3, 2025
@alamb
Copy link
Contributor

alamb commented May 3, 2025

I am adding an API change label to this PR to make sure it is highlighted in the release notes - I think it is fine to release this in a minor release (and not have to wait for the next major release)

@kylebarron
Copy link
Contributor

IMO I'd have considered this to be a breaking change, as the StructArray::new constructor now panics with input considered valid as of 55.0.

In pyo3-arrow I have a test that checks I can import an empty RecordBatch which goes through this code path, which uses StructArray::new and now panics.

Ideally I'd like to declare my dependency at 55.0 to let downstream users use either 55.0 or 55.1, but I can't use the suggested workarounds of try_new_with_length because that didn't exist in 55.

I figure I'll just bump my requirement to 55.1 and that'll be the least painful way to handle this.

@alamb
Copy link
Contributor

alamb commented May 19, 2025

We can also create a 55.0.1 release and backport the try_new_with_length method if that is better

@kylebarron
Copy link
Contributor

Given how much work it looks like every time you make a release, I really don't want to make you make a release if you don't need to 😅

In kylebarron/arro3#328 I bumped to require 55.1 and that's fine with me.

@alamb
Copy link
Contributor

alamb commented May 21, 2025

This also seems to have caused a panic when filtering struct arrays:

thorfour pushed a commit to polarsignals/arrow-rs that referenced this pull request May 27, 2025
…_new (apache#7247)

* Do not default the struct array length to 0 in Struct::try_new if there are no child arrays.

* Extend testing for new_empty_fields

* Add try_new_with_length
Blizzara added a commit to Blizzara/datafusion that referenced this pull request May 28, 2025
xudong963 pushed a commit to apache/datafusion that referenced this pull request May 29, 2025
* Fix ScalarStructBuilder::build() when the struct is empty

This had been broken by apache/arrow-rs#7247 in Arrow 55.1.0

* fix test for error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StructArray::try_new behavior can be unexpected when there are no child arrays
5 participants