Attempt to use the high part of the size_hint in collect (again) #137908

Open
wants to merge 1 commit into
base: master

Conversation

scottmcm
Member

@scottmcm scottmcm commented Mar 3, 2025

I last tried something like this almost 7 years ago; I wonder if it's more tolerable now...

@rustbot
Collaborator

rustbot commented Mar 3, 2025

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue), and T-libs (Relevant to the library team, which will review and decide on the PR/issue) labels Mar 3, 2025
@scottmcm
Member Author

scottmcm commented Mar 3, 2025

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed) label Mar 3, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 3, 2025
…r=<try>

Attempt to use the high part of the `size_hint` in `collect` (again)

I last tried something like this [almost 7 years ago](rust-lang#53086); I wonder if it's more tolerable now...
@bors
Collaborator

bors commented Mar 3, 2025

⌛ Trying commit 76f763c with merge 272c07f...

@bors
Collaborator

bors commented Mar 3, 2025

☀️ Try build successful - checks-actions
Build commit: 272c07f (272c07f89984286198fbfbda53502c20946bf526)

@rust-timer
Collaborator

Finished benchmarking commit (272c07f): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.5%, 0.5%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [0.5%, 0.5%] | 1 |

Max RSS (memory usage)

Results (primary 3.4%, secondary -2.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.4% | [2.3%, 5.4%] | 3 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.0% | [-2.6%, -1.4%] | 2 |
| All ❌✅ (primary) | 3.4% | [2.3%, 5.4%] | 3 |

Cycles

Results (secondary -7.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -7.9% | [-8.6%, -7.1%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.0% | [0.0%, 0.2%] | 13 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.0%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.0% | [-0.1%, 0.2%] | 16 |

Bootstrap: 773.006s -> 772.399s (-0.08%)
Artifact size: 361.95 MiB -> 361.93 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed) label Mar 3, 2025
@scottmcm
Member Author

scottmcm commented Mar 3, 2025

Interesting! Those perf results look entirely tolerable, way better than last time.

Weird CI failure, though...

EDIT: found it, #137919

Comment on lines 25 to 28
    let (low, high) = iterator.size_hint();
    assert!(
        high.is_none_or(|high| low <= high),
        "size_hint ({low:?}, {high:?}) is malformed from iterator {} collecting into {}",
        core::any::type_name::<I>(), core::any::type_name::<Self>(),
    );

    let Some(first) = iterator.next() else {
        return Vec::new();
    };
Member

Calling `next` and then `size_hint` is better, since some adapters can provide a better hint after the first step.
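
To illustrate the point about hints improving after the first step, here is a minimal sketch using `flatten` over nested `Vec`s. The helper name `hints_before_and_after_first` is hypothetical, and the exact values `size_hint` returns are an implementation detail of the standard library rather than a documented guarantee, so treat them as an observation on current releases.

```rust
/// Returns the `size_hint` of a flattened iterator before and after the
/// first call to `next`. (Hypothetical helper, for illustration only.)
pub fn hints_before_and_after_first(
    nested: Vec<Vec<u32>>,
) -> ((usize, Option<usize>), (usize, Option<usize>)) {
    let mut it = nested.into_iter().flatten();
    // Before any element is pulled, `flatten` has not opened an inner
    // iterator yet, so it cannot say how many items are coming.
    let before = it.size_hint();
    // Pulling the first element opens the first inner `Vec`; its remaining
    // length can now contribute to the lower bound.
    let _first = it.next();
    let after = it.size_hint();
    (before, after)
}
```

Called with `vec![vec![1, 2, 3], vec![4, 5]]`, the lower bound is typically 0 before the first `next` and 2 afterwards, which is the kind of improvement the comment refers to.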

@scottmcm scottmcm closed this Mar 5, 2025
@scottmcm scottmcm reopened this Mar 5, 2025
@scottmcm scottmcm force-pushed the another-size-hint-attempt branch from 804cf26 to 76f763c Compare March 7, 2025 04:47
jieyouxu added a commit to jieyouxu/rust that referenced this pull request Mar 16, 2025
…acrum

debug-assert that the size_hint is well-formed in `collect`

Closes rust-lang#137919

In the hopes of helping to catch any future accidentally-incorrect rustc or stdlib iterators (like the ones rust-lang#137908 accidentally found), this has `Iterator::collect` call `size_hint` and check its `low` doesn't exceed its `Some(high)`.

There's of course a bazillion more places this *could* be checked, but the hope is that this one is a good tradeoff of being likely to catch lots of things while having minimal maintenance cost (especially compared to putting it in *every* container's `from_iter`).
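
The check described in this commit message can be sketched outside the standard library as follows. This is an illustration under stated assumptions, not the merged code: `hint_is_well_formed`, `BadHint`, and `checked_collect` are hypothetical names, and `map_or` is used here in place of `is_none_or` only so the sketch also compiles on older toolchains.

```rust
/// Returns `true` when a `size_hint` pair is internally consistent, i.e.
/// the lower bound does not exceed the upper bound when one is given.
pub fn hint_is_well_formed(hint: (usize, Option<usize>)) -> bool {
    let (low, high) = hint;
    high.map_or(true, |high| low <= high)
}

/// A deliberately broken iterator (hypothetical) whose hint claims at
/// least 5 items but at most 3.
pub struct BadHint;

impl Iterator for BadHint {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        None
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        (5, Some(3))
    }
}

/// Collects into a `Vec`, checking the hint first in debug builds only.
pub fn checked_collect<I: Iterator>(iter: I) -> Vec<I::Item> {
    debug_assert!(
        hint_is_well_formed(iter.size_hint()),
        "size_hint {:?} is malformed from iterator {}",
        iter.size_hint(),
        core::any::type_name::<I>(),
    );
    iter.collect()
}
```

Passing `BadHint` to `checked_collect` would panic in a debug build and pass silently in a release build, which mirrors the "debug-assert" trade-off described above.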
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Mar 16, 2025
Rollup merge of rust-lang#138329 - scottmcm:assert-hint, r=Mark-Simulacrum

github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this pull request Mar 19, 2025
Comment on lines +25 to +26
let (low, high) = iterator.size_hint();
let Some(first) = iterator.next() else {
Member

I think the other comment still applies that some size_hints may be better after the first next -- or do you disagree?

Member Author

I've been meaning to come back and give that a shot. While it's certainly true, I'm also skeptical how valuable it is, since the usual case is things like flat_map that almost never have a good hint anyway -- and when they do, like flattening an iterator over arrays, it doesn't need the first one. But can try it.

(It makes me tempted to have a next_with_suggested_reserve -> Option<(NonZero<usize>, Item)>, too, but that's a bigger conversation.)

&& let extra = high - low
&& extra < low
{
high
Member

Should we cap this at isize::MAX bytes? It's conceivable on smaller targets that some iterator with a low near the edge isn't going to produce any more than that in practice, even if high would be too much, so maybe it's a bad idea to force a capacity error.
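
As a rough sketch of what capping at `isize::MAX` bytes could look like, the helpers below convert the byte limit into an element count for a given `T`. Both `max_elements` and `capped_capacity` are hypothetical names introduced for illustration; nothing here is taken from the PR.

```rust
use std::mem::size_of;

/// Largest element count whose total size stays within `isize::MAX` bytes.
/// Zero-sized types never allocate, so any count is fine for them.
pub fn max_elements<T>() -> usize {
    match size_of::<T>() {
        0 => usize::MAX,
        size => isize::MAX as usize / size,
    }
}

/// Clamps a requested capacity so the allocation request cannot exceed
/// `isize::MAX` bytes. (Hypothetical helper; not what the PR does.)
pub fn capped_capacity<T>(requested: usize) -> usize {
    requested.min(max_elements::<T>())
}
```

With a clamp like this, an oversized `high` would degrade to the largest allocatable capacity instead of forcing a capacity error up front.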

Member Author

Note that it would only ever matter for things that produce exactly low things. If it's just "near" the edge, it already panics today because it pre-allocates for low, then the low+1-th element tries to double it, and panics.

Said otherwise, this only uses the high as the hint if pushing could have tried to reserve that much anyway.
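
The reasoning above can be sketched as a small function. Only the `extra = high - low` and `extra < low` conditions are visible in the diff fragment; the rest of this sketch, including the name `initial_capacity` and the handling of `None`, is an assumption made for illustration.

```rust
/// Sketch of the heuristic under discussion: pre-allocate `high` only when
/// the extra room beyond `low` is smaller than `low` itself.
pub fn initial_capacity(low: usize, high: Option<usize>) -> usize {
    match high {
        // `checked_sub` also guards against a malformed hint (`high < low`).
        Some(high) => match high.checked_sub(low) {
            // Growing a full `low`-capacity vector by doubling would ask for
            // `2 * low`, which is already more than `high` in this case.
            Some(extra) if extra < low => high,
            _ => low,
        },
        None => low,
    }
}
```

For example, a hint of `(100, Some(150))` would pre-allocate 150, while `(100, Some(1000))` would stay at 100 and rely on normal growth.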

Member

Yeah, I get that, but I still wonder if such "exactly low" cases exist that we would be harming.

I feel like the doubling logic should cap itself too, but that's a separate conversation.

Member

What if we only try the `high` capacity first when it's in `(low, low*2]`, or else fall back to just `low`?
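
A minimal sketch of this suggestion, assuming the interval is meant literally as `low < high <= low * 2`. The function name `initial_capacity_in_range` is hypothetical.

```rust
/// Uses `high` as the initial capacity only when it lies in `(low, low * 2]`;
/// otherwise falls back to `low`.
pub fn initial_capacity_in_range(low: usize, high: Option<usize>) -> usize {
    match high {
        // `high - low <= low` is equivalent to `high <= low * 2`, but it
        // cannot overflow because `high > low` is checked first.
        Some(high) if high > low && high - low <= low => high,
        _ => low,
    }
}
```

Compared with an `extra < low` condition, this variant also accepts `high == low * 2` exactly and never picks `high` when it equals `low`.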

@scottmcm scottmcm added the S-waiting-on-author (Status: This is awaiting some action, such as code changes or more information, from the author) label Apr 9, 2025
@cuviper cuviper removed the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) label Apr 14, 2025
Labels
S-waiting-on-author (Status: This is awaiting some action, such as code changes or more information, from the author), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue), T-libs (Relevant to the library team, which will review and decide on the PR/issue)

7 participants