
rustdoc: Rearrange Item/ItemInner. #138927

Merged: 1 commit into rust-lang:master, Mar 27, 2025

@nnethercote nnethercote commented Mar 25, 2025

The Item struct is 48 bytes and contains a Box<ItemInner>;
ItemInner is 104 bytes. This is an odd arrangement. Normally you'd
have one of the following.

  • A single large struct, which avoids the allocation for the Box, but
    can result in lots of wasted space in unused parts of a container like
    Vec<Item>, HashSet<Item>, etc.

  • Or, something like struct Item(Box<ItemInner>), which requires the
    Box allocation but gives a very small Item size, which is good for
    containers like Vec<Item>.

Item/ItemInner currently gets the worst of both worlds: it always
requires a Box, but Item is also pretty big and so wastes space in
containers. It would make sense to push it in one direction or the
other. #138916 showed that the first option is a regression for rustdoc,
so this commit does the second option, which improves speed and reduces
memory usage.
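
The trade-off described above can be checked with `std::mem::size_of`. The following is a minimal sketch, not rustdoc's actual definitions: `FatItem`, `SlimItem`, the padding arrays, and the `spare_bytes` helper are hypothetical names and shapes chosen only to reproduce the quoted sizes. On a typical 64-bit target the three structs should come out at 104, 48, and 8 bytes respectively.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the boxed payload. The array only pads the
/// struct to the 104 bytes quoted in the PR; it is not rustdoc's field list.
#[allow(dead_code)]
struct ItemInner {
    payload: [u64; 13],
}

/// "Before" shape (assumed layout): some fields inline plus a mandatory `Box`.
#[allow(dead_code)]
struct FatItem {
    inline_fields: [u64; 5],
    inner: Box<ItemInner>,
}

/// "After" shape: a newtype around the `Box`, so the handle is one pointer.
#[allow(dead_code)]
struct SlimItem(Box<ItemInner>);

/// Bytes a `Vec<T>` has reserved for slots that hold no element yet.
fn spare_bytes<T>(v: &Vec<T>) -> usize {
    (v.capacity() - v.len()) * size_of::<T>()
}

fn main() {
    println!("ItemInner: {} bytes", size_of::<ItemInner>());
    println!("FatItem:   {} bytes", size_of::<FatItem>());
    println!("SlimItem:  {} bytes", size_of::<SlimItem>());

    // Empty vectors with reserved capacity: every reserved slot is "wasted"
    // until it is filled, and the cost per slot is the size of the element.
    let fat: Vec<FatItem> = Vec::with_capacity(1000);
    let slim: Vec<SlimItem> = Vec::with_capacity(1000);
    println!("unused capacity, fat:  {} bytes", spare_bytes(&fat));
    println!("unused capacity, slim: {} bytes", spare_bytes(&slim));
}
```

Both shapes pay for one heap allocation per item, so the only difference between them is how much each container slot costs.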

r? @GuillaumeGomez

@rustbot rustbot added A-rustdoc-json Area: Rustdoc JSON backend S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Mar 25, 2025
@nnethercote
Contributor Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 25, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 25, 2025
…, r=<try>

rustdoc: Rearrange `Item`/`ItemInner`.

The `Item` struct is 48 bytes and contains a `Box<ItemInner>`; `ItemInner` is 104 bytes. This is an odd arrangement. Normally you'd have one of the following.

- A single large struct, which avoids the allocation for the `Box`, but can result in lots of wasted space in unused parts of a container like `Vec<Item>`, `HashSet<Item>`, etc.

- Or, something like `struct Item(Box<ItemInner>)`, which requires the `Box` allocation but gives a very small Item size, which is good for containers like `Vec<Item>`. (`Vec<Box<Item>>` would also work.)

`Item`/`ItemInner` currently gets the worst of both worlds: it always requires a `Box`, but `Item` is also pretty big and so wastes space in containers. It would make sense to push it in one direction or the other. This commit does the second option, giving a tiny `Item`.

r? `@ghost`
@bors
Collaborator

bors commented Mar 25, 2025

⌛ Trying commit 4312081 with merge ffdc630...

@bors
Collaborator

bors commented Mar 25, 2025

☀️ Try build successful - checks-actions
Build commit: ffdc630 (ffdc630765aa08436ba74c3bd34350705ecf9b9b)

@rust-timer
Collaborator

Finished benchmarking commit (ffdc630): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.2%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.3% | [-0.3%, -0.2%] | 2 |

Max RSS (memory usage)

Results (primary -1.4%, secondary -0.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.8% | [0.8%, 0.8%] | 1 |
| Regressions ❌ (secondary) | 2.7% | [2.0%, 3.3%] | 2 |
| Improvements ✅ (primary) | -3.5% | [-3.5%, -3.5%] | 1 |
| Improvements ✅ (secondary) | -2.8% | [-3.0%, -2.7%] | 3 |
| All ❌✅ (primary) | -1.4% | [-3.5%, 0.8%] | 2 |

Cycles

Results (secondary 1.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.7% | [2.4%, 3.0%] | 4 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.6% | [-4.6%, -4.6%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 777.441s -> 777.041s (-0.05%)
Artifact size: 365.81 MiB -> 365.79 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 25, 2025
@nnethercote
Contributor Author

nnethercote commented Mar 25, 2025

Good doc perf results: icounts don't change much, but cycles, wall-time and max-rss all show improvements. (This is a case where enabling "Show non-relevant results" is worthwhile.)

I also tried merging Item and ItemInner and just using Vec<Box<Item>> everywhere, but that required many more changes and I gave up before I finished.
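
A rough illustration of why the newtype route tends to touch less code than switching every container to `Vec<Box<Item>>`: with a newtype, containers and function signatures keep the spelling `Item`, and a `Deref` impl can forward field access to the boxed payload. This is a sketch under assumptions; the fields `name` and `id`, the `Item::new` constructor, the `names` helper, and the use of `Deref` here are illustrative choices, not a description of rustdoc's code.

```rust
use std::ops::Deref;

/// Hypothetical payload with two illustrative fields (not rustdoc's real ones).
struct ItemInner {
    name: String,
    id: u32,
}

/// Newtype handle: containers keep the spelling `Vec<Item>`.
struct Item(Box<ItemInner>);

impl Item {
    fn new(name: &str, id: u32) -> Self {
        Item(Box::new(ItemInner { name: name.to_string(), id }))
    }
}

/// Forward field access to the boxed payload so `item.name` still works.
impl Deref for Item {
    type Target = ItemInner;

    fn deref(&self) -> &ItemInner {
        &self.0
    }
}

/// A call site written against `&[Item]` needs no signature change.
fn names(items: &[Item]) -> Vec<&str> {
    items.iter().map(|item| item.name.as_str()).collect()
}

fn main() {
    let items = vec![Item::new("foo", 1), Item::new("bar", 2)];
    println!("{:?}", names(&items));
    println!("first id: {}", items[0].id);
}
```

With `Vec<Box<Item>>`, by contrast, every container type and every signature that mentions the element type would have to be rewritten, which is consistent with the "many more changes" noted above.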

@nnethercote nnethercote marked this pull request as ready for review March 25, 2025 20:00
@GuillaumeGomez
Member

Great catch! Funny that no one noticed it. All good for me, thanks!

@bors r+ rollup=iffy

@bors
Collaborator

bors commented Mar 27, 2025

📌 Commit ffee55c has been approved by GuillaumeGomez

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 27, 2025
@bors
Collaborator

bors commented Mar 27, 2025

⌛ Testing commit ffee55c with merge 217693a...

@bors
Collaborator

bors commented Mar 27, 2025

☀️ Test successful - checks-actions
Approved by: GuillaumeGomez
Pushing 217693a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 27, 2025
@bors bors merged commit 217693a into rust-lang:master Mar 27, 2025
7 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Mar 27, 2025

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing ecb170a (parent) -> 217693a (this PR)

Test differences

32935 test diffs

Stage 1

  • boxed::uninitialized_zero_size_box: pass -> [missing] (J0)
  • c_str2::into_rc: pass -> [missing] (J0)
  • collections::btree::set::tests::test_extract_if_drop_panic_leak: pass -> [missing] (J0)
  • collections::linked_list::tests::test_append: pass -> [missing] (J0)
  • io::buffered::tests::test_buffered_writer: pass -> [missing] (J0)
  • iter::adapters::flatten::test_iterator_flatten: pass -> [missing] (J0)
  • iter::range::test_range: pass -> [missing] (J0)
  • mpmc::port_gone_concurrent: pass -> [missing] (J0)
  • mpsc::port_gone_concurrent: pass -> [missing] (J0)
  • mpsc_sync::drop_full: pass -> [missing] (J0)
  • nonzero::test_size_nonzero_in_option: pass -> [missing] (J0)
  • num::i32::test_div_ceil: pass -> [missing] (J0)
  • num::i8::test_count_ones: pass -> [missing] (J0)
  • num::i8::test_midpoint: pass -> [missing] (J0)
  • num::test_try_u16u32: pass -> [missing] (J0)
  • slice::memchr::matches_begin: pass -> [missing] (J0)
  • slice::memchr::matches_nul_reversed: pass -> [missing] (J0)
  • slice::test_binary_search: pass -> [missing] (J0)
  • slice::test_chunks_next: pass -> [missing] (J0)
  • slice::test_mut_rchunks: pass -> [missing] (J0)
  • str::test_rsplit_once: pass -> [missing] (J0)
  • str::test_str_default: pass -> [missing] (J0)
  • test_checked_sub: pass -> [missing] (J0)
  • test_next_power_of_two_u64: pass -> [missing] (J0)
  • vec::test_slice_out_of_bounds_1: pass -> [missing] (J0)
  • doctest::tests::make_test_no_crate_inject: pass -> [missing] (J1)
  • errors::verify_ast_lowering_await_only_in_async_fn_and_blocks_4: pass -> [missing] (J1)
  • errors::verify_ast_passes_pattern_in_fn_pointer_32: pass -> [missing] (J1)
  • errors::verify_const_eval_mutable_ptr_in_final_2: pass -> [missing] (J1)
  • errors::verify_hir_typeck_yield_expr_outside_of_coroutine_3: pass -> [missing] (J1)
  • errors::verify_passes_inner_crate_level_attr_5: pass -> [missing] (J1)
  • errors::verify_passes_rustc_layout_scalar_valid_range_not_struct_63: pass -> [missing] (J1)
  • html::length_limit::tests::close_too_many: pass -> [missing] (J1)
  • spec::tests::i686_unknown_uefi: pass -> [missing] (J1)
  • spec::tests::sparc64_unknown_linux_gnu: pass -> [missing] (J1)
  • sys::process::unix::common::tests::test_process_mask: ignore -> [missing] (J1)
  • theme::tests::check_invalid_css: pass -> [missing] (J1)
  • btree::set::clone_100_and_pop_all: pass -> [missing] (J2)
  • btree::set::difference_random_100_vs_10k: pass -> [missing] (J2)
  • fs::tests::copy_file_follows_dst_symlink: pass -> [missing] (J2)
  • iter::bench_peekable_ref_sum: pass -> [missing] (J2)
  • slice::fill_byte_sized: pass -> [missing] (J2)
  • sort::tests::stable::correct_i32_narrow: pass -> [missing] (J2)
  • sort::tests::unstable::self_cmp_string_random: pass -> [missing] (J2)
  • sort::tests::unstable::stability_cell_i32_random: pass -> [missing] (J2)
  • sort::tests::unstable::violate_ord_retain_orig_set_string_saw_mixed: pass -> [missing] (J2)
  • str::trim_start_ascii_char::long_lorem_ipsum: pass -> [missing] (J2)
  • vec::bench_clone_from_10_0010_0000: pass -> [missing] (J2)
  • f16::test_round: pass -> [missing] (J3)

Stage 2

  • any::any_referenced: [missing] -> pass (J0)
  • collections::btree::map::tests::test_range_panic_1: [missing] -> pass (J0)
  • collections::btree::set::tests::test_difference_size_hint: [missing] -> pass (J0)
  • collections::btree::set::tests::test_zip: [missing] -> pass (J0)
  • f128::test_float_bits_conv: [missing] -> pass (J0)
  • f128::test_neg_zero: [missing] -> pass (J0)
  • io::error::tests::test_errorkind_packing: [missing] -> pass (J0)
  • io::tests::seek_position: [missing] -> pass (J0)
  • io::tests::test_write_all_vectored: [missing] -> pass (J0)
  • iter::adapters::zip::test_zip_next_back_side_effects_exhausted: [missing] -> pass (J0)
  • mem::uninit_write_clone_of_slice: [missing] -> pass (J0)
  • mutex::panic_while_mapping_unlocked_poison: [missing] -> pass (J0)
  • net::ip_addr::ipv6_addr_to_string: [missing] -> pass (J0)
  • num::flt2dec::strategy::grisu::shortest_sanity_test: [missing] -> pass (J0)
  • num::u64::test_isqrt: [missing] -> pass (J0)
  • num::u8::test_swap_bytes: [missing] -> pass (J0)
  • result::test_expect_err_err: [missing] -> pass (J0)
  • slice::test_chunks_exact_mut_nth: [missing] -> pass (J0)
  • slice::test_find_rfind: [missing] -> pass (J0)
  • sort::tests::stable::self_cmp_i32_random_z1: [missing] -> pass (J0)
  • str::slice_index::rangeinclusive_len_len::index_mut_fail: [missing] -> pass (J0)
  • clean::cfg::tests::test_parse_err: [missing] -> pass (J1)
  • clean::utils::tests::int_format_decimal: [missing] -> pass (J1)
  • edit_distance::tests::test_method_name_similarity_score: [missing] -> pass (J1)
  • error_reporting::traits::on_unimplemented::verify_trait_selection_disallowed_positional_argument_4: [missing] -> pass (J1)
  • errors::verify_const_eval_unallowed_heap_allocations_21: [missing] -> pass (J1)
  • errors::verify_monomorphize_abi_required_target_feature_10: [missing] -> pass (J1)
  • errors::verify_parse_multiple_where_clauses_88: [missing] -> pass (J1)
  • errors::verify_parse_path_double_colon_57: [missing] -> pass (J1)
  • errors::verify_parse_pattern_method_param_without_body_48: [missing] -> pass (J1)
  • errors::verify_parse_use_empty_block_not_semi_31: [missing] -> pass (J1)
  • errors::verify_passes_doc_test_unknown_spotlight_45: [missing] -> pass (J1)
  • errors::verify_passes_rustc_dirty_clean_69: [missing] -> pass (J1)
  • errors::verify_session_sanitizer_cfi_generalize_pointers_requires_cfi_13: [missing] -> pass (J1)
  • graph::scc::tests::diamond: [missing] -> pass (J1)
  • leb128::tests::test_i64_leb128: [missing] -> pass (J1)
  • lints::verify_lint_builtin_keyword_idents_35: [missing] -> pass (J1)
  • parser::tests::parse_exprs: [missing] -> pass (J1)
  • spec::tests::thumbv7a_pc_windows_msvc: [missing] -> pass (J1)
  • test_int_ranges: [missing] -> pass (J1)
  • tests::test_typed_arena_drop_on_clear: [missing] -> pass (J1)
  • fs::tests::file_test_io_seek_and_write: [missing] -> pass (J2)
  • fs::tests::file_test_io_smoke_test: [missing] -> pass (J2)
  • hash::set_ops::set_symmetric_difference: [missing] -> pass (J2)
  • num::int_log::u64_log10_predictable: [missing] -> pass (J2)
  • slice::sort_unstable_medium_random: [missing] -> pass (J2)
  • sort::tests::stable::panic_observable_is_less_descending: [missing] -> pass (J2)
  • sort::tests::stable::panic_retain_orig_set_cell_i32_random_d2: [missing] -> pass (J2)
  • sort::tests::unstable::panic_retain_orig_set_cell_i32_random_d20: [missing] -> pass (J2)
  • vec::bench_extend_recycle: [missing] -> pass (J2)
  • sort::tests::stable::panic_retain_orig_set_cell_i32_random: [missing] -> ignore (J4)

(and 16364 additional test diffs)

Additionally, 16471 doctest diffs were found. These are ignored, as they are noisy.

Job group index

  • J0: aarch64-apple, test-various, x86_64-apple-1, x86_64-gnu-aux
  • J1: aarch64-apple, x86_64-apple-1
  • J2: aarch64-apple, test-various, x86_64-apple-1
  • J3: aarch64-apple
  • J4: x86_64-gnu-aux

@rust-timer
Collaborator

Finished benchmarking commit (217693a): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.3%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.3% | [-0.3%, -0.3%] | 2 |

Max RSS (memory usage)

Results (primary -0.7%, secondary 1.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 1.5% | [1.5%, 1.5%] | 1 |
| Regressions ❌ (secondary) | 1.9% | [1.9%, 1.9%] | 1 |
| Improvements ✅ (primary) | -1.8% | [-2.2%, -1.3%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.7% | [-2.2%, 1.5%] | 3 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 779.46s -> 778.49s (-0.12%)
Artifact size: 365.95 MiB -> 365.89 MiB (-0.01%)

@nnethercote nnethercote deleted the rearrange-Item-ItemInner branch March 27, 2025 21:19
@nnethercote
Contributor Author

The post-merge perf CI results (slight perf wins overall) were less good than the pre-merge results (clear perf wins overall), for no obvious reason. (Only the doc results are relevant for this PR.) But even if perf was neutral, this Item/ItemInner arrangement makes more sense.

Labels
A-rustdoc-json Area: Rustdoc JSON backend merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
5 participants