Inline All The Things #8946

qwerasd205 · 2025-09-29T03:47:09Z

I used the new CPU counter mode in Instruments.app to track down functions that had instruction delivery bottlenecks (indicating i-cache misses) and picked a bunch of trivial functions to mark as inline (plus a couple that are only used once or twice and which benefit from inlining).

The size of macos-arm64/libghostty-fat.a built with zig build -Doptimize=ReleaseFast -Dxcframework-target=native goes from 145,538,856 bytes on main to 145,595,952 on this branch, a negligible increase.

These changes resulted in some pretty sizable improvements in vtebench results on my machine (Apple M3 Max):

With this, the only vtebench test we're slower than Alacritty in (on my machine, at 130x51 window size) is dense_cells (which, IMO, is so artificial that optimizing for it might actually negatively impact real world performance).

I also did a pretty simple improvement to how we copy the screen in the renderer, gave it its own page pool for less memory churn. Further optimization in that area should be explored since in some scenarios it seems like as much as 35% of the time on the io-reader thread is spent waiting for the lock.

Note

Before this is merged, someone really ought to test this on an x86 processor to see how the performance compares there, since this is tuning for my processor specifically, and I know that M chips have pretty big i-cache compared to some x86 processors which could impact the performance characteristics of these changes.

Changes it so that the renderer retains its own MemoryPool for PageList pages so that new pages rarely need to be allocated when cloning the screen. Also switches to using an arena allocator in `updateFrame` to avoid having to deinit the cloned screen since instead we can just throw out the memory.

Supported by benchmarks (vtebench on Apple M3 Max)

A whole bunch of inline annotations, some of these were tracked down with Instruments.app, others are guesses / just seemed right because they were trivial wrapper functions. Regardless, these changes are ultimately supported by improved vtebench results on my machine (Apple M3 Max).

qwerasd205 · 2025-09-29T03:51:51Z

Oh, test failures. I totally neglected to test before PRing. I'll look in to that tomorrow.

Edit: They're very confusing errors actually-- is it some sort of miscompile? It's seemingly caused by Parser.doAction being inline and then Parser.next being called with a comptime-known 0x1B...

Very weird failures, not 100% sure of the cause; regardless, this fixes them.

mitchellh · 2025-09-29T17:57:30Z

I can run the vtebench marks at home when I have my x86 machine.

qwerasd205 added 4 commits September 28, 2025 21:21

datastruct: make trivial linked list ops inline

2c0259c

Supported by benchmarks (vtebench on Apple M3 Max)

termio: make trivial stream handler callbacks inline

817d0a7

Supported by benchmarks (vtebench on Apple M3 Max)

qwerasd205 requested review from a team as code owners September 29, 2025 03:47

fix test failures

a06d32b

Very weird failures, not 100% sure of the cause; regardless, this fixes them.

qwerasd205 force-pushed the inline-all-the-things branch from 7f6737b to a06d32b Compare September 29, 2025 17:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Inline All The Things #8946

Inline All The Things #8946

qwerasd205 commented Sep 29, 2025

Uh oh!

qwerasd205 commented Sep 29, 2025 •

edited

Loading

Uh oh!

mitchellh commented Sep 29, 2025

Uh oh!

Uh oh!

Inline All The Things #8946

Are you sure you want to change the base?

Inline All The Things #8946

Conversation

qwerasd205 commented Sep 29, 2025

Uh oh!

qwerasd205 commented Sep 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mitchellh commented Sep 29, 2025

Uh oh!

Uh oh!

qwerasd205 commented Sep 29, 2025 •

edited

Loading