We're finally starting to look at implementing WebGPU-based rendering in Monaco, similar to what xterm.js uses. This issue is used to track all the work, which is expected to take several months.
Project: VS Code Editor GPU Renderer
Related issues
Here are some historical links that might be useful:
- The zero latency typing issue, which investigated typing latency in Monaco. This led to some improvements but eventually came to the conclusion that the biggest win would be doing rendering manually using our own shaders: Feature Request: Zero-latency Typing #27378
- Old exploration issue: Explore GPU rendering in the editor #162445
- Internal issue describing my 2022 prototype which is based on xterm.js' webgl renderer https://github.com/microsoft/vscode-internalbacklog/issues/3157
- Internal issue describing my 2024 prototype which is based on a game renderer I was experimenting with in my spare time https://github.com/microsoft/vscode-internalbacklog/issues/4906
The content below is copied from https://github.com/microsoft/vscode-internalbacklog/issues/4906
GPU-based rendering
branch: tyriar/gpu_exploration
How GPU rendering works
It works by assembling array buffers which represent commands to run on the GPU. These are filled on the CPU with information like the texture to use (character, fg, bg), location, offset, etc. xterm.js, for example, allocates a cols x rows array buffer that represents only the viewport and updates it on every frame where the viewport changes.
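A rough TypeScript sketch of that CPU-side assembly (the cell record layout and helper names here are illustrative assumptions, not the actual xterm.js or VS Code code; WebGPU typings come from @webgpu/types):

```ts
// Illustrative sketch only: 3 floats per cell is an assumed layout.
const FLOATS_PER_CELL = 3; // cell x, cell y, texture atlas index

function buildViewportBuffer(
  viewport: { cols: number; rows: number },
  getGlyphIndex: (row: number, col: number) => number
): Float32Array {
  const data = new Float32Array(viewport.cols * viewport.rows * FLOATS_PER_CELL);
  let i = 0;
  for (let row = 0; row < viewport.rows; row++) {
    for (let col = 0; col < viewport.cols; col++) {
      data[i++] = col;                     // cell x
      data[i++] = row;                     // cell y
      data[i++] = getGlyphIndex(row, col); // which atlas entry to sample
    }
  }
  return data;
}

function uploadViewport(device: GPUDevice, buffer: GPUBuffer, data: Float32Array): void {
  // One upload per frame in which the viewport content changed.
  device.queue.writeBuffer(buffer, 0, data);
}
```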
There are 2 types of shaders:
- Vertex shader - This is run for every vertex (4 vertices per quad) and is used to transform the vertices into screen space.
- Fragment shader - This is run for every pixel in the quad and is used to determine the color of the pixel.
How the prototype works
The WebGPU prototype works by pre-allocating a buffer that represents up to 3000 lines in a file with a maximum column length of 200. The buffers* are lazily filled in based on what's in the viewport, meaning once a line is loaded it doesn't need to be modified again. I think it currently updates more aggressively than needed due to my lack of knowledge around finding dirty lines in Monaco.
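A minimal sketch of that pre-allocation and lazy fill, under the same assumed cell layout (constants and helper names are mine, not the prototype's):

```ts
// Sketch only: sizes and names are assumptions based on the description above.
const MAX_LINES = 3000;
const MAX_COLS = 200;
const FLOATS_PER_CELL = 3;
const BYTES_PER_CELL = FLOATS_PER_CELL * 4;

function createTextDataBuffer(device: GPUDevice): GPUBuffer {
  return device.createBuffer({
    label: 'editor-text-data',
    size: MAX_LINES * MAX_COLS * BYTES_PER_CELL,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST
  });
}

const uploadedLines = new Set<number>();

function ensureLineUploaded(device: GPUDevice, buffer: GPUBuffer, lineIndex: number, cells: Float32Array): void {
  // Once a line has been written it doesn't need to be touched again
  // unless its content changes.
  if (uploadedLines.has(lineIndex)) {
    return;
  }
  uploadedLines.add(lineIndex);
  device.queue.writeBuffer(buffer, lineIndex * MAX_COLS * BYTES_PER_CELL, cells);
}
```

The prototype's vertex and fragment shaders then read these per-cell records plus the sprite metadata to position and texture each glyph quad: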
```wgsl
@vertex fn vs(
  vert: Vertex,
  @builtin(instance_index) instanceIndex: u32,
  @builtin(vertex_index) vertexIndex : u32
) -> VSOutput {
  let dynamicUnitInfo = dynamicUnitInfoStructs[instanceIndex];
  let spriteInfo = spriteInfo[u32(dynamicUnitInfo.textureIndex)];

  var vsOut: VSOutput;
  // Multiply vert.position by (2, -2) to get it into clip space, which ranges from -1 to 1
  vsOut.position = vec4f(
    (((vert.position * vec2f(2, -2)) / uniforms.canvasDimensions)) * spriteInfo.size
      + dynamicUnitInfo.position
      + ((spriteInfo.origin * vec2f(2, -2)) / uniforms.canvasDimensions)
      + ((scrollOffset.offset * 2) / uniforms.canvasDimensions),
    0.0,
    1.0
  );

  // Textures are flipped from their natural direction on the y-axis, so flip it back
  vsOut.texcoord = vert.position;
  vsOut.texcoord = (
    // Sprite offset (0-1)
    (spriteInfo.position / textureInfoUniform.spriteSheetSize) +
    // Sprite coordinate (0-1)
    (vsOut.texcoord * (spriteInfo.size / textureInfoUniform.spriteSheetSize))
  );

  return vsOut;
}

@fragment fn fs(vsOut: VSOutput) -> @location(0) vec4f {
  return textureSample(ourTexture, ourSampler, vsOut.texcoord);
}
```
Texture atlas
Glyphs are rendered on the CPU using the browser's canvas 2d context to draw the characters into a texture atlas. The texture atlas can have multiple pages; this is an optimization problem, as uploading images is relatively expensive. xterm.js creates multiple small texture atlas pages, allocates within them using a shelf allocator, and eventually merges them into larger immutable pages, as those are more expensive to upload.
Currently the prototype uses a single large texture atlas page, but it warms it up in idle callbacks in the background for the current font and all theme token colors (using the TaskQueue xterm.js util).
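A hedged sketch of the rasterize-and-upload path (glyph sizing, padding, and the shelf allocation are omitted; names are illustrative):

```ts
// Sketch only: draw a glyph with the canvas 2d context, then copy it into an
// atlas texture page at its allocated slot.
function rasterizeGlyph(char: string, font: string, fg: string, width: number, height: number): OffscreenCanvas {
  const canvas = new OffscreenCanvas(width, height);
  const ctx = canvas.getContext('2d')!;
  ctx.font = font;          // e.g. '12px monospace'
  ctx.fillStyle = fg;       // token foreground color
  ctx.textBaseline = 'top';
  ctx.fillText(char, 0, 0);
  return canvas;
}

function uploadGlyph(device: GPUDevice, atlasPage: GPUTexture, glyph: OffscreenCanvas, x: number, y: number): void {
  device.queue.copyExternalImageToTexture(
    { source: glyph },
    { texture: atlasPage, origin: { x, y } },
    { width: glyph.width, height: glyph.height }
  );
}
```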
Memory usage
text_data_buffer: [wgslX, wgslY, textureIndex, ...]
texture_atlas_buffer: [positionX, positionY, sizeX, sizeY, offsetX, offsetY, ...]
textureIndex in text_data_buffer maps to texture_atlas_buffer[textureIndex * 6]
In the above, each text_data_buffer cell is 12 bytes (3x 32-bit floats), so 3000x200 would be:
3000 * 200 * 12 = 7.2MB
This is pretty insignificant for a modern GPU.
* Double buffering is used as the GPU locks array buffers until it's done with them.
Scrolling
The prototype currently scrolls extremely smoothly, as at most a viewport's worth of data is filled per frame, and often no viewport data will change at all. Then we just need to update the scroll offset so the shader knows which cells to render.
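A tiny sketch of that scroll path, assuming a small dedicated uniform buffer holding the offset (buffer name and layout are illustrative):

```ts
// Sketch only: when just the scroll position changes, only two floats are
// re-uploaded; the per-cell storage buffer is untouched.
function updateScrollOffset(device: GPUDevice, scrollOffsetBuffer: GPUBuffer, scrollLeftPx: number, scrollTopPx: number): void {
  device.queue.writeBuffer(scrollOffsetBuffer, 0, new Float32Array([scrollLeftPx, scrollTopPx]));
}
```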
Input
So far, the above is highly optimized for readonly scrolling. For input/file changes there are a few cases we need to target. We essentially want to get these updates to take as little CPU time as possible, even if that means leaving stale and no-longer referenced data in the fixed buffers.
Adding new lines or deleting lines
This could be supported by uploading a map whose job is to map line numbers to indexes in the fixed buffer.
That way we only need to update indexes, not the whole line data.
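A sketch of that indirection, assuming the map itself would also be uploaded so the shader can resolve line numbers to buffer rows (names are illustrative):

```ts
// Sketch only: lineToBufferIndex[lineNumber] = row in the fixed text data buffer.
const lineToBufferIndex: number[] = [];
let nextFreeBufferIndex = 0;

function insertLine(lineNumber: number): void {
  // New lines get a fresh row; rows for existing lines keep their data and
  // only the mapping shifts, so nothing below the edit is re-uploaded.
  lineToBufferIndex.splice(lineNumber, 0, nextFreeBufferIndex++);
}

function deleteLine(lineNumber: number): void {
  // The old row's data is left stale in the buffer; it's simply unreferenced.
  lineToBufferIndex.splice(lineNumber, 1);
}
```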
Inserting characters
A simple O(n) solution is to just update the entire line. We could do tricks to make this faster, but it might not be worth the effort if the line length is fixed.
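A minimal sketch of that whole-line rewrite, reusing the assumed layout from the earlier buffer sketch:

```ts
// Sketch only: constants as assumed above.
const MAX_COLS = 200;
const BYTES_PER_CELL = 3 * 4; // 3x 32-bit floats per cell

function rewriteLine(device: GPUDevice, buffer: GPUBuffer, bufferIndex: number, cells: Float32Array): void {
  // O(n) in the line length: rebuild the line's cell records on the CPU and
  // overwrite just that line's slice of the fixed buffer.
  device.queue.writeBuffer(buffer, bufferIndex * MAX_COLS * BYTES_PER_CELL, cells);
}
```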
Fixed buffers and long lines
My plan for how the characters will be sent to the GPU is to have one or more fixed-width buffers (e.g. 80, 200?) with maps that point to indexes dynamically, as described in the input section, and then another more dynamic buffer which supports lines of arbitrary length. This dynamic buffer will be a little less optimized, as it's the edge case when coding. The fixed buffers could also be dynamically allocated based on the file to save some memory. A rough sketch of this bucketing follows below.
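This sketch of how the bucketing could look treats the widths and the overflow strategy as assumptions, not a settled design:

```ts
// Sketch only: bucket widths and the overflow strategy are placeholders.
type Bucket = 'fixed80' | 'fixed200' | 'dynamic';

interface LineBufferRef {
  bucket: Bucket;
  index: number; // row within a fixed bucket, or offset within the dynamic buffer
}

// Most lines land in a fixed-width bucket; only unusually long lines fall
// through to the less-optimized dynamic buffer.
function chooseBucket(lineLength: number): Bucket {
  if (lineLength <= 80) {
    return 'fixed80';
  }
  if (lineLength <= 200) {
    return 'fixed200';
  }
  return 'dynamic';
}
```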
Other things we could do
- Sub-pixel glyphs for smoother flow - e.g. render characters at 4x the width and support offsetting the character every 0.25px.
- Proportional font support isn't in xterm.js, but it's possible without too much effort. We will need to support this anyway if we want to render widths just like the DOM renderer. The main thing this requires is some way of getting the width of the glyphs and the offset of each character in a line. Again, this is an optimization problem of getting and updating this width/offset data as fast as possible.
- I believe SPAA is possible to do on the GPU using grayscale textures.
- Custom glyphs are supported in the terminal, which allows pixel-perfect box drawing characters, for example ┌───┘. Whether this looks good in Monaco is up to the font settings; letter spacing and line height will always mess with these.
- Texture atlas glyphs could be first drawn to a very small page and then packed more efficiently into a longer-term page in an idle callback or worker.
- Texture atlas pages could be cached to disk
- Canvas sharing - To optimize notebooks in particular we could have a shared canvas for all editors and tell the renderer that it owns a certain bounding box
Test results
These were done on terminalInstance.ts. Particularly slow frames of the test are shown.
The tyriar/gpu_exploration tests disabled all DOM rendering (lines, sticky scroll, etc.) to get an idea of how fast things could be without needing to perform layouts on each frame. It's safe to assume that rendering other components would take less than or equal to the time of the most complex component (the minimap is similar, but could potentially share data as well).
Scroll to top command
M2 Pro Macbook main
M2 Pro Macbook tyriar/gpu_exploration (all dom rendering disabled)
Windows gaming PC main
Windows gaming PC tyriar/gpu_exploration (all dom rendering disabled)
Scrolling with small text on a huge viewport
fontSize 6, zoomLevel -4
M2 Pro Macbook main
M2 Pro Macbook tyriar/gpu_exploration (all dom rendering disabled)
Windows gaming PC main
Windows gaming PC tyriar/gpu_exploration (all dom rendering disabled)
Very long line
Long lines aren't currently supported in the GPU renderer.
Shaders run in parallel to microtasks and layout
The sample below, from the Windows scroll to top test above, demonstrates how the shaders execute in parallel with layout, as opposed to entirely after layout.
Before:
After:
The HarfBuzz shaping engine is used by many programs, including Chromium, to determine various things about text rendering. It might be needed for good RTL/ligature/grapheme rendering.