Skip to content

Conversation

@zijiren233
Copy link

I wrote a simple test, but I won't put it in the source code because the benchmark framework hasn't been introduced yet.

use std::time::Instant;

use tokio::io::{AsyncRead, AsyncReadExt, AsyncWrite, AsyncWriteExt};

const BUFFER_SIZE: usize = 16 * 1024; // 16 KiB per copy direction
const TEST_DATA_SIZE: usize = 10 * 1024 * 1024 * 1024; // 10 GiB (printed as 10240 MB by main; previous "100 MB" note was wrong)
const NUM_ITERATIONS: usize = 10000; // reported in main's header; not consumed by the concurrent benchmark
const CONCURRENT_CONNECTIONS: usize = 1_000_000; // simultaneous duplex connections spawned per benchmark run

/// Stack-based version (current implementation).
///
/// Bidirectionally shuttles bytes between `a` and `b` until either side hits
/// EOF or an I/O error occurs. The two 16 KiB buffers are plain arrays, so
/// they live inline in the generated future's state — making each future
/// large, which is exactly what this benchmark measures.
///
/// Returns `(bytes a->b, bytes b->a, first error if any)`.
async fn copy_io_stack<A, B>(a: &mut A, b: &mut B) -> (usize, usize, Option<std::io::Error>)
where
	A: AsyncRead + AsyncWrite + Unpin + ?Sized,
	B: AsyncRead + AsyncWrite + Unpin + ?Sized,
{
	let mut buf_ab = [0u8; BUFFER_SIZE];
	let mut buf_ba = [0u8; BUFFER_SIZE];

	let mut bytes_ab = 0;
	let mut bytes_ba = 0;
	let mut failure = None;

	loop {
		// `read` is cancel-safe, so whichever branch loses the race can be
		// retried on the next loop iteration without losing data.
		tokio::select! {
			read_ab = a.read(&mut buf_ab) => match read_ab {
				Ok(0) => break, // EOF on `a`
				Ok(n) => {
					bytes_ab += n;
					if let Err(e) = b.write_all(&buf_ab[..n]).await {
						failure = Some(e);
						break;
					}
				}
				Err(e) => {
					failure = Some(e);
					break;
				}
			},
			read_ba = b.read(&mut buf_ba) => match read_ba {
				Ok(0) => break, // EOF on `b`
				Ok(n) => {
					bytes_ba += n;
					if let Err(e) = a.write_all(&buf_ba[..n]).await {
						failure = Some(e);
						break;
					}
				}
				Err(e) => {
					failure = Some(e);
					break;
				}
			}
		}
	}

	(bytes_ab, bytes_ba, failure)
}

/// Box-based version (heap allocation — 2 separate allocations).
///
/// Behaviorally identical to `copy_io_stack`, but the buffers are boxed
/// slices: the future only holds two pointers instead of 32 KiB of inline
/// array, so millions of concurrent copies keep future sizes (and the
/// memcpys that moving them implies) small.
///
/// Returns `(bytes a->b, bytes b->a, first error if any)`.
async fn copy_io_box<A, B>(a: &mut A, b: &mut B) -> (usize, usize, Option<std::io::Error>)
where
	A: AsyncRead + AsyncWrite + Unpin + ?Sized,
	B: AsyncRead + AsyncWrite + Unpin + ?Sized,
{
	let mut buf_ab = vec![0u8; BUFFER_SIZE].into_boxed_slice();
	let mut buf_ba = vec![0u8; BUFFER_SIZE].into_boxed_slice();

	let mut bytes_ab = 0;
	let mut bytes_ba = 0;
	let mut failure = None;

	loop {
		// Same select-driven pump as the stack variant; only the buffer
		// storage differs.
		tokio::select! {
			read_ab = a.read(&mut buf_ab) => match read_ab {
				Ok(0) => break, // EOF on `a`
				Ok(n) => {
					bytes_ab += n;
					if let Err(e) = b.write_all(&buf_ab[..n]).await {
						failure = Some(e);
						break;
					}
				}
				Err(e) => {
					failure = Some(e);
					break;
				}
			},
			read_ba = b.read(&mut buf_ba) => match read_ba {
				Ok(0) => break, // EOF on `b`
				Ok(n) => {
					bytes_ba += n;
					if let Err(e) = a.write_all(&buf_ba[..n]).await {
						failure = Some(e);
						break;
					}
				}
				Err(e) => {
					failure = Some(e);
					break;
				}
			}
		}
	}

	(bytes_ab, bytes_ba, failure)
}

// Concurrent benchmark functions
/// Drives one simulated connection through `copy_io_stack`: a background
/// task pushes 1 KiB through a duplex pipe and shuts down, while the copier
/// drains the server end into an empty sink.
async fn spawn_copy_stack() {
	let (mut client, mut server) = tokio::io::duplex(8 * 1024);

	// Producer side: write a fixed payload, then signal EOF via shutdown.
	tokio::spawn(async move {
		let _ = client.write_all(&[0u8; 1024]).await;
		let _ = client.shutdown().await;
	});

	let mut sink = tokio::io::empty();
	let _ = copy_io_stack(&mut server, &mut sink).await;
}

/// Drives one simulated connection through `copy_io_box` — identical setup
/// to `spawn_copy_stack`, differing only in which copy implementation runs.
async fn spawn_copy_box() {
	let (mut client, mut server) = tokio::io::duplex(8 * 1024);

	// Producer side: write a fixed payload, then signal EOF via shutdown.
	tokio::spawn(async move {
		let _ = client.write_all(&[0u8; 1024]).await;
		let _ = client.shutdown().await;
	});

	let mut sink = tokio::io::empty();
	let _ = copy_io_box(&mut server, &mut sink).await;
}

/// Spawns `CONCURRENT_CONNECTIONS` instances of the future produced by
/// `spawn_fn`, waits for all of them to finish, prints and returns the total
/// wall-clock duration.
async fn benchmark_concurrent<F, Fut>(name: &str, spawn_fn: F) -> std::time::Duration
where
	F: Fn() -> Fut,
	Fut: std::future::Future<Output = ()> + Send + 'static,
{
	let start = Instant::now();

	// Launch every task up front (the range's size_hint preallocates the Vec),
	// then join them all so `elapsed` covers the full batch.
	let handles: Vec<_> = (0..CONCURRENT_CONNECTIONS)
		.map(|_| tokio::spawn(spawn_fn()))
		.collect();

	for handle in handles {
		let _ = handle.await;
	}

	let duration = start.elapsed();
	println!("  [{name}] {CONCURRENT_CONNECTIONS} concurrent connections completed in {duration:?}");
	duration
}

#[tokio::main]
async fn main() {
	// Print the benchmark configuration header. NOTE(review): TEST_DATA_SIZE
	// and NUM_ITERATIONS are only reported here — the concurrent benchmark
	// below does not actually consume them.
	println!("=== Copy IO Benchmark: Stack vs Box Allocation ===");
	println!("Buffer size: {} KB", BUFFER_SIZE / 1024);
	println!("Test data size: {} MB", TEST_DATA_SIZE / (1024 * 1024));
	println!("Iterations: {NUM_ITERATIONS}\n");

	// Run concurrent benchmarks
	println!("Running concurrent benchmarks...\n");

	// Run the stack variant first, then the box variant; each spawns
	// CONCURRENT_CONNECTIONS tasks and returns total wall-clock time.
	let stack_concurrent = benchmark_concurrent("Stack", spawn_copy_stack).await;
	let box_concurrent = benchmark_concurrent("Box", spawn_copy_box).await;

	println!("\n=== Concurrent Results ===");
	println!("Stack:          {stack_concurrent:?}");
	println!("Box: {box_concurrent:?}");
}

result:

=== Copy IO Benchmark: Stack vs Box Allocation ===
Buffer size: 16 KB
Test data size: 10240 MB
Iterations: 10000

Running concurrent benchmarks...

  [Stack] 1000000 concurrent connections completed in 1.457541375s
  [Box] 1000000 concurrent connections completed in 818.383292ms

=== Concurrent Results ===
Stack:          1.457541375s
Box: 818.383292ms

@zijiren233
Copy link
Author

Actually, tokio::io::copy_bidirectional does something similar.

@Itsusinn Itsusinn requested a review from Copilot November 3, 2025 15:44
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the copy_io function by moving buffer allocations from the stack to the heap using boxed slices. The change addresses memory efficiency concerns when handling a large number of concurrent connections, reducing the future size and associated memory copying overhead.

Key Changes:

  • Modified buffer allocation strategy from stack arrays to heap-allocated boxed slices

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 10 to 11
let mut a2b = vec![0u8; BUFFER_SIZE].into_boxed_slice();
let mut b2a = vec![0u8; BUFFER_SIZE].into_boxed_slice();
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocation pattern vec![0u8; BUFFER_SIZE].into_boxed_slice() performs unnecessary zero-initialization. Consider using Box::new_uninit_slice(BUFFER_SIZE) followed by assume_init() after the first read, or allocate with Vec::with_capacity(BUFFER_SIZE) and manually set the length, to avoid the overhead of zeroing memory that will be immediately overwritten by the read operations.

Copilot uses AI. Check for mistakes.
Copy link
Owner

@Itsusinn Itsusinn Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the bytes crate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to use the bytes crate here, since the buffer is used in a very simple way.
What's more worth optimizing is what Copilot mentioned: there's no need to zero-initialize the slice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result after using uninit is as follows:

  [Stack] 1000000 concurrent connections completed in 1.846773625s
  [Box] 1000000 concurrent connections completed in 603.313875ms
  [Box uninit] 1000000 concurrent connections completed in 566.827834ms

@Itsusinn
Copy link
Owner

Itsusinn commented Nov 4, 2025

Could you add several simple unit tests to copy_io ?

@Itsusinn Itsusinn merged commit fc2abaa into Itsusinn:main Nov 4, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants