Skip to content

Conversation

@arshia-rgh
Copy link

No description provided.

Copilot AI review requested due to automatic review settings December 3, 2025 15:13
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes flaky tests in the port testsuite by implementing dynamic port allocation instead of using hardcoded port calculations that could result in port conflicts. The changes ensure that only free ports are used during testing by allocating them dynamically and adding retry logic when port allocation fails.

Key changes:

  • Replaced hardcoded port calculation logic ((childPID + port) % 60000) with dynamic port allocation using the OS
  • Added retry mechanism (up to 10 attempts) in testProtoRoutine to handle transient port allocation failures
  • Implemented helper functions allocateAvailablePort and isAddrInUse to support the new allocation strategy
Comments suppressed due to low confidence (3)

pkg/port/testsuite/testsuite.go:33

  • The variables randSrc and randMu are declared but never used in the code. These should be removed to avoid confusion and clean up the codebase.
	switch mode := os.Getenv(reexecKeyMode); mode {
	case "":
		os.Exit(m.Run())
	case "child":

pkg/port/testsuite/testsuite.go:371

  • The isAddrInUse function will panic if err is nil when calling err.Error() on line 368. Add a nil check before calling err.Error():
func isAddrInUse(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, syscall.EADDRINUSE) {
		return true
	}
	msg := err.Error()
	return strings.Contains(msg, "address already in use") ||
		strings.Contains(msg, "port is busy")
}
		strings.Contains(msg, "port is busy")
}

pkg/port/testsuite/testsuite.go:259

  • This deferred RemovePort call will conflict with the existing direct RemovePort call later in the function (around line 310 in the full file). The port will be removed twice, causing the second call to fail. Either remove the existing direct call or remove this defer to avoid duplicate removal.
	t.Logf("opened port: %+v", portStatus)
	if proto == "udp" || proto == "udp4" {
		// Dial does not return an error for UDP even if the port is not exposed yet
		time.Sleep(1 * time.Second)
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +338 to +362
return 0, err
}
defer ln.Close()
return ln.Addr().(*net.TCPAddr).Port, nil
case "udp", "udp4":
addr, err := net.ResolveUDPAddr(proto, loopback)
if err != nil {
return 0, err
}
conn, err := net.ListenUDP(proto, addr)
if err != nil {
return 0, err
}
defer conn.Close()
return conn.LocalAddr().(*net.UDPAddr).Port, nil
default:
return 0, fmt.Errorf("unsupported proto %q", proto)
}
}

func isAddrInUse(err error) bool {
if errors.Is(err, syscall.EADDRINUSE) {
return true
}
msg := err.Error()
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocateAvailablePort function has a time-of-check-to-time-of-use (TOCTOU) race condition. The listener/connection is closed immediately after getting the port number, which means another process can claim that port before the caller uses it. This can cause the very flakiness this PR aims to fix.

While the retry logic in testProtoRoutine mitigates this, the race window still exists. Consider keeping the listener/connection open and returning it along with the port number, or accept that the retry mechanism is the intended solution and document this behavior.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant