-
Notifications
You must be signed in to change notification settings - Fork 111
fix(testsuite):fix flaky by ensure port is free to use #546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: arshia-rgh <[email protected]>
7a3f426 to
d887e14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes flaky tests in the port testsuite by implementing dynamic port allocation instead of using hardcoded port calculations that could result in port conflicts. The changes ensure that only free ports are used during testing by allocating them dynamically and adding retry logic when port allocation fails.
Key changes:
- Replaced hardcoded port calculation logic (
(childPID + port) % 60000) with dynamic port allocation using the OS - Added retry mechanism (up to 10 attempts) in
testProtoRoutineto handle transient port allocation failures - Implemented helper functions
allocateAvailablePortandisAddrInUseto support the new allocation strategy
Comments suppressed due to low confidence (3)
pkg/port/testsuite/testsuite.go:33
- The variables
randSrcandrandMuare declared but never used in the code. These should be removed to avoid confusion and clean up the codebase.
switch mode := os.Getenv(reexecKeyMode); mode {
case "":
os.Exit(m.Run())
case "child":
pkg/port/testsuite/testsuite.go:371
- The
isAddrInUsefunction will panic iferris nil when callingerr.Error()on line 368. Add a nil check before callingerr.Error():
func isAddrInUse(err error) bool {
if err == nil {
return false
}
if errors.Is(err, syscall.EADDRINUSE) {
return true
}
msg := err.Error()
return strings.Contains(msg, "address already in use") ||
strings.Contains(msg, "port is busy")
} strings.Contains(msg, "port is busy")
}
pkg/port/testsuite/testsuite.go:259
- This deferred
RemovePortcall will conflict with the existing directRemovePortcall later in the function (around line 310 in the full file). The port will be removed twice, causing the second call to fail. Either remove the existing direct call or remove this defer to avoid duplicate removal.
t.Logf("opened port: %+v", portStatus)
if proto == "udp" || proto == "udp4" {
// Dial does not return an error for UDP even if the port is not exposed yet
time.Sleep(1 * time.Second)
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return 0, err | ||
| } | ||
| defer ln.Close() | ||
| return ln.Addr().(*net.TCPAddr).Port, nil | ||
| case "udp", "udp4": | ||
| addr, err := net.ResolveUDPAddr(proto, loopback) | ||
| if err != nil { | ||
| return 0, err | ||
| } | ||
| conn, err := net.ListenUDP(proto, addr) | ||
| if err != nil { | ||
| return 0, err | ||
| } | ||
| defer conn.Close() | ||
| return conn.LocalAddr().(*net.UDPAddr).Port, nil | ||
| default: | ||
| return 0, fmt.Errorf("unsupported proto %q", proto) | ||
| } | ||
| } | ||
|
|
||
| func isAddrInUse(err error) bool { | ||
| if errors.Is(err, syscall.EADDRINUSE) { | ||
| return true | ||
| } | ||
| msg := err.Error() |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The allocateAvailablePort function has a time-of-check-to-time-of-use (TOCTOU) race condition. The listener/connection is closed immediately after getting the port number, which means another process can claim that port before the caller uses it. This can cause the very flakiness this PR aims to fix.
While the retry logic in testProtoRoutine mitigates this, the race window still exists. Consider keeping the listener/connection open and returning it along with the port number, or accept that the retry mechanism is the intended solution and document this behavior.
No description provided.