Skip to content

ci: fetch base branch in regression test check#916

Closed
qbit-glitch wants to merge 2 commits intonearai:mainfrom
qbit-glitch:fix/869-lock-across-await-v2
Closed

ci: fetch base branch in regression test check#916
qbit-glitch wants to merge 2 commits intonearai:mainfrom
qbit-glitch:fix/869-lock-across-await-v2

Conversation

@qbit-glitch
Copy link
Copy Markdown
Contributor

Summary

  • The regression-test-check workflow was failing with a false positive because origin/main wasn't available as a git ref in the CI environment
  • actions/checkout@v4 fetches the PR merge ref history but doesn't make the base branch ref available for three-dot diff comparisons
  • Added git fetch origin <base-ref> step before the check runs

Test plan

🤖 Generated with Claude Code

qbit-glitch and others added 2 commits March 11, 2026 06:46
Clone `mpsc::Sender` out of `RwLock` before `.send().await` to prevent
read guards from blocking write lock acquisition (shutdown/start) when
the channel buffer is full.

Fixed call sites:
- src/channels/http.rs: process_message()
- src/channels/web/server.rs: chat_send_handler(), chat_approval_handler()
- src/channels/web/handlers/chat.rs: chat_send_handler(), chat_approval_handler()
- src/channels/web/ws.rs: handle_client_message() (2 sites)
- src/channels/wasm/wrapper.rs: process_emitted_messages() (2 impls, also
  scoped rate_limiter write lock per-iteration)

Includes regression test: shutdown_completes_while_process_message_blocked

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The regression-test-check workflow failed because origin/main wasn't
available as a ref in the CI environment. actions/checkout@v4 fetches
the PR merge ref history but doesn't make the base branch ref available
for three-dot diff comparisons.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 11, 2026 03:30
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added scope: channel/web Web gateway channel scope: channel/wasm WASM channel runtime scope: ci CI/CD workflows size: L 200-499 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: regular 2-5 merged PRs labels Mar 11, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to improve CI enforcement for regression tests by ensuring the base branch ref is available for three-dot diffs, and it also includes several channel runtime changes to avoid holding RwLock guards across .send().await (plus a new regression test for the deadlock scenario).

Changes:

  • Update regression-test-check workflow to fetch the PR base branch before running diff-based checks.
  • Refactor multiple channel message-send paths to clone mpsc::Sender outside RwLock read guards (avoid lock-across-await).
  • Add a regression test in the HTTP channel to ensure shutdown() completes even when a send is blocked on a full channel buffer.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
.github/workflows/regression-test-check.yml Fetches base branch so ${BASE_REF}...HEAD diffs work in CI.
src/channels/web/ws.rs Clones sender before awaiting .send() in WS handlers.
src/channels/web/server.rs Clones sender before awaiting .send() in chat HTTP handlers.
src/channels/web/handlers/chat.rs Same sender-clone pattern applied to (currently unused) chat handlers.
src/channels/wasm/wrapper.rs Clones sender and scopes rate limiter write locks to avoid holding locks across awaits.
src/channels/http.rs Clones sender before .send().await and adds a regression test for shutdown/send deadlock.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

fetch-depth: 0

- name: Fetch base branch
run: git fetch origin ${{ github.event.pull_request.base.ref }}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added fetch step may still not create the origin/<base> remote-tracking ref that the script relies on (BASE_REF="origin/..."). To make this robust regardless of actions/checkout's configured refspec, fetch with an explicit destination (e.g., refs/heads/<base>:refs/remotes/origin/<base>) and quote the ref to avoid shell word-splitting for unusual branch names.

Suggested change
run: git fetch origin ${{ github.event.pull_request.base.ref }}
run: git fetch origin "refs/heads/${{ github.event.pull_request.base.ref }}:refs/remotes/origin/${{ github.event.pull_request.base.ref }}"

Copilot uses AI. Check for mistakes.
Comment thread src/channels/http.rs
Comment on lines 356 to +392
let response_rx = if wait_for_response {
if state.pending_responses.read().await.len() >= MAX_PENDING_RESPONSES {
return (
StatusCode::TOO_MANY_REQUESTS,
Json(WebhookResponse {
message_id: msg_id,
status: "error".to_string(),
response: Some("Too many pending requests".to_string()),
}),
);
}

let (tx, rx) = oneshot::channel();
state.pending_responses.write().await.insert(msg_id, tx);
Some(rx)
} else {
None
};

// Send message to the channel
let tx_guard = state.tx.read().await;
if let Some(tx) = tx_guard.as_ref() {
if tx.send(msg).await.is_err() {
return (
StatusCode::INTERNAL_SERVER_ERROR,
Json(WebhookResponse {
message_id: msg_id,
status: "error".to_string(),
response: Some("Channel closed".to_string()),
}),
);
// Send message to the channel — clone sender to avoid holding lock across await.
// Note: after cloning, shutdown() can run and clear the Option while we still hold
// a valid Sender clone. This is safe: the send will succeed (receiver still alive
// in the agent loop) or fail cleanly if the receiver was dropped.
let tx = {
let tx_guard = state.tx.read().await;
match tx_guard.as_ref() {
Some(tx) => tx.clone(),
None => {
return (
StatusCode::SERVICE_UNAVAILABLE,
Json(WebhookResponse {
message_id: msg_id,
status: "error".to_string(),
response: Some("Channel not started".to_string()),
}),
);
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When wait_for_response is true, the pending_responses entry is inserted before this early return. If the channel isn't started, this returns 503 without removing the entry, which can leak entries and eventually trigger false "Too many pending requests" responses. Consider only inserting after confirming state.tx exists, or ensure the entry is removed on all early-return paths (RAII guard / finally-style cleanup).

Copilot uses AI. Check for mistakes.
Comment thread src/channels/http.rs
Comment on lines +395 to 403
if tx.send(msg).await.is_err() {
return (
StatusCode::SERVICE_UNAVAILABLE,
StatusCode::INTERNAL_SERVER_ERROR,
Json(WebhookResponse {
message_id: msg_id,
status: "error".to_string(),
response: Some("Channel not started".to_string()),
response: Some("Channel closed".to_string()),
}),
);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tx.send(msg).await fails while wait_for_response is true, the function returns here before the cleanup at the end, leaving a stale entry in pending_responses. Ensure the map entry is removed on this error path as well (e.g., remove before returning, or use a scope guard so cleanup runs even on early returns).

Copilot uses AI. Check for mistakes.
Comment thread src/channels/http.rs
Comment on lines +563 to +571
// Fill all 256 slots in the channel buffer
{
let tx = {
let guard = channel.state.tx.read().await;
guard.as_ref().unwrap().clone()
};
for i in 0..256 {
let msg = IncomingMessage::new("http", "user", format!("fill-{}", i));
tx.send(msg).await.unwrap();
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regression test hard-codes the channel buffer size (256) in the fill loop, which will silently become incorrect if the buffer capacity changes in HttpChannel::start() (and the test may stop actually blocking on the 257th send). Consider extracting the capacity into a const used by both start() and this test to keep them in sync.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +17
- name: Fetch base branch
run: git fetch origin ${{ github.event.pull_request.base.ref }}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/title focus on the CI workflow change, but this PR also includes substantial runtime code changes across multiple channel implementations plus a new regression test. Please update the PR description/title to reflect the broader scope, or split the Rust changes into a separate PR to keep review and rollout risk manageable.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: regular 2-5 merged PRs risk: medium Business logic, config, or moderate-risk modules scope: channel/wasm WASM channel runtime scope: channel/web Web gateway channel scope: ci CI/CD workflows size: L 200-499 changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants