Skip to content

fix(templates): coder template uses existing tests and edit_file#359

Merged
qhkm merged 2 commits intomainfrom
fix/coder-template-workflow
Mar 15, 2026
Merged

fix(templates): coder template uses existing tests and edit_file#359
qhkm merged 2 commits intomainfrom
fix/coder-template-workflow

Conversation

@qhkm
Copy link
Copy Markdown
Owner

@qhkm qhkm commented Mar 14, 2026

Summary

  • Updated coder template system prompt to instruct the LLM to check for existing test files before writing new ones
  • Added workflow: discover existing tests → run them first → fix in-place with edit_file → verify with existing suite → summarize changes
  • Added regression test test_coder_prompt_instructs_existing_test_workflow to guard the key instructions

Root cause

The coder template said "write tests when appropriate" but never told the LLM to look for existing tests first. This caused it to write its own tests matching buggy behavior, pass them, and report "no bugs found."

Test plan

  • cargo nextest run --lib -E 'test(templates)' — 32/32 pass
  • cargo clippy -- -D warnings — clean
  • cargo fmt -- --check — clean
  • Manual: zeptoclaw agent --template coder -m "Read test-coding-zepto/calculator.py, find bugs, fix them, verify with tests" should now use existing test_calculator.py

Closes #345

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Improvements
    • Enhanced built-in coder template with clearer, step-by-step guidance for bug-fix workflows: check existing tests, perform in-place edits, re-run test suites, and guidance on when to add new tests.
  • Tests
    • Added a unit test to verify the coder template includes the updated bug-fix workflow guidance.
  • Style
    • Minor whitespace/formatting cleanups in message parsing tests.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2d011293-e859-489d-b062-5afd12817be4

📥 Commits

Reviewing files that changed from the base of the PR and between 677d28d and 742178e.

📒 Files selected for processing (1)
  • src/channels/discord.rs
✅ Files skipped from review due to trivial changes (1)
  • src/channels/discord.rs

📝 Walkthrough

Walkthrough

Expanded the built-in coder template's system_prompt with explicit, structured bug-fix guidance (check existing tests, use edit_file for in-place edits, re-run existing test suites, write new tests only if none exist). Added a unit test validating those prompt instructions. Minor whitespace tweaks in a Discord parser test.

Changes

Cohort / File(s) Summary
Coder template & test
src/config/templates.rs
Expanded the coder template system_prompt with step-by-step bug-fix workflow: check for existing tests, run them first, prefer edit_file for in-place fixes, re-run existing test suite to verify, and only write new tests when no relevant tests exist. Added test_coder_prompt_instructs_existing_test_workflow to assert these instructions are present.
Discord channel whitespace
src/channels/discord.rs
Whitespace-only adjustments in parsing and corresponding test (test_message_create_with_image_only), no behavioral changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 I sniffed the prompt and hopped with glee,
"Use the tests that live," I whispered with tea,
Edit in place, then run them anew,
Only write tests when none exist true,
A tiny hop for a clearer dev spree. 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR contains whitespace-only changes to discord.rs unrelated to the template workflow fix objectives outlined in issue #345. Remove the whitespace-only changes in src/channels/discord.rs as they are out of scope and unrelated to the coder template fix.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly reflects the main change: updating the coder template to use existing tests and edit_file for bug fixing workflows.
Linked Issues check ✅ Passed All objectives from issue #345 are met: prompt instructs checking existing tests, using edit_file for in-place fixes, and re-running existing tests for verification.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/coder-template-workflow
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
src/r8r_bridge/health.rs (1)

27-37: Consider interval_at for deferred first tick.

time::interval() fires immediately on the first .tick(). If you want the first ping to occur after interval_secs rather than immediately on spawn, use time::interval_at(Instant::now() + duration, duration). This is likely acceptable as-is if an immediate health check is desired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/r8r_bridge/health.rs` around lines 27 - 37, The current tokio::spawn loop
uses time::interval which causes the first interval.tick().await to fire
immediately; to defer the first health ping use time::interval_at(Instant::now()
+ Duration::from_secs(interval_secs), Duration::from_secs(interval_secs))
instead of time::interval so the first send_health_ping() is invoked after
interval_secs; update the spawned task in health.rs (the async block that
references interval_secs, bridge.is_connected(), and bridge.send_health_ping())
to construct the interval with interval_at and the appropriate Instant and
Duration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config/types.rs`:
- Around line 135-137: The KNOWN_TOP_LEVEL constant in validate.rs is missing
the top-level field r8r_bridge; update the KNOWN_TOP_LEVEL collection (symbol:
KNOWN_TOP_LEVEL) to include "r8r_bridge" so the R8rBridgeConfig declared as
r8r_bridge in types.rs is validated properly, and then review src/config/mod.rs
(where env overrides are defined) to decide if you need to add environment
overrides for r8r_bridge (e.g., ZEPTOCLAW_R8R_BRIDGE_*) and add them if
required.

In `@src/r8r_bridge/mod.rs`:
- Around line 263-269: disconnect() currently only clears the public sender and
flips connected to false but does not close the underlying WebSocket or stop the
spawned reader task; locate the bridge struct fields (sender, connected,
ws_read, ws_write, and any task handles like reader_handle or shutdown_tx) and
update disconnect() to: signal/trigger shutdown to the reader task (e.g., via a
shutdown channel or by aborting the task handle), await or join the
reader_handle if available, and explicitly close/ shutdown both ws_write and
ws_read (or call the WebSocket close procedure) before setting sender to None
and connected to false so the old socket is fully torn down prior to logging
disconnection.
- Around line 187-205: The duplicate-check logic skips duplicates before sending
an Ack; change it so you still send an Ack for duplicates: call
dedup.lock().await and read dd.is_new(&envelope.id) into a local boolean, then
drop the dedup lock (exit that scope) and if the event is duplicate construct
Ack and use serde_json::to_string + ws_write.lock().await to send
WsMessage::Text via writer.send, then continue; if it is new, proceed with
normal processing. Ensure you do not hold the dedup lock while awaiting
ws_write.lock().await to avoid deadlocks (use a local flag, not the locked
guard, when sending the ack).
- Around line 110-118: The code currently calls unwrap() when parsing header
values for the Host and Authorization headers (the insert calls on
request.headers_mut() using host_with_port.parse().unwrap() and format!("Bearer
{token}").parse().unwrap()); replace these unwraps with proper error propagation
by attempting to parse the header values (e.g., via HeaderValue::from_str or
.parse()) and mapping any parsing error into the function's Result<(), String>
(e.g., map_err(|e| format!("invalid Host header: {}", e)) and similar for the
Authorization header using self.token), then insert the resulting HeaderValue
into request.headers_mut() so parsing failures return Err(...) instead of
panicking.

In `@tests/r8r_bridge_client_test.rs`:
- Around line 92-110: Replace the fixed
tokio::time::sleep(Duration::from_millis(500)).await with a timeout-based
polling loop: use tokio::time::timeout(...) to bound the total wait and
repeatedly call bridge.last_health_status().await (with a short sleep between
polls) until it returns Some(BridgeEvent::HealthStatus { ... }) or the timeout
elapses; then assert on the returned version and uptime_secs. Ensure you import
and use tokio::time::timeout and keep the same assertion logic that checks the
BridgeEvent::HealthStatus variant and fields.

---

Nitpick comments:
In `@src/r8r_bridge/health.rs`:
- Around line 27-37: The current tokio::spawn loop uses time::interval which
causes the first interval.tick().await to fire immediately; to defer the first
health ping use time::interval_at(Instant::now() +
Duration::from_secs(interval_secs), Duration::from_secs(interval_secs)) instead
of time::interval so the first send_health_ping() is invoked after
interval_secs; update the spawned task in health.rs (the async block that
references interval_secs, bridge.is_connected(), and bridge.send_health_ping())
to construct the interval with interval_at and the appropriate Instant and
Duration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7cc6a605-e6aa-493a-82fc-abd532f5face

📥 Commits

Reviewing files that changed from the base of the PR and between 2d39ea2 and 156fffa.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • Cargo.toml
  • src/cli/common.rs
  • src/config/mod.rs
  • src/config/templates.rs
  • src/config/types.rs
  • src/lib.rs
  • src/r8r_bridge/approval.rs
  • src/r8r_bridge/dedup.rs
  • src/r8r_bridge/events.rs
  • src/r8r_bridge/health.rs
  • src/r8r_bridge/mod.rs
  • tests/r8r_bridge_approval_test.rs
  • tests/r8r_bridge_client_test.rs
  • tests/r8r_bridge_events_test.rs

Comment thread src/config/types.rs Outdated
Comment on lines +135 to +137
/// r8r workflow-engine bridge configuration.
#[serde(default)]
pub r8r_bridge: R8rBridgeConfig,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf "== src/config/mod.rs ==\n"
rg -n "r8r_bridge|system_prompt|ZEPTOCLAW_AGENTS_DEFAULTS_SYSTEM_PROMPT|R8R_BRIDGE" src/config/mod.rs || true

printf "\n== src/config/validate.rs ==\n"
rg -n "KNOWN_TOP_LEVEL|r8r_bridge" src/config/validate.rs || true

Repository: qhkm/zeptoclaw

Length of output: 643


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf "== src/config/validate.rs: KNOWN_TOP_LEVEL content ==\n"
sed -n '7,30p' src/config/validate.rs

printf "\n== src/config/mod.rs: env override section (around line 142) ==\n"
sed -n '135,160p' src/config/mod.rs

printf "\n== Check if r8r_bridge appears anywhere in src/config/mod.rs ==\n"
rg -i "r8r" src/config/mod.rs || echo "No r8r_bridge found in mod.rs"

Repository: qhkm/zeptoclaw

Length of output: 2005


Add r8r_bridge to KNOWN_TOP_LEVEL in src/config/validate.rs.

The env override for system_prompt is correctly in place at src/config/mod.rs:142-143, but r8r_bridge is missing from the KNOWN_TOP_LEVEL list in src/config/validate.rs. Since r8r_bridge is a top-level config field, it must be registered there for proper validation. Consider also whether an env override (e.g., ZEPTOCLAW_R8R_BRIDGE_*) is needed in src/config/mod.rs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/types.rs` around lines 135 - 137, The KNOWN_TOP_LEVEL constant in
validate.rs is missing the top-level field r8r_bridge; update the
KNOWN_TOP_LEVEL collection (symbol: KNOWN_TOP_LEVEL) to include "r8r_bridge" so
the R8rBridgeConfig declared as r8r_bridge in types.rs is validated properly,
and then review src/config/mod.rs (where env overrides are defined) to decide if
you need to add environment overrides for r8r_bridge (e.g.,
ZEPTOCLAW_R8R_BRIDGE_*) and add them if required.

Comment thread src/r8r_bridge/mod.rs Outdated
Comment on lines +110 to +118
request
.headers_mut()
.insert("Host", host_with_port.parse().unwrap());

if let Some(ref token) = self.token {
request.headers_mut().insert(
"Authorization",
format!("Bearer {token}").parse().unwrap(),
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and read the exact file and lines
head -n 120 src/r8r_bridge/mod.rs | tail -n 20

Repository: qhkm/zeptoclaw

Length of output: 603


🏁 Script executed:

# Get the full context around lines 110-118
sed -n '100,130p' src/r8r_bridge/mod.rs

Repository: qhkm/zeptoclaw

Length of output: 1007


🏁 Script executed:

# Check the function signature to understand return type
sed -n '90,120p' src/r8r_bridge/mod.rs

Repository: qhkm/zeptoclaw

Length of output: 1102


🏁 Script executed:

# Verify the function signature and return type
grep -n "pub async fn connect" src/r8r_bridge/mod.rs

Repository: qhkm/zeptoclaw

Length of output: 116


🏁 Script executed:

# Check what type .parse() is parsing into by looking at HeaderMap usage
rg "headers_mut.*insert" -A 1 -B 1 src/r8r_bridge/mod.rs

Repository: qhkm/zeptoclaw

Length of output: 178


🏁 Script executed:

# Check if there are any tests or callers that show what values token and host could have
rg "R8rBridge::new|R8rBridge.*token" -A 2 -B 2 src/

Repository: qhkm/zeptoclaw

Length of output: 40


🏁 Script executed:

# Verify the http crate's HeaderValue to understand parse() failure modes
rg "use.*HeaderValue|use.*http" src/r8r_bridge/mod.rs

Repository: qhkm/zeptoclaw

Length of output: 40


🏁 Script executed:

# Check the imports and type signatures in the file
head -n 50 src/r8r_bridge/mod.rs | grep -E "^use|^impl"

Repository: qhkm/zeptoclaw

Length of output: 378


🏁 Script executed:

# Look for the full module context to understand what types are involved
sed -n '1,80p' src/r8r_bridge/mod.rs

Repository: qhkm/zeptoclaw

Length of output: 2939


🏁 Script executed:

# Check the http crate documentation to understand HeaderValue::parse() behavior
# by looking for how it's used elsewhere in the codebase
rg "HeaderValue" src/

Repository: qhkm/zeptoclaw

Length of output: 463


🏁 Script executed:

# Check the tungstenite ClientRequest and HeaderMap to understand what type parse() targets
rg "ClientRequest|headers_mut" -A 2 src/r8r_bridge/mod.rs | head -20

Repository: qhkm/zeptoclaw

Length of output: 436


🏁 Script executed:

# Look for any error handling patterns with HeaderValue in the codebase
rg "HeaderValue::from_str|parse.*HeaderValue" -B 2 -A 2 src/

Repository: qhkm/zeptoclaw

Length of output: 1370


🏁 Script executed:

# Check if there are any tests or examples showing R8rBridge with various token values
fd "test" src/ --type f -x grep -l "R8rBridge\|connect" {} \;

Repository: qhkm/zeptoclaw

Length of output: 40


🏁 Script executed:

# Verify: can a valid URL host and port actually produce an invalid HTTP header?
# Check the http crate's HeaderValue validation rules by looking at documentation comments
python3 << 'EOF'
# HeaderValue from http crate must contain only ASCII and specific chars
# Let's check what characters are valid in HTTP header values
import re

# HTTP header values must be VCHAR (33-126) or SPACE (32) or HTAB (9)
def is_valid_header_value(s):
    for c in s:
        code = ord(c)
        if not (code in [9, 32] or (33 <= code <= 126)):
            return False
    return True

# Test with a URL host that might be problematic
test_cases = [
    "example.com:8080",
    "localhost:9000",
    "192.168.1.1:443",
    "example.com",  # no port
    "example-with-dashes.com:8080",
    "example.com:invalid",  # should fail URL parsing first
]

for case in test_cases:
    print(f"{case}: valid_header={is_valid_header_value(case)}")
EOF

Repository: qhkm/zeptoclaw

Length of output: 280


🏁 Script executed:

# Check if there are any actual test cases with malformed tokens
rg "token.*=" tests/ --type rs -A 2 -B 2

Repository: qhkm/zeptoclaw

Length of output: 84


🏁 Script executed:

# Verify the exact type that .parse() is calling - check tungstenite's implementation
# by looking at what types the headers_mut().insert expects
python3 << 'EOF'
# Let's verify what the tungstenite library expects
# The insert method signature is: insert(K, V) where V typically needs to be HeaderValue
# HeaderValue::from_str() can fail if the string contains invalid characters

# According to HTTP spec (RFC 7230), header field values are:
# field-value = *( field-content / obs-fold )
# field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
# field-vchar = VCHAR / obs-text

# Let's simulate what could fail
test_tokens = [
    "Bearer validtoken123",  # valid
    "Bearer token\nwith\nnewlines",  # INVALID - contains newlines
    "Bearer token\x00null",  # INVALID - null bytes
    "Bearer token with spaces",  # valid
    "Bearer token™",  # potentially invalid - non-ASCII
]

for token in test_tokens:
    is_ascii = all(ord(c) < 128 for c in token)
    has_control = any(ord(c) < 32 for c in token)
    print(f"{token!r}: ascii_only={is_ascii}, has_control_chars={has_control}")
EOF

Repository: qhkm/zeptoclaw

Length of output: 386


🏁 Script executed:

# Most importantly: check what the coding guideline says about unwrap
grep -n "unwrap\|expect" src/r8r_bridge/mod.rs | head -20

Repository: qhkm/zeptoclaw

Length of output: 187


🏁 Script executed:

# Verify if there are other examples in r8r_bridge that handle Results properly
sed -n '90,300p' src/r8r_bridge/mod.rs | grep -E "\.map_err|Result|return Err"

Repository: qhkm/zeptoclaw

Length of output: 613


Remove unwrap() calls and handle header parsing failures gracefully.

The function returns Result<(), String> and already propagates errors consistently throughout, but lines 112 and 117 panic on invalid header values instead. The bearer token in particular is user-provided at runtime and could contain control characters or non-ASCII bytes, causing HeaderValue::parse() to fail. Propagate these errors instead of unwrapping.

Suggested change
-        request
-            .headers_mut()
-            .insert("Host", host_with_port.parse().unwrap());
+        let host_header = host_with_port
+            .parse()
+            .map_err(|e| format!("Invalid Host header value: {e}"))?;
+        request.headers_mut().insert("Host", host_header);
 
         if let Some(ref token) = self.token {
+            let auth_header = format!("Bearer {token}")
+                .parse()
+                .map_err(|e| format!("Invalid Authorization header value: {e}"))?;
             request.headers_mut().insert(
                 "Authorization",
-                format!("Bearer {token}").parse().unwrap(),
+                auth_header,
             );
         }

Per coding guidelines: "Do not add unwrap()/expect() in production paths unless failure is truly unrecoverable."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/r8r_bridge/mod.rs` around lines 110 - 118, The code currently calls
unwrap() when parsing header values for the Host and Authorization headers (the
insert calls on request.headers_mut() using host_with_port.parse().unwrap() and
format!("Bearer {token}").parse().unwrap()); replace these unwraps with proper
error propagation by attempting to parse the header values (e.g., via
HeaderValue::from_str or .parse()) and mapping any parsing error into the
function's Result<(), String> (e.g., map_err(|e| format!("invalid Host header:
{}", e)) and similar for the Authorization header using self.token), then insert
the resulting HeaderValue into request.headers_mut() so parsing failures return
Err(...) instead of panicking.

Comment thread src/r8r_bridge/mod.rs Outdated
Comment on lines +187 to +205
// Dedup check.
{
let mut dd = dedup.lock().await;
if !dd.is_new(&envelope.id) {
info!("r8r bridge: skipping duplicate event {}", envelope.id);
continue;
}
}

// Send ack.
let ack = Ack {
event_id: envelope.id.clone(),
};
if let Ok(ack_json) = serde_json::to_string(&ack) {
let mut writer = ws_write.lock().await;
if let Err(e) = writer.send(WsMessage::Text(ack_json.into())).await {
warn!("r8r bridge: failed to send ack: {e}");
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Ack duplicates before you skip them.

The duplicate check currently continues before the ack path. If r8r retransmits because the first ack was lost, this client will keep dropping the retry without ever acknowledging it, which can trap the sender in a retry loop.

🔁 Suggested change
-                // Dedup check.
-                {
-                    let mut dd = dedup.lock().await;
-                    if !dd.is_new(&envelope.id) {
-                        info!("r8r bridge: skipping duplicate event {}", envelope.id);
-                        continue;
-                    }
-                }
-
                 // Send ack.
                 let ack = Ack {
                     event_id: envelope.id.clone(),
                 };
                 if let Ok(ack_json) = serde_json::to_string(&ack) {
                     let mut writer = ws_write.lock().await;
                     if let Err(e) = writer.send(WsMessage::Text(ack_json.into())).await {
                         warn!("r8r bridge: failed to send ack: {e}");
                     }
                 }
+
+                // Dedup check.
+                {
+                    let mut dd = dedup.lock().await;
+                    if !dd.is_new(&envelope.id) {
+                        info!("r8r bridge: skipping duplicate event {}", envelope.id);
+                        continue;
+                    }
+                }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Dedup check.
{
let mut dd = dedup.lock().await;
if !dd.is_new(&envelope.id) {
info!("r8r bridge: skipping duplicate event {}", envelope.id);
continue;
}
}
// Send ack.
let ack = Ack {
event_id: envelope.id.clone(),
};
if let Ok(ack_json) = serde_json::to_string(&ack) {
let mut writer = ws_write.lock().await;
if let Err(e) = writer.send(WsMessage::Text(ack_json.into())).await {
warn!("r8r bridge: failed to send ack: {e}");
}
}
// Send ack.
let ack = Ack {
event_id: envelope.id.clone(),
};
if let Ok(ack_json) = serde_json::to_string(&ack) {
let mut writer = ws_write.lock().await;
if let Err(e) = writer.send(WsMessage::Text(ack_json.into())).await {
warn!("r8r bridge: failed to send ack: {e}");
}
}
// Dedup check.
{
let mut dd = dedup.lock().await;
if !dd.is_new(&envelope.id) {
info!("r8r bridge: skipping duplicate event {}", envelope.id);
continue;
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/r8r_bridge/mod.rs` around lines 187 - 205, The duplicate-check logic
skips duplicates before sending an Ack; change it so you still send an Ack for
duplicates: call dedup.lock().await and read dd.is_new(&envelope.id) into a
local boolean, then drop the dedup lock (exit that scope) and if the event is
duplicate construct Ack and use serde_json::to_string + ws_write.lock().await to
send WsMessage::Text via writer.send, then continue; if it is new, proceed with
normal processing. Ensure you do not hold the dedup lock while awaiting
ws_write.lock().await to avoid deadlocks (use a local flag, not the locked
guard, when sending the ack).

Comment thread src/r8r_bridge/mod.rs Outdated
Comment on lines +263 to +269
/// Disconnect from the bridge, clearing the sender and setting connected to false.
pub async fn disconnect(&self) {
let mut sender_guard = self.sender.lock().await;
*sender_guard = None;
self.connected.store(false, Ordering::Relaxed);
info!("r8r bridge disconnected (manual)");
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

disconnect() doesn't actually tear down the WebSocket session.

This only clears the public sender and flips connected to false. The spawned reader task still owns ws_read, and the reader also retains ws_write for acks, so the old socket can stay alive while a later connect() creates a second session. Store shutdown/task handles and close the split stream here before reporting the bridge as disconnected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/r8r_bridge/mod.rs` around lines 263 - 269, disconnect() currently only
clears the public sender and flips connected to false but does not close the
underlying WebSocket or stop the spawned reader task; locate the bridge struct
fields (sender, connected, ws_read, ws_write, and any task handles like
reader_handle or shutdown_tx) and update disconnect() to: signal/trigger
shutdown to the reader task (e.g., via a shutdown channel or by aborting the
task handle), await or join the reader_handle if available, and explicitly
close/ shutdown both ws_write and ws_read (or call the WebSocket close
procedure) before setting sender to None and connected to false so the old
socket is fully torn down prior to logging disconnection.

Comment thread tests/r8r_bridge_client_test.rs Outdated
Comment on lines +92 to +110
// Wait briefly for the bridge to receive the health status response.
tokio::time::sleep(Duration::from_millis(500)).await;

// The mock server echoes a health status; we may also receive the ack first.
// Drain the ack message from rx (the bridge sends an ack for the health status).
// The health status should now be stored.
let health = bridge.last_health_status().await;
assert!(health.is_some(), "bridge should have stored health status");

if let Some(BridgeEvent::HealthStatus {
version,
uptime_secs,
..
}) = health
{
assert_eq!(version, "0.1.0-test");
assert_eq!(uptime_secs, 42);
} else {
panic!("Expected HealthStatus variant");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "r8r_bridge_client_test.rs" -type f

Repository: qhkm/zeptoclaw

Length of output: 91


🏁 Script executed:

sed -n '92,110p' ./tests/r8r_bridge_client_test.rs

Repository: qhkm/zeptoclaw

Length of output: 789


🏁 Script executed:

sed -n '50,115p' ./tests/r8r_bridge_client_test.rs

Repository: qhkm/zeptoclaw

Length of output: 2236


🏁 Script executed:

rg "last_health_status" ./tests/r8r_bridge_client_test.rs

Repository: qhkm/zeptoclaw

Length of output: 109


🏁 Script executed:

rg "fn last_health_status" -A 5

Repository: qhkm/zeptoclaw

Length of output: 391


🏁 Script executed:

rg "tokio::time::sleep" ./tests/r8r_bridge_client_test.rs

Repository: qhkm/zeptoclaw

Length of output: 115


Replace the fixed sleep with a timeout-based polling loop.

Lines 93–100 use an arbitrary 500ms sleep before checking the health status. This makes the test timing-sensitive and flaky on slower CI runners. Instead, poll last_health_status() within a timeout() (consistent with the timeout pattern already used at line 82 for the ping receipt) so the assertion waits for the actual state change.

Suggested change
-    // Wait briefly for the bridge to receive the health status response.
-    tokio::time::sleep(Duration::from_millis(500)).await;
-
-    // The mock server echoes a health status; we may also receive the ack first.
-    // Drain the ack message from rx (the bridge sends an ack for the health status).
-    // The health status should now be stored.
-    let health = bridge.last_health_status().await;
-    assert!(health.is_some(), "bridge should have stored health status");
-
-    if let Some(BridgeEvent::HealthStatus {
+    let health = tokio::time::timeout(Duration::from_secs(5), async {
+        loop {
+            if let Some(health) = bridge.last_health_status().await {
+                break health;
+            }
+            tokio::time::sleep(Duration::from_millis(50)).await;
+        }
+    })
+    .await
+    .expect("bridge should receive health status");
+
+    if let BridgeEvent::HealthStatus {
         version,
         uptime_secs,
         ..
-    }) = health
+    } = health
     {
         assert_eq!(version, "0.1.0-test");
         assert_eq!(uptime_secs, 42);
     } else {
         panic!("Expected HealthStatus variant");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Wait briefly for the bridge to receive the health status response.
tokio::time::sleep(Duration::from_millis(500)).await;
// The mock server echoes a health status; we may also receive the ack first.
// Drain the ack message from rx (the bridge sends an ack for the health status).
// The health status should now be stored.
let health = bridge.last_health_status().await;
assert!(health.is_some(), "bridge should have stored health status");
if let Some(BridgeEvent::HealthStatus {
version,
uptime_secs,
..
}) = health
{
assert_eq!(version, "0.1.0-test");
assert_eq!(uptime_secs, 42);
} else {
panic!("Expected HealthStatus variant");
let health = tokio::time::timeout(Duration::from_secs(5), async {
loop {
if let Some(health) = bridge.last_health_status().await {
break health;
}
tokio::time::sleep(Duration::from_millis(50)).await;
}
})
.await
.expect("bridge should receive health status");
if let BridgeEvent::HealthStatus {
version,
uptime_secs,
..
} = health
{
assert_eq!(version, "0.1.0-test");
assert_eq!(uptime_secs, 42);
} else {
panic!("Expected HealthStatus variant");
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/r8r_bridge_client_test.rs` around lines 92 - 110, Replace the fixed
tokio::time::sleep(Duration::from_millis(500)).await with a timeout-based
polling loop: use tokio::time::timeout(...) to bound the total wait and
repeatedly call bridge.last_health_status().await (with a short sleep between
polls) until it returns Some(BridgeEvent::HealthStatus { ... }) or the timeout
elapses; then assert on the returned version and uptime_secs. Ensure you import
and use tokio::time::timeout and keep the same assertion logic that checks the
BridgeEvent::HealthStatus variant and fields.

The coder template system prompt lacked instructions to check for
existing test files before writing new ones, causing the LLM to write
its own tests that matched buggy behavior and report false success.

Added a 5-step workflow: discover existing tests, run them first, fix
in-place with edit_file, verify with existing suite, summarize changes.

Closes #345

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@qhkm qhkm force-pushed the fix/coder-template-workflow branch from 156fffa to 677d28d Compare March 14, 2026 18:02
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config/templates.rs`:
- Around line 440-465: The test
test_coder_prompt_instructs_existing_test_workflow should also assert that the
coder.system_prompt enforces running existing tests first; update the test
(using the prompt variable already defined) to include an assertion that
prompt.contains("run the existing tests first") or a similar phrase that
includes both "run" and "first" (e.g., "run tests first" / "run the existing
tests first") so the contract that tests must be executed before editing is
explicitly locked down.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 58553670-3d96-4603-9ae1-5e1a12afcc45

📥 Commits

Reviewing files that changed from the base of the PR and between 156fffa and 677d28d.

📒 Files selected for processing (1)
  • src/config/templates.rs

Comment thread src/config/templates.rs
Comment on lines +440 to +465
#[test]
fn test_coder_prompt_instructs_existing_test_workflow() {
let registry = TemplateRegistry::new();
let coder = registry.get("coder").unwrap();
let prompt = &coder.system_prompt;
// Must instruct to check for existing tests before writing new ones
assert!(
prompt.contains("existing test"),
"coder prompt must mention existing tests"
);
// Must instruct to use edit_file for in-place fixes
assert!(
prompt.contains("edit_file"),
"coder prompt must instruct using edit_file"
);
// Must instruct to run existing tests for verification
assert!(
prompt.contains("re-running the EXISTING test suite"),
"coder prompt must instruct re-running existing tests"
);
// Must NOT encourage writing own tests as default verification
assert!(
prompt.contains("Only write new tests if no relevant tests exist"),
"coder prompt must discourage writing new tests when existing ones are available"
);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Regression test should also lock the “run tests first” contract.

At Line 445-Line 464, the assertions cover existing-tests usage and re-run behavior, but they don’t explicitly guard the “run them FIRST” requirement.

Suggested patch
 #[test]
 fn test_coder_prompt_instructs_existing_test_workflow() {
     let registry = TemplateRegistry::new();
     let coder = registry.get("coder").unwrap();
     let prompt = &coder.system_prompt;
@@
     assert!(
         prompt.contains("Only write new tests if no relevant tests exist"),
         "coder prompt must discourage writing new tests when existing ones are available"
     );
+    // Must instruct to run existing tests first before editing
+    assert!(
+        prompt.contains("run them FIRST"),
+        "coder prompt must require running existing tests first"
+    );
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/templates.rs` around lines 440 - 465, The test
test_coder_prompt_instructs_existing_test_workflow should also assert that the
coder.system_prompt enforces running existing tests first; update the test
(using the prompt variable already defined) to include an assertion that
prompt.contains("run the existing tests first") or a similar phrase that
includes both "run" and "first" (e.g., "run tests first" / "run the existing
tests first") so the contract that tests must be executed before editing is
explicitly locked down.

@qhkm qhkm merged commit 624ad11 into main Mar 15, 2026
9 checks passed
@qhkm qhkm deleted the fix/coder-template-workflow branch March 15, 2026 03:41
taqtiqa-mark pushed a commit to taqtiqa-mark/zeptoclaw that referenced this pull request Mar 25, 2026
…m#359)

## Summary
- Updated coder template system prompt to instruct the LLM to check for
existing test files before writing new ones
- Added workflow: discover existing tests → run them first → fix
in-place with `edit_file` → verify with existing suite → summarize
changes
- Added regression test
`test_coder_prompt_instructs_existing_test_workflow` to guard the key
instructions

## Root cause
The coder template said "write tests when appropriate" but never told
the LLM to look for *existing* tests first. This caused it to write its
own tests matching buggy behavior, pass them, and report "no bugs
found."

## Test plan
- [x] `cargo nextest run --lib -E 'test(templates)'` — 32/32 pass
- [x] `cargo clippy -- -D warnings` — clean
- [x] `cargo fmt -- --check` — clean
- [ ] Manual: `zeptoclaw agent --template coder -m "Read
test-coding-zepto/calculator.py, find bugs, fix them, verify with
tests"` should now use existing `test_calculator.py`

Closes qhkm#345

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Improvements**
* Enhanced built-in coder template with clearer, step-by-step guidance
for bug-fix workflows: check existing tests, perform in-place edits,
re-run test suites, and guidance on when to add new tests.
* **Tests**
* Added a unit test to verify the coder template includes the updated
bug-fix workflow guidance.
* **Style**
  * Minor whitespace/formatting cleanups in message parsing tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: coder template LLM writes own tests instead of using existing ones

1 participant