Skip to content

Improve test infrastructure: StubChannel, gateway helpers, security tests, search edge cases#623

Merged
ilblackdragon merged 18 commits intomainfrom
do-some-research-into-openclaw
Mar 7, 2026
Merged

Improve test infrastructure: StubChannel, gateway helpers, security tests, search edge cases#623
ilblackdragon merged 18 commits intomainfrom
do-some-research-into-openclaw

Conversation

@zmanian
Copy link
Copy Markdown
Collaborator

@zmanian zmanian commented Mar 6, 2026

Summary

Implements testing improvements inspired by OpenClaw's test architecture (see #466).

Batch 1: StubChannel + Test Tier Separation

  • StubChannel test double for the Channel trait (mpsc-based message injection, response capture)
  • Wired into TestHarnessBuilder for easy test setup
  • 6 ChannelManager unit tests using StubChannel (first-ever tests for this module)
  • Integration tests gated behind integration feature flag, replacing silent try_connect() skip pattern
  • Test tier documentation added to CLAUDE.md

Batch 2: Gateway Helpers, Security Tests, Search Edge Cases

  • TestGatewayBuilder eliminates 19-field GatewayState struct duplication across 2 integration test files
  • 11 security regression tests for skill installer: ZIP extraction safety (path traversal, nested paths, oversized entries) and SSRF prevention (private IPs, loopback, link-local, IPv4-mapped IPv6, restricted hosts)
  • 8 RRF search edge case tests: empty inputs, single-method results, duplicate chunk merging, limit=0, min_score filtering, config mode builders
  • Architecture boundary check script (scripts/check-boundaries.sh): detects direct DB driver usage outside db layer, unwrap/expect in production, env var reads outside config

Batch 3: SSRF Fix, WASM Loader Tests, Skills Limits Tests

  • IPv6 SSRF bypass fix: validate_fetch_url() used host_str() which returns bracketed IPv6 that IpAddr::parse() cannot handle. Switched to url::Host enum matching for correct IP extraction
  • 6 WASM loader security tests: tool name path separator rejection, empty name rejection, nonexistent file handling, invalid WASM bytes rejection, dotfile discovery, subdirectory non-recursion
  • Activation criteria limits test: verifies enforce_limits() trims excess patterns (>5), keywords (>20), and filters short keywords/tags (<3 chars)

Test plan

  • cargo test --lib -- all 1882 tests pass (up from 1848)
  • cargo clippy --all --tests -- -D warnings -- zero warnings
  • cargo test --test ws_gateway_integration -- 10 tests pass
  • cargo test --test openai_compat_integration -- 16 tests pass
  • bash scripts/check-boundaries.sh -- runs, reports known violations as warnings

Generated with Claude Code

zmanian and others added 16 commits March 3, 2026 17:09
Adds StubChannel to src/testing.rs alongside StubLlm. Supports message
injection via mpsc sender, response/status capture, and configurable
health check toggling. Includes handle methods for use after ownership
transfer to ChannelManager.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add with_stub_channel() builder method that creates a StubChannel
pre-registered in a ChannelManager. Tests can inject messages via
the sender and verify routing through the manager. The channel field
on TestHarness is Optional, defaulting to None for backward compat.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace silent try_connect() skip pattern with explicit feature gating.
cargo test now runs only self-contained tests.
cargo test --features integration runs tests requiring PostgreSQL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cover add/start_all stream merging, respond routing, unknown channel
errors, health_check_all with mixed health, empty-channels error path,
and injection channel merging -- all via StubChannel test double.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Grep-based checks for three architecture boundaries:
- Direct database driver usage (tokio_postgres/libsql) outside src/db/
- .unwrap()/.expect() in production code (warning only)
- Direct std::env::var reads outside config layer (warning only)

The DB driver check is a hard violation; the other two are warnings
for gradual cleanup. Run with: bash scripts/check-boundaries.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onfig modes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… protections

Add 11 regression tests covering the security controls in skill_tools:

ZIP extraction safety:
- Valid SKILL.md extraction works correctly
- Non-SKILL.md entries are ignored (returns error)
- Path traversal entries (../../SKILL.md) do not match
- Nested path entries (subdir/SKILL.md) do not match
- Oversized entries (>1MB uncompressed) are rejected

SSRF prevention:
- Loopback addresses (127.0.0.1) are blocked
- Private ranges (10.x, 172.16.x, 192.168.x) are blocked
- Link-local addresses (169.254.x) are blocked
- Public IPs (8.8.8.8, 1.1.1.1) are allowed
- IPv4-mapped IPv6 unwrapping logic works correctly
- Metadata endpoints and .internal/.local hostnames are blocked
- Normal hostnames (github.com, clawhub.dev) are allowed

Also documents a known gap: url::Url::host_str() returns bracketed
IPv6 addresses that std::net::IpAddr cannot parse, so IPv4-mapped
IPv6 URLs currently bypass IP-based checks in validate_fetch_url.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…st duplication

Both ws_gateway_integration.rs and openai_compat_integration.rs manually
constructed GatewayState with 19+ fields. Extracted to a shared builder in
src/channels/web/test_helpers.rs that provides sensible defaults and lets
tests override only what they need.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
validate_fetch_url used host_str() which returns bracketed IPv6
(e.g. "[::ffff:7f00:1]") that IpAddr::parse() cannot handle,
silently skipping IP-based SSRF checks for all IPv6 URLs.

Switch to url::Host enum matching to extract proper IpAddr values
without string parsing. IPv4-mapped IPv6 addresses like
::ffff:127.0.0.1 are now correctly unwrapped and blocked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds test_activation_criteria_enforce_limits to verify that
enforce_limits() correctly trims excess patterns (>5), keywords (>20),
and tags (>10), and filters out short keywords/tags (<3 chars).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 6 tests covering: tool name path separator rejection, empty name
rejection, nonexistent file handling, invalid WASM bytes rejection,
dotfile discovery behavior, and subdirectory non-recursion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove plan files from repo (ilblackdragon review)
- Replace CLAUDE.md test tier rules with pointer to check-boundaries.sh
- Add Check 4 to check-boundaries.sh: enforces integration tests are
  gated behind the 'integration' feature flag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check 5 catches try_connect() and similar silent-skip patterns in
integration tests. Tests should use feature gates to fail loudly
when prerequisites are missing, not silently return.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge origin/main into the PR branch, resolving merge conflicts in
src/testing.rs by keeping both the branch's StubChannel/harness tests
and main's database CRUD coverage tests.

Fix the Tests (all-features) CI failure caused by workspace_integration
tests trying to connect to PostgreSQL when none is available. The
--all-features flag enables the integration feature, which gates these
tests, but CI has no PostgreSQL service. Changed the CI test matrix to
use explicit feature flags instead of --all-features, excluding the
integration feature (which requires a running database).

Also applied cargo fmt to files touched by the merge.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added scope: channel Channel infrastructure scope: channel/web Web gateway channel scope: tool/builtin Built-in tools scope: tool/wasm WASM tool sandbox scope: workspace Persistent memory / workspace scope: ci CI/CD workflows scope: docs Documentation scope: dependencies Dependency updates size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 6, 2026
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's testing capabilities and security posture. It introduces new test doubles and builders to simplify writing robust tests, particularly for channel management and web gateway interactions. A new script enforces architectural boundaries, promoting code quality and maintainability. Crucially, it addresses several security vulnerabilities and edge cases through dedicated regression tests for skill installation, URL fetching (SSRF), and WASM tool loading, ensuring the system is more resilient against malicious inputs and configurations. The changes also improve the reliability of search functionality by covering various edge cases in reciprocal rank fusion.

Highlights

  • Testing Infrastructure: Introduced a StubChannel test double for the Channel trait, enabling easier message injection and response capture in tests. This is integrated into TestHarnessBuilder for streamlined test setup.
  • ChannelManager Unit Tests: Added 6 new unit tests for ChannelManager, leveraging the StubChannel to provide comprehensive coverage for this module for the first time.
  • Integration Test Gating: Refactored integration tests to be gated behind an integration feature flag, replacing previous silent try_connect() skip patterns and improving test tier separation. Documentation for test tiers was added to CLAUDE.md.
  • Gateway Test Helpers: Created TestGatewayBuilder to simplify the setup of GatewayState in integration tests, reducing boilerplate and duplication across test files.
  • Security Regression Tests (Skill Installer): Implemented 11 new security regression tests for the skill installer, covering ZIP extraction safety (path traversal, nested paths, oversized entries) and SSRF prevention (private IPs, loopback, link-local, IPv4-mapped IPv6, restricted hosts).
  • RRF Search Edge Case Tests: Added 8 new tests for Reciprocal Rank Fusion (RRF) search, addressing edge cases such as empty inputs, single-method results, duplicate chunk merging, limit=0, min_score filtering, and config mode builders.
  • Architecture Boundary Check Script: Introduced a new scripts/check-boundaries.sh script to enforce architectural rules, detecting direct database driver usage outside the db layer, unwrap()/expect() in production code, and environment variable reads outside config/bootstrap layers.
  • IPv6 SSRF Bypass Fix: Fixed an IPv6 SSRF bypass vulnerability in validate_fetch_url() by switching to url::Host enum matching for correct IP extraction, addressing issues with bracketed IPv6 addresses.
  • WASM Loader Security Tests: Added 6 security tests for the WASM loader, verifying rejection of tool names with path separators, empty names, handling of nonexistent files, invalid WASM bytes, dotfile discovery, and subdirectory non-recursion.
  • Activation Criteria Limits Test: Included a test to verify that enforce_limits() correctly trims excess activation patterns, keywords, and filters short keywords/tags.
Changelog
  • CLAUDE.md
    • Updated documentation to include a new 'Test Tiers' section, detailing unit, integration, and live testing commands and dependencies.
    • Added instructions to run the new check-boundaries.sh script.
  • Cargo.lock
    • Updated the syn dependency from version 2.0.114 to 2.0.117.
  • scripts/check-boundaries.sh
    • Added a new shell script to perform architecture boundary checks, including direct database driver usage, unwrap()/expect() in production, direct environment variable reads, test tier gating, and silent test-skip patterns.
  • src/channels/manager.rs
    • Added a new test module with 6 unit tests for ChannelManager, covering adding channels, starting all channels, responding to channels, handling unknown channels, health checks, and message injection.
  • src/channels/web/mod.rs
    • Exported the new test_helpers module to provide shared utilities for gateway integration tests.
  • src/channels/web/test_helpers.rs
    • Added a new module containing TestGatewayBuilder, a builder for constructing GatewayState with sensible test defaults and methods to start a test server.
  • src/skills/mod.rs
    • Added a new test case test_activation_criteria_enforce_limits to verify that activation criteria (keywords, patterns, tags) are correctly limited and filtered based on predefined caps and minimum lengths.
  • src/testing.rs
    • Introduced StubChannel, a configurable test double for the Channel trait, allowing message injection and response/status capture.
    • Updated TestHarness to include an optional channel field for StubChannel and ChannelManager.
    • Modified TestHarnessBuilder with a with_stub_channel() method to easily integrate StubChannel into test harnesses.
    • Added new tests for StubChannel functionality (injecting messages, capturing responses, health checks) and for TestHarness integration with StubChannel.
  • src/tools/builtin/skill_tools.rs
    • Modified validate_fetch_url to use url::Host for more robust IP address parsing, specifically to correctly handle bracketed IPv6 addresses and prevent SSRF bypasses.
    • Updated the host string conversion for internal hostname checks to use host.to_string().to_lowercase().
    • Added extensive security regression tests for ZIP extraction, covering valid skill extraction, ignoring non-skill entries, rejecting path traversal, nested paths, and oversized entries.
    • Added comprehensive SSRF prevention regression tests, blocking loopback, private ranges, link-local, IPv4-mapped IPv6, and restricted internal hostnames, while allowing public IPs.
  • src/tools/wasm/loader.rs
    • Added a suite of security regression tests for the WASM tool loader, including checks for tool names with path separators, empty names, nonexistent WASM files, invalid WASM bytes, dotfile handling, and non-recursive subdirectory discovery.
  • src/workspace/search.rs
    • Added new edge case tests for the reciprocal rank fusion (RRF) search algorithm, covering scenarios like empty input lists, FTS-only or vector-only results, merging duplicate chunks, handling a limit of zero, and filtering by minimum score.
  • tests/heartbeat_integration.rs
    • Updated the cfg attribute to all(feature = "postgres", feature = "integration") to explicitly gate the integration test behind both PostgreSQL and the general integration feature flag.
  • tests/openai_compat_integration.rs
    • Refactored start_test_server_with_provider and the test_no_llm_provider_returns_503 test to utilize the new TestGatewayBuilder for creating GatewayState instances, reducing boilerplate.
  • tests/workspace_integration.rs
    • Updated the cfg attribute to all(feature = "postgres", feature = "integration") for all integration tests.
    • Removed the try_connect() helper function and its calls within tests, relying on the new feature flag gating for conditional test execution.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/test.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly improves the test infrastructure with StubChannel and TestGatewayBuilder, and introduces a new test tiering system. It also enhances security by addressing an IPv6 SSRF bypass in validate_fetch_url and adding comprehensive regression tests for ZIP extraction and WASM loading. However, a high-severity vulnerability was identified: the SSRF protection remains incomplete as it does not account for domain names resolving to private IP addresses, posing a DNS-based SSRF bypass risk. Additionally, a suggestion was made to improve the robustness of the check-boundaries.sh script.

Comment thread src/tools/builtin/skill_tools.rs Outdated
Comment on lines +404 to +409
let raw_ip = match &host {
url::Host::Ipv4(v4) => Some(std::net::IpAddr::V4(*v4)),
url::Host::Ipv6(v6) => Some(std::net::IpAddr::V6(*v6)),
url::Host::Domain(_) => None,
};
if let Some(raw_ip) = raw_ip {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The validate_fetch_url function is vulnerable to SSRF via DNS resolution. While it correctly checks for private/loopback IP addresses when the host is provided as an IP, it skips these checks when a domain name is used (line 407). An attacker can bypass the SSRF protection by using a domain name that resolves to a private IP address (e.g., 127.0.0.1.nip.io).

To remediate this, you should resolve the domain name to its IP addresses and validate each IP against the restricted ranges before allowing the request to proceed. Alternatively, configure the HTTP client with a custom resolver or socket factory that enforces these restrictions at the connection level.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit ced6b96 (already on the branch before this review round). The build_safe_fetch_client function now performs async DNS resolution via tokio::net::lookup_host and validates all resolved IPs against private/loopback/link-local ranges through validate_resolved_addrs. This prevents DNS-based SSRF bypass via domains like 127.0.0.1.nip.io. The HTTP client is also configured with resolve_to_addrs to pin the connection to the validated IPs, preventing TOCTOU races between resolution and connection.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in ced6b96. The skill fetch path now resolves hostnames before connecting, rejects any private/loopback/link-local result, and pins the validated addresses into reqwest with resolve_to_addrs so the request cannot re-resolve to a different target.

Comment thread scripts/check-boundaries.sh Outdated
Comment on lines +157 to +185
tier_violations=""
for test_file in tests/*.rs; do
[ -f "$test_file" ] || continue

# Check if the file actually connects to a database (imports DB types
# or calls pool/connect). Mere string references like "DATABASE_URL"
# in config tests don't count.
needs_gate=false
if grep -q 'PgPool\|tokio_postgres::\|create_pool\|\.connect(' "$test_file" 2>/dev/null; then
needs_gate=true
fi

if [ "$needs_gate" = true ]; then
# Check first 5 lines for the cfg gate
if ! head -5 "$test_file" | grep -q 'cfg.*feature.*integration' 2>/dev/null; then
tier_violations="$tier_violations\n $test_file: needs #![cfg(all(feature = \"postgres\", feature = \"integration\"))]"
fi
fi
done

if [ -n "$tier_violations" ]; then
echo "VIOLATION: Integration tests missing feature gate:"
echo -e "$tier_violations"
echo
echo "(Tests requiring external services must be gated behind the 'integration' feature)"
violations=$((violations + 1))
else
echo "OK"
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current method of building the tier_violations string by concatenation is a bit fragile. It can lead to issues if filenames contain special characters, and echo -e has known portability and security concerns compared to printf. Using a bash array to collect violations and printf to print them would be more robust and safer.

Here's a suggested refactoring of this check:

tier_violations=()
for test_file in tests/*.rs; do
    [ -f "$test_file" ] || continue

    # Check if the file actually connects to a database (imports DB types
    # or calls pool/connect). Mere string references like "DATABASE_URL"
    # in config tests don't count.
    needs_gate=false
    if grep -q 'PgPool\|tokio_postgres::\|create_pool\|\.connect(' "$test_file" 2>/dev/null; then
        needs_gate=true
    fi

    if [ "$needs_gate" = true ]; then
        # Check first 5 lines for the cfg gate
        if ! head -5 "$test_file" | grep -q 'cfg.*feature.*integration' 2>/dev/null; then
            tier_violations+=("  $test_file: needs '#![cfg(all(feature = \"postgres\", feature = \"integration\"))]'")
        fi
    fi
done

if [ ${#tier_violations[@]} -gt 0 ]; then
    echo "VIOLATION: Integration tests missing feature gate:"
    printf '%s\n' "${tier_violations[@]}"
    echo
    echo "(Tests requiring external services must be gated behind the 'integration' feature)"
    violations=$((violations + 1))
else
    echo "OK"
fi

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in ec6e328. Refactored Check 4 to use a bash array (tier_violations=()) and printf '%s\n' instead of string concatenation with echo -e, as suggested.

@ilblackdragon
Copy link
Copy Markdown
Member

Known gap: DNS-based SSRF bypass. The fix handles IP literals but a domain like evil.attacker.com that
DNS-resolves to 127.0.0.1 still bypasses all checks. This is out of scope for this PR, but worth a //
TODO: DNS rebinding protection comment at the top of validate_fetch_url so nobody assumes it's fully
hardened. The fix should happen at connect time (check the resolved IP before establishing the TCP
connection), not at URL validation time.

Code Issues

  1. test_helpers.rs ships in production builds. pub mod test_helpers; in src/channels/web/mod.rs has no
    cfg gate, so this test builder code compiles into release binaries. The comment explains why (tests/
    can't see #[cfg(test)] modules), but the standard pattern is a dedicated feature flag:
    #[cfg(any(test, feature = "test-helpers"))]
    pub mod test_helpers;
  2. Or gate it behind the existing integration feature since that's who uses it.
  3. use super::* in manager.rs tests (line 288). Project convention (CLAUDE.md) says prefer crate:: over
    super::. Should be specific imports like use crate::channels::manager::{ChannelManager,
    OutgoingResponse}; or at minimum use crate::channels::manager::*;.
  4. Hand-rolled ZIP entries with zeroed CRC32. The build_zip_entry_store helper writes 0x00000000 for
    CRC32. This works because the current ZIP parser doesn't verify checksums, but it's fragile. If someone
    swaps the ZIP library or enables CRC verification, all these tests break silently. Consider computing the
    actual CRC32 (it's a 4-line addition with the crc32fast crate, or just hardcode the known values for
    fixed test content).

zmanian and others added 2 commits March 6, 2026 18:39
…heck

Refactor Check 4 in check-boundaries.sh to use bash arrays and printf
instead of string concatenation with echo -e. This is more robust with
special characters in filenames and avoids portability concerns with
echo -e. [skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zmanian
Copy link
Copy Markdown
Collaborator Author

zmanian commented Mar 7, 2026

Review Feedback Addressed

Both review comments from gemini-code-assist have been addressed:

1. SSRF DNS bypass (High severity) - src/tools/builtin/skill_tools.rs

Already fixed in ced6b96 (prior commit on this branch). The build_safe_fetch_client function performs async DNS resolution via tokio::net::lookup_host, validates all resolved IPs through validate_resolved_addrs, and pins the HTTP client to validated addresses via resolve_to_addrs to prevent TOCTOU races.

2. Shell script robustness (Medium) - scripts/check-boundaries.sh

Fixed in ec6e328. Check 4's tier_violations variable refactored from string concatenation + echo -e to a bash array + printf '%s\n' for better portability and robustness with special characters.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Hi @zmanian,

Thank you for the update and for addressing the feedback! I appreciate you taking the time to implement these improvements.

Just a quick clarification: my initial comment (Comment 1) was a summary of the pull request's changes, not a review with specific issues. The detailed review comments you're referring to were provided by @ilblackdragon in Comment 2.

Regarding the points you've addressed:

  • DNS-based SSRF bypass (src/tools/builtin/skill_tools.rs): I've reviewed the changes, and the implementation of build_safe_fetch_client with async DNS resolution, validation of resolved IPs via validate_resolved_addrs, and pinning the HTTP client to validated addresses effectively addresses the "Known gap" raised by @ilblackdragon. This is a significant security improvement. Well done!
  • Shell script robustness (scripts/check-boundaries.sh): The refactoring of tier_violations to use a bash array and printf for better portability and robustness is a good enhancement.

However, there are a few other points from @ilblackdragon's review that still need attention:

  1. test_helpers.rs ships in production builds

    • Severity: high
    • Description: The pub mod test_helpers; declaration in src/channels/web/mod.rs lacks a #[cfg] gate, causing test helper code to be compiled into release binaries. This can lead to unnecessary code in production builds and potentially expose test-specific functionalities.
    • Suggestion: Gate the test_helpers module behind a feature flag or the existing integration feature.
    • Reference: src/channels/web/mod.rs (lines 29-32 in the AFTER patch)
  2. use super::* in manager.rs tests

    • Severity: low
    • Description: The test module in src/channels/manager.rs uses use super::*;. The project convention (as noted in CLAUDE.md) prefers explicit crate:: imports or specific use statements for better clarity and to avoid potential naming conflicts.
    • Suggestion: Replace use super::* with specific imports like use crate::channels::{IncomingMessage, OutgoingResponse}; and use crate::testing::StubChannel;.
    • Reference: src/channels/manager.rs (line 241 in the AFTER patch)
  3. Hand-rolled ZIP entries with zeroed CRC32

    • Severity: high
    • Description: The build_zip_entry_store helper in src/tools/builtin/skill_tools.rs uses a zeroed CRC32 value. While the current ZIP parser might not verify checksums, this makes the tests fragile. If the ZIP library changes or CRC verification is enabled in the future, these tests could silently break or pass incorrectly.
    • Suggestion: Consider computing the actual CRC32 for the content (e.g., using a crate like crc32fast) or hardcoding known correct CRC32 values for fixed test content to make the tests more robust.
    • Reference: src/tools/builtin/skill_tools.rs (lines 1012-1027 in the AFTER patch)

Please let me know if you have any questions about these remaining points.

@ilblackdragon ilblackdragon merged commit 45ec691 into main Mar 7, 2026
27 checks passed
@ilblackdragon ilblackdragon deleted the do-some-research-into-openclaw branch March 7, 2026 08:30
This was referenced Mar 7, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…ests, search edge cases (nearai#623)

* feat(testing): add StubChannel test double for Channel trait

Adds StubChannel to src/testing.rs alongside StubLlm. Supports message
injection via mpsc sender, response/status capture, and configurable
health check toggling. Includes handle methods for use after ownership
transfer to ChannelManager.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(testing): wire StubChannel into TestHarnessBuilder

Add with_stub_channel() builder method that creates a StubChannel
pre-registered in a ChannelManager. Tests can inject messages via
the sender and verify routing through the manager. The channel field
on TestHarness is Optional, defaulting to None for backward compat.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: gate external-service tests behind integration feature flag

Replace silent try_connect() skip pattern with explicit feature gating.
cargo test now runs only self-contained tests.
cargo test --features integration runs tests requiring PostgreSQL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(channels): add ChannelManager unit tests using StubChannel

Cover add/start_all stream merging, respond routing, unknown channel
errors, health_check_all with mixed health, empty-channels error path,
and injection channel merging -- all via StubChannel test double.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: document test tier separation (unit/integration/live)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add architecture boundary check script

Grep-based checks for three architecture boundaries:
- Direct database driver usage (tokio_postgres/libsql) outside src/db/
- .unwrap()/.expect() in production code (warning only)
- Direct std::env::var reads outside config layer (warning only)

The DB driver check is a hard violation; the other two are warnings
for gradual cleanup. Run with: bash scripts/check-boundaries.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(search): add RRF edge case tests for empty inputs, limits, and config modes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(security): add regression tests for skill installer ZIP and SSRF protections

Add 11 regression tests covering the security controls in skill_tools:

ZIP extraction safety:
- Valid SKILL.md extraction works correctly
- Non-SKILL.md entries are ignored (returns error)
- Path traversal entries (../../SKILL.md) do not match
- Nested path entries (subdir/SKILL.md) do not match
- Oversized entries (>1MB uncompressed) are rejected

SSRF prevention:
- Loopback addresses (127.0.0.1) are blocked
- Private ranges (10.x, 172.16.x, 192.168.x) are blocked
- Link-local addresses (169.254.x) are blocked
- Public IPs (8.8.8.8, 1.1.1.1) are allowed
- IPv4-mapped IPv6 unwrapping logic works correctly
- Metadata endpoints and .internal/.local hostnames are blocked
- Normal hostnames (github.com, clawhub.dev) are allowed

Also documents a known gap: url::Url::host_str() returns bracketed
IPv6 addresses that std::net::IpAddr cannot parse, so IPv4-mapped
IPv6 URLs currently bypass IP-based checks in validate_fetch_url.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(testing): extract TestGatewayBuilder to eliminate gateway test duplication

Both ws_gateway_integration.rs and openai_compat_integration.rs manually
constructed GatewayState with 19+ fields. Extracted to a shared builder in
src/channels/web/test_helpers.rs that provides sensible defaults and lets
tests override only what they need.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add implementation plans for testing batches 1 and 2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): close IPv6 SSRF bypass in validate_fetch_url

validate_fetch_url used host_str() which returns bracketed IPv6
(e.g. "[::ffff:7f00:1]") that IpAddr::parse() cannot handle,
silently skipping IP-based SSRF checks for all IPv6 URLs.

Switch to url::Host enum matching to extract proper IpAddr values
without string parsing. IPv4-mapped IPv6 addresses like
::ffff:127.0.0.1 are now correctly unwrapped and blocked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(skills): add activation criteria limits enforcement tests

Adds test_activation_criteria_enforce_limits to verify that
enforce_limits() correctly trims excess patterns (>5), keywords (>20),
and tags (>10), and filters out short keywords/tags (<3 chars).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(wasm): add security regression tests for WASM tool loader

Add 6 tests covering: tool name path separator rejection, empty name
rejection, nonexistent file handling, invalid WASM bytes rejection,
dotfile discovery behavior, and subdirectory non-recursion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: address PR review feedback

- Remove plan files from repo (ilblackdragon review)
- Replace CLAUDE.md test tier rules with pointer to check-boundaries.sh
- Add Check 4 to check-boundaries.sh: enforces integration tests are
  gated behind the 'integration' feature flag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add try_connect silent-skip pattern check to check-boundaries.sh

Check 5 catches try_connect() and similar silent-skip patterns in
integration tests. Tests should use feature gates to fail loudly
when prerequisites are missing, not silently return.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): harden skill fetch SSRF checks

* fix(scripts): use bash arrays in check-boundaries.sh tier violation check

Refactor Check 4 in check-boundaries.sh to use bash arrays and printf
instead of string concatenation with echo -e. This is more robust with
special characters in filenames and avoids portability concerns with
echo -e. [skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
…ests, search edge cases (nearai#623)

* feat(testing): add StubChannel test double for Channel trait

Adds StubChannel to src/testing.rs alongside StubLlm. Supports message
injection via mpsc sender, response/status capture, and configurable
health check toggling. Includes handle methods for use after ownership
transfer to ChannelManager.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(testing): wire StubChannel into TestHarnessBuilder

Add with_stub_channel() builder method that creates a StubChannel
pre-registered in a ChannelManager. Tests can inject messages via
the sender and verify routing through the manager. The channel field
on TestHarness is Optional, defaulting to None for backward compat.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: gate external-service tests behind integration feature flag

Replace silent try_connect() skip pattern with explicit feature gating.
cargo test now runs only self-contained tests.
cargo test --features integration runs tests requiring PostgreSQL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(channels): add ChannelManager unit tests using StubChannel

Cover add/start_all stream merging, respond routing, unknown channel
errors, health_check_all with mixed health, empty-channels error path,
and injection channel merging -- all via StubChannel test double.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: document test tier separation (unit/integration/live)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add architecture boundary check script

Grep-based checks for three architecture boundaries:
- Direct database driver usage (tokio_postgres/libsql) outside src/db/
- .unwrap()/.expect() in production code (warning only)
- Direct std::env::var reads outside config layer (warning only)

The DB driver check is a hard violation; the other two are warnings
for gradual cleanup. Run with: bash scripts/check-boundaries.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(search): add RRF edge case tests for empty inputs, limits, and config modes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(security): add regression tests for skill installer ZIP and SSRF protections

Add 11 regression tests covering the security controls in skill_tools:

ZIP extraction safety:
- Valid SKILL.md extraction works correctly
- Non-SKILL.md entries are ignored (returns error)
- Path traversal entries (../../SKILL.md) do not match
- Nested path entries (subdir/SKILL.md) do not match
- Oversized entries (>1MB uncompressed) are rejected

SSRF prevention:
- Loopback addresses (127.0.0.1) are blocked
- Private ranges (10.x, 172.16.x, 192.168.x) are blocked
- Link-local addresses (169.254.x) are blocked
- Public IPs (8.8.8.8, 1.1.1.1) are allowed
- IPv4-mapped IPv6 unwrapping logic works correctly
- Metadata endpoints and .internal/.local hostnames are blocked
- Normal hostnames (github.com, clawhub.dev) are allowed

Also documents a known gap: url::Url::host_str() returns bracketed
IPv6 addresses that std::net::IpAddr cannot parse, so IPv4-mapped
IPv6 URLs currently bypass IP-based checks in validate_fetch_url.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(testing): extract TestGatewayBuilder to eliminate gateway test duplication

Both ws_gateway_integration.rs and openai_compat_integration.rs manually
constructed GatewayState with 19+ fields. Extracted to a shared builder in
src/channels/web/test_helpers.rs that provides sensible defaults and lets
tests override only what they need.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add implementation plans for testing batches 1 and 2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): close IPv6 SSRF bypass in validate_fetch_url

validate_fetch_url used host_str() which returns bracketed IPv6
(e.g. "[::ffff:7f00:1]") that IpAddr::parse() cannot handle,
silently skipping IP-based SSRF checks for all IPv6 URLs.

Switch to url::Host enum matching to extract proper IpAddr values
without string parsing. IPv4-mapped IPv6 addresses like
::ffff:127.0.0.1 are now correctly unwrapped and blocked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(skills): add activation criteria limits enforcement tests

Adds test_activation_criteria_enforce_limits to verify that
enforce_limits() correctly trims excess patterns (>5), keywords (>20),
and tags (>10), and filters out short keywords/tags (<3 chars).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(wasm): add security regression tests for WASM tool loader

Add 6 tests covering: tool name path separator rejection, empty name
rejection, nonexistent file handling, invalid WASM bytes rejection,
dotfile discovery behavior, and subdirectory non-recursion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: address PR review feedback

- Remove plan files from repo (ilblackdragon review)
- Replace CLAUDE.md test tier rules with pointer to check-boundaries.sh
- Add Check 4 to check-boundaries.sh: enforces integration tests are
  gated behind the 'integration' feature flag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add try_connect silent-skip pattern check to check-boundaries.sh

Check 5 catches try_connect() and similar silent-skip patterns in
integration tests. Tests should use feature gates to fail loudly
when prerequisites are missing, not silently return.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): harden skill fetch SSRF checks

* fix(scripts): use bash arrays in check-boundaries.sh tier violation check

Refactor Check 4 in check-boundaries.sh to use bash arrays and printf
instead of string concatenation with echo -e. This is more robust with
special characters in filenames and avoids portability concerns with
echo -e. [skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: channel/web Web gateway channel scope: channel Channel infrastructure scope: ci CI/CD workflows scope: dependencies Dependency updates scope: docs Documentation scope: tool/builtin Built-in tools scope: tool/wasm WASM tool sandbox scope: workspace Persistent memory / workspace size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants