
Onboarding: show Telegram in channel selection and auto-install bundled channel#3

Merged
ilblackdragon merged 1 commit into nearai:main from serrrfirat:codex/onboarding-telegram-selection
Feb 7, 2026


@serrrfirat
Collaborator

Summary

  • add a bundled channel installer (currently Telegram only)
  • show bundled channels directly in onboarding channel selection UI
  • auto-install selected bundled channels during onboarding before setup
  • ensure channel-only onboarding initializes DB migrations before saving channel secrets
  • add support for credential location in capabilities parsing so Telegram capabilities load correctly
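The installer behavior exercised by the `channels::wasm::bundled` tests further down (writing files, refusing to overwrite without `force`) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `BundledChannel`, the `<name>.wasm` / `<name>.capabilities.json` file names, and both function signatures are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical embedded artifact for one bundled channel.
/// (A real build might embed the bytes with include_bytes!.)
pub struct BundledChannel {
    pub name: &'static str,
    pub wasm: &'static [u8],
    pub capabilities_json: &'static str,
}

/// Names shown in the onboarding channel list.
pub fn bundled_channel_names(bundle: &[BundledChannel]) -> Vec<&'static str> {
    bundle.iter().map(|c| c.name).collect()
}

/// Write `<name>.wasm` and `<name>.capabilities.json` into `dir`.
/// Existing files are left alone unless `force` is true.
pub fn install_bundled_channel(
    channel: &BundledChannel,
    dir: &Path,
    force: bool,
) -> io::Result<Vec<PathBuf>> {
    let wasm_path = dir.join(format!("{}.wasm", channel.name));
    let caps_path = dir.join(format!("{}.capabilities.json", channel.name));
    if !force && (wasm_path.exists() || caps_path.exists()) {
        return Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            format!("channel '{}' is already installed", channel.name),
        ));
    }
    fs::create_dir_all(dir)?;
    fs::write(&wasm_path, channel.wasm)?;
    fs::write(&caps_path, channel.capabilities_json)?;
    Ok(vec![wasm_path, caps_path])
}
```

Refusing by default keeps a re-run of onboarding from silently replacing a channel the user has already customized.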

Why

This aligns the UX with onboarding-first configuration: users choose channels in one place, without a separate install prompt or command.

Testing

running 7 tests
test setup::wizard::tests::test_mask_password_in_url ... ok
test setup::wizard::tests::test_capitalize_first ... ok
test setup::wizard::tests::test_wasm_channel_option_names_includes_bundled_when_missing ... ok
test setup::wizard::tests::test_wasm_channel_option_names_dedupes_bundled ... ok
test setup::wizard::tests::test_wizard_with_config ... ok
test setup::wizard::tests::test_wizard_creation ... ok
test setup::wizard::tests::test_install_missing_bundled_channels_installs_telegram ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 499 filtered out; finished in 0.02s

running 1 test
test tools::wasm::capabilities_schema::tests::test_parse_url_path_credential ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 505 filtered out; finished in 0.00s

running 3 tests
test channels::wasm::bundled::tests::test_bundled_channel_names_contains_telegram ... ok
test channels::wasm::bundled::tests::test_install_bundled_channel_refuses_overwrite_without_force ... ok
test channels::wasm::bundled::tests::test_install_bundled_channel_writes_files ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 503 filtered out; finished in 0.00s
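The two `wasm_channel_option_names` tests above suggest that the selection list merges installed channels with bundled ones and drops duplicates. A hypothetical sketch of that merge follows; the function name mirrors the test names, while the signature and the first-seen ordering are assumptions.

```rust
/// Merge installed WASM channel names with bundled channel names so that
/// bundled channels appear even before they are installed, each name once.
/// Order: installed names first, then bundled names not already present.
pub fn wasm_channel_option_names(installed: &[String], bundled: &[&str]) -> Vec<String> {
    let mut names: Vec<String> = Vec::new();
    for name in installed {
        if !names.contains(name) {
            names.push(name.clone());
        }
    }
    for &name in bundled {
        if !names.iter().any(|n| n.as_str() == name) {
            names.push(name.to_string());
        }
    }
    names
}
```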

Notes

I intentionally excluded separate CLI channel-command changes from this PR scope.

ilblackdragon merged commit 6831a54 into nearai:main Feb 7, 2026
serrrfirat mentioned this pull request Feb 10, 2026
github-actions bot mentioned this pull request Feb 12, 2026
ilblackdragon added a commit that referenced this pull request Feb 19, 2026
- Use manifest.name (not crate_name) for installed filenames so
  discovery, auth, and CLI commands all agree on the stem (#1)
- Add AlreadyInstalled error variant instead of misleading
  ExtensionNotFound (#2)
- Add DownloadFailed error variant with URL context instead of
  stuffing URLs into PathBuf (#3)
- Validate HTTP status with error_for_status() before reading
  response bytes in artifact downloads (#4)
- Switch build_wasm_component to tokio::process::Command with
  status() so build output streams to the terminal (#6)
- Find WASM artifact by crate_name specifically instead of picking
  the first .wasm file in the release directory (#7)
- Add is_file() guard in catalog loader to skip directories (#8)
- Detect ambiguous bare-name lookups when both tools/<name> and
  channels/<name> exist, with get_strict() returning an error (#9)
- Fix wizard step_extensions to check tool.name for installed
  detection, consistent with the new naming (#11, #12)
- Fix redundant closures and map_or clippy warnings in changed files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
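Items (#1), (#2), (#3) and (#7) can be illustrated with a small error type and two naming helpers. This is a hypothetical sketch: the variant names come from the commit message, but the fields, messages, and helper functions are assumptions. The artifact-name helper relies on Cargo's convention that a `cdylib` built from crate `foo-bar` is written as `foo_bar.wasm`.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Installer errors; field layouts are illustrative.
#[derive(Debug)]
pub enum InstallError {
    ExtensionNotFound(String),
    /// Distinct from "not found", so the CLI reports what actually happened.
    AlreadyInstalled { name: String, path: PathBuf },
    /// Keeps the URL as a String instead of stuffing it into a PathBuf.
    DownloadFailed { url: String, reason: String },
}

impl fmt::Display for InstallError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InstallError::ExtensionNotFound(name) => write!(f, "extension '{}' not found", name),
            InstallError::AlreadyInstalled { name, path } => {
                write!(f, "'{}' is already installed at {}", name, path.display())
            }
            InstallError::DownloadFailed { url, reason } => {
                write!(f, "download from {} failed: {}", url, reason)
            }
        }
    }
}

impl std::error::Error for InstallError {}

/// Installed file stem comes from manifest.name, so discovery, auth and
/// CLI commands all look for the same file.
pub fn installed_wasm_path(install_dir: &Path, manifest_name: &str) -> PathBuf {
    install_dir.join(format!("{}.wasm", manifest_name))
}

/// Expected build artifact for a crate: Cargo replaces '-' with '_'
/// in library artifact names.
pub fn expected_artifact_name(crate_name: &str) -> String {
    format!("{}.wasm", crate_name.replace('-', "_"))
}
```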
ilblackdragon added a commit that referenced this pull request Feb 20, 2026
ilblackdragon added a commit that referenced this pull request Feb 20, 2026
…tion (#238)

* feat: add extension registry with metadata catalog, CLI, and onboarding integration

Adds a central registry that catalogs all 14 available extensions (10 tools,
4 channels) with their capabilities, auth requirements, and artifact references.
The onboarding wizard now shows installable channels from the registry and
offers tool installation as a new Step 7.

- registry/ folder with per-extension JSON manifests and bundle definitions
- src/registry/ module: manifest structs, catalog loader, installer
- `ironclaw registry list|info|install|install-defaults` CLI commands
- Setup wizard enhanced: channels from registry, new extensions step (8 steps)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): resolve workspace errors for tool crates and channels-only onboarding

Tool crates in tools-src/ and channels-src/ failed `cargo metadata` during
onboard install because Cargo resolved them as part of the root workspace.
Add `[workspace]` table to each standalone crate and extend the root
`workspace.exclude` list so they build independently.

Channels-only mode (`onboard --channels-only`) failed with "Secrets not
configured" and "No database connection" because it skipped database and
security setup. Add `reconnect_existing_db()` to establish the DB connection
and load saved settings before running channel configuration.

Also improve the tunnel "already configured" display to show full provider
details (domain, mode, command) instead of just the provider name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): address PR review feedback on installer and catalog

- Use manifest.name (not crate_name) for installed filenames so
  discovery, auth, and CLI commands all agree on the stem (#1)
- Add AlreadyInstalled error variant instead of misleading
  ExtensionNotFound (#2)
- Add DownloadFailed error variant with URL context instead of
  stuffing URLs into PathBuf (#3)
- Validate HTTP status with error_for_status() before reading
  response bytes in artifact downloads (#4)
- Switch build_wasm_component to tokio::process::Command with
  status() so build output streams to the terminal (#6)
- Find WASM artifact by crate_name specifically instead of picking
  the first .wasm file in the release directory (#7)
- Add is_file() guard in catalog loader to skip directories (#8)
- Detect ambiguous bare-name lookups when both tools/<name> and
  channels/<name> exist, with get_strict() returning an error (#9)
- Fix wizard step_extensions to check tool.name for installed
  detection, consistent with the new naming (#11, #12)
- Fix redundant closures and map_or clippy warnings in changed files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
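The ambiguity rule in item (#9) can be illustrated with a lookup table keyed by kind and name. This is a hypothetical sketch: `Catalog`, `Kind`, `LookupError`, and the stored string values are invented for illustration; only the `tools/<name>` / `channels/<name>` key shapes and the name `get_strict()` come from the commit message.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Kind {
    Tool,
    Channel,
}

#[derive(Debug, PartialEq)]
pub enum LookupError {
    NotFound(String),
    Ambiguous(String),
}

/// Hypothetical catalog: (kind, name) -> manifest description.
pub struct Catalog {
    entries: HashMap<(Kind, String), String>,
}

impl Catalog {
    pub fn new(entries: &[(Kind, &str, &str)]) -> Self {
        let entries = entries
            .iter()
            .map(|(k, n, d)| ((*k, n.to_string()), d.to_string()))
            .collect();
        Catalog { entries }
    }

    /// Accepts "tools/<name>", "channels/<name>", or a bare "<name>".
    /// A bare name matching both kinds is an error instead of a silent pick.
    pub fn get_strict(&self, key: &str) -> Result<(Kind, &str), LookupError> {
        let (kinds, name): (Vec<Kind>, &str) = match key.split_once('/') {
            Some(("tools", n)) => (vec![Kind::Tool], n),
            Some(("channels", n)) => (vec![Kind::Channel], n),
            _ => (vec![Kind::Tool, Kind::Channel], key),
        };
        let hits: Vec<(Kind, &str)> = kinds
            .iter()
            .filter_map(|k| self.entries.get(&(*k, name.to_string())).map(|v| (*k, v.as_str())))
            .collect();
        match hits.as_slice() {
            [] => Err(LookupError::NotFound(key.to_string())),
            [one] => Ok(*one),
            _ => Err(LookupError::Ambiguous(name.to_string())),
        }
    }
}
```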

* fix(setup): restore DB connection fields after settings reload

reconnect_postgres() and reconnect_libsql() called Settings::from_db_map()
which overwrote database_url / libsql_path / libsql_url set from env vars.
Also use get_strict() in cmd_info to surface ambiguous bare-name errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
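The fix amounts to carrying the connection fields across the reload, because settings rebuilt from stored rows know nothing about values that came from environment variables. A hypothetical sketch: the `Settings` struct here is a tiny stand-in, and `reload_preserving_connection` is an invented name; only the field names and `from_db_map` appear in the commit message.

```rust
use std::collections::HashMap;

/// Stand-in subset of the real settings struct.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Settings {
    pub database_url: Option<String>,
    pub libsql_path: Option<String>,
    pub libsql_url: Option<String>,
    pub agent_name: Option<String>,
}

impl Settings {
    /// Stand-in for Settings::from_db_map(): built only from stored rows,
    /// so env-only connection fields come back as None.
    pub fn from_db_map(map: &HashMap<String, String>) -> Self {
        Settings {
            database_url: map.get("database_url").cloned(),
            libsql_path: map.get("libsql_path").cloned(),
            libsql_url: map.get("libsql_url").cloned(),
            agent_name: map.get("agent_name").cloned(),
        }
    }
}

/// Reload from the DB, but keep the connection fields that were used to
/// open the connection; they win over anything stored in the DB.
pub fn reload_preserving_connection(current: &Settings, map: &HashMap<String, String>) -> Settings {
    let mut reloaded = Settings::from_db_map(map);
    reloaded.database_url = current.database_url.clone().or(reloaded.database_url.take());
    reloaded.libsql_path = current.libsql_path.clone().or(reloaded.libsql_path.take());
    reloaded.libsql_url = current.libsql_url.clone().or(reloaded.libsql_url.take());
    reloaded
}
```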

* style: fix clippy collapsible_if and print_literal warnings

Collapse nested if-let chains and inline string literals in format
macros to satisfy CI clippy lint checks (deny warnings).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): prefer artifacts for install-defaults and improve dir lookup

- InstallDefaults now defaults to downloading pre-built artifacts
  (matching `registry install` behavior), with --build flag for source builds.
- find_registry_dir() walks up 3 ancestor levels from the exe and adds
  a CARGO_MANIFEST_DIR fallback, matching load_registry_catalog() logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
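The lookup order described for `find_registry_dir()` can be sketched with `Path::ancestors()`, which yields the starting directory first and then each parent. This is a hypothetical sketch: it assumes "3 ancestor levels" means the executable's directory plus three parents, and that the target is a directory literally named `registry`; `find_registry_dir_from` is an invented helper that makes the logic testable.

```rust
use std::path::{Path, PathBuf};

/// Look for `registry/` in `exe_dir` or up to 3 of its ancestors, then in
/// the optional manifest-dir fallback.
pub fn find_registry_dir_from(exe_dir: &Path, manifest_dir: Option<&Path>) -> Option<PathBuf> {
    // ancestors() starts with exe_dir itself, so take(4) = itself + 3 parents.
    for dir in exe_dir.ancestors().take(4) {
        let candidate = dir.join("registry");
        if candidate.is_dir() {
            return Some(candidate);
        }
    }
    manifest_dir.map(|d| d.join("registry")).filter(|p| p.is_dir())
}

pub fn find_registry_dir() -> Option<PathBuf> {
    let exe = std::env::current_exe().ok()?;
    let exe_dir = exe.parent()?;
    // Set by Cargo at compile time; None if built some other way.
    let manifest = option_env!("CARGO_MANIFEST_DIR").map(Path::new);
    find_registry_dir_from(exe_dir, manifest)
}
```

With the default layout `target/debug/ironclaw`, two levels up from the executable's directory is the repository root, which is why a small fixed depth is usually enough.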

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
jaswinder6991 pushed a commit to jaswinder6991/ironclaw that referenced this pull request Feb 26, 2026
ilblackdragon added a commit that referenced this pull request Mar 5, 2026
- Switch wait_for_responses polling to exponential backoff (50ms-500ms)
  and raise the default timeout from 15s to 30s to reduce CI flakiness (#1)
- Strengthen prompt_injection_resilience test with positive safety layer
  assertion via has_safety_warnings(), enable injection_check (#2)
- Add assert_tool_order() helper and tools_order field in TraceExpects
  for verifying tool execution ordering in multi-step traces (#3)
- Document TraceLlm sequential-call assumption for concurrency (#6)
- Clean up CleanupGuard with PathKind enum instead of shotgun
  remove_file + remove_dir_all on every path (#8)
- Fix coverage.sh: default to --lib only, fix multi-filter syntax,
  add COV_ALL_TARGETS option
- Add coverage/ to .gitignore
- Remove planning docs from PR

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
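The polling change in (#1) replaces a fixed interval with a delay that doubles after each miss and is capped at 500 ms. The real `wait_for_responses` is async test-harness code; this synchronous sketch only illustrates the schedule and the deadline handling, and `poll_with_backoff` / `backoff_schedule` are hypothetical names.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Call `check` until it returns Some or `timeout` elapses, sleeping
/// 50 ms, 100 ms, 200 ms, 400 ms, then 500 ms between attempts.
pub fn poll_with_backoff<T, F: FnMut() -> Option<T>>(timeout: Duration, mut check: F) -> Option<T> {
    let deadline = Instant::now() + timeout;
    let mut delay = Duration::from_millis(50);
    let max_delay = Duration::from_millis(500);
    loop {
        if let Some(v) = check() {
            return Some(v);
        }
        let now = Instant::now();
        if now >= deadline {
            return None;
        }
        // Never sleep past the deadline.
        thread::sleep(delay.min(deadline - now));
        delay = (delay * 2).min(max_delay);
    }
}

/// The delay schedule in milliseconds, for inspection.
pub fn backoff_schedule(n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    let mut d = 50u64;
    for _ in 0..n {
        out.push(d);
        d = (d * 2).min(500);
    }
    out
}
```

Short initial delays keep fast tests fast, while the cap limits wasted wakeups on slow CI runners.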
ilblackdragon added a commit that referenced this pull request Mar 5, 2026
* refactor: extract shared assertion helpers to support/assertions.rs

Move 5 assertion helpers from e2e_spot_checks.rs to a shared module.
Add assert_all_tools_succeeded and assert_tool_succeeded for eliminating
false positives in E2E tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tool output capture via tool_results() accessor

Extract (name, preview) from ToolResult status events in TestChannel
and TestRig, enabling content assertions on tool outputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct tool parameters in 3 broken trace fixtures

- tool_time.json: add missing "operation": "now" for time tool
- robust_correct_tool.json: same fix
- memory_full_cycle.json: change "path" to "target" for memory_write

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add tool success and output assertions to eliminate false positives

Every E2E test that exercises tools now calls assert_all_tools_succeeded.
Added tool output content assertions where tool results are predictable
(time year, read_file content, memory_read content).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: capture per-tool timing from ToolStarted/ToolCompleted events

Record Instant on ToolStarted and compute elapsed duration on
ToolCompleted, wiring real timing data into collect_metrics() instead
of hardcoded zeros.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: add RAII CleanupGuard for temp file/dir cleanup in tests

Replace manual cleanup_test_dir() calls and inline remove_file() with
Drop-based CleanupGuard that ensures cleanup even if a test panics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
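A Drop-based guard runs its cleanup when it goes out of scope, including while a panicking test unwinds. The sketch below combines this commit with the later `PathKind` refactor (#8), so each path gets only the removal call that matches its kind. The builder methods and field layout are assumptions, not the harness's actual API.

```rust
use std::fs;
use std::path::PathBuf;

/// Whether a tracked path is a file or a directory.
#[derive(Debug, Clone, Copy)]
pub enum PathKind {
    File,
    Dir,
}

/// Hypothetical test cleanup guard: removes tracked paths on drop.
#[derive(Default)]
pub struct CleanupGuard {
    paths: Vec<(PathBuf, PathKind)>,
}

impl CleanupGuard {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn file(mut self, path: impl Into<PathBuf>) -> Self {
        self.paths.push((path.into(), PathKind::File));
        self
    }

    pub fn dir(mut self, path: impl Into<PathBuf>) -> Self {
        self.paths.push((path.into(), PathKind::Dir));
        self
    }
}

impl Drop for CleanupGuard {
    fn drop(&mut self) {
        for (path, kind) in &self.paths {
            // Errors are ignored: the path may never have been created.
            let _ = match kind {
                PathKind::File => fs::remove_file(path),
                PathKind::Dir => fs::remove_dir_all(path),
            };
        }
    }
}
```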

* fix: add Drop impl and graceful shutdown for TestRig

Wrap agent_handle in Option so Drop can abort leaked tasks. Signal
the channel shutdown before aborting for future cooperative shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace agent startup sleep with oneshot ready signal

Use a oneshot channel fired in Channel::start() instead of a fixed
100ms sleep, eliminating the race condition on slow systems.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
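The ready-signal pattern replaces a guessed sleep with an explicit notification from the component being started. The commit fires a oneshot channel inside `Channel::start()`; this sketch substitutes `std::sync::mpsc::sync_channel(1)` and a plain thread so it runs without an async runtime, and `spawn_agent` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Start a stand-in "agent" that takes `startup_work` to become ready,
/// returning its handle and a receiver that fires once it is ready.
pub fn spawn_agent(startup_work: Duration) -> (thread::JoinHandle<()>, mpsc::Receiver<()>) {
    let (ready_tx, ready_rx) = mpsc::sync_channel::<()>(1);
    let handle = thread::spawn(move || {
        thread::sleep(startup_work); // simulate slow startup
        // Equivalent of the signal sent from Channel::start().
        let _ = ready_tx.send(());
        // ... the agent loop would run here ...
    });
    (handle, ready_rx)
}
```

The caller blocks on `recv_timeout` for exactly as long as startup takes, up to a generous bound, instead of hoping 100 ms is enough.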

* fix: replace fragile string-matching iteration limit with count-based detection

Use tool completion count vs max_tool_iterations instead of scanning
status messages for "iteration"/"limit" substrings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use assert_all_tools_succeeded for memory_full_cycle test

Remove incorrect comment about memory_tree failing with empty path
(it actually succeeds). Omit empty path from fixture and use the
standard assert_all_tools_succeeded instead of per-tool assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: promote benchmark metrics types to library code

Move TraceMetrics, ScenarioResult, RunResult, MetricDelta, and
compare_runs() from tests/support/metrics.rs to src/benchmark/metrics.rs.
Existing tests use re-export for backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add Scenario and Criterion types for agent benchmarking

Scenario defines a task with input, success criteria, and resource
limits. Criterion is an enum of programmatic checks (tool_used,
response_contains, etc.) evaluated without LLM judgment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
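A criterion enum of this kind pairs each variant with a deterministic check over the run's output. A hypothetical sketch: `ToolUsed` and `ResponseContains` follow the `tool_used` / `response_contains` names in the commit message, while `MaxToolCalls` and the `RunOutcome` shape are invented for illustration.

```rust
/// Programmatic success checks; no LLM judgment involved.
#[derive(Debug, Clone)]
pub enum Criterion {
    ToolUsed(String),
    ResponseContains(String),
    MaxToolCalls(usize),
}

/// Minimal view of a finished scenario run.
pub struct RunOutcome<'a> {
    pub response: &'a str,
    pub tools_called: &'a [&'a str],
}

impl Criterion {
    pub fn evaluate(&self, run: &RunOutcome<'_>) -> bool {
        match self {
            Criterion::ToolUsed(name) => run.tools_called.iter().any(|t| *t == name.as_str()),
            Criterion::ResponseContains(s) => run.response.contains(s.as_str()),
            Criterion::MaxToolCalls(n) => run.tools_called.len() <= *n,
        }
    }
}
```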

* feat: add initial benchmark scenario suite (12 scenarios across 5 categories)

Scenarios cover tool_selection, tool_chaining, error_recovery,
efficiency, and memory_operations. All loaded from JSON with
deserialization validation test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add benchmark runner with BenchChannel and InstrumentedLlm

BenchChannel is a minimal Channel implementation for benchmarks.
InstrumentedLlm wraps any LlmProvider to capture per-call metrics.
Runner creates a fresh agent per scenario, evaluates success criteria,
and produces RunResult with timing, token, and cost metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add baseline management, reports, and benchmark entry point

- baseline.rs: load/save/promote benchmark results
- report.rs: format comparison reports with regression detection
- benchmark_runner.rs: integration test with real LLM (feature-gated)
- Add benchmark feature flag to Cargo.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply cargo fmt to benchmark module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add multi-turn scenario types with setup, judge, ResponseNotContains

Add BenchScenario, Turn, TurnAssertions, JudgeConfig, ScenarioSetup,
WorkspaceSetup, SeedDocument types for multi-turn benchmark scenarios.
Add ResponseNotContains criterion variant. Add TurnAssertions::to_criteria()
converter for backward compat with existing evaluation engine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add JSON scenario loader with recursive discovery and tag filter

Add load_bench_scenarios() for the new BenchScenario format with recursive
directory traversal and tag-based filtering. Create 4 initial trajectory
scenarios across tool-selection, multi-turn, and efficiency categories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): multi-turn runner with workspace seeding and per-turn metrics

Add run_bench_scenario() that loops over BenchScenario turns, seeds workspace
documents, collects per-turn metrics (tokens, tool calls, wall time), and
evaluates per-turn assertions. Add TurnMetrics to metrics.rs and
clear_for_next_turn() to BenchChannel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add LLM-as-judge scoring with prompt formatting and score parsing

Create judge.rs with format_judge_prompt, parse_judge_score, and judge_turn.
Wire into run_bench_scenario for turns with judge config -- scores below
min_score fail the turn.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add CLI subcommand (ironclaw benchmark)

Add BenchmarkCommand with --tags, --scenario, --no-judge, --timeout,
--update-baseline flags. Wire into Command enum and main.rs dispatch.
Feature-gated behind benchmark flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): per-scenario JSON output with full trajectory

Add save_scenario_results() that writes per-scenario JSON files alongside
the run summary. Each scenario gets its own file with turn_metrics trajectory.
Update CLI to use new output format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add ToolRegistry::retain_only and wire tool filtering in scenarios

Add a retain_only() method to ToolRegistry that filters tools down to a
given allowlist. Wire this into run_bench_scenario() so that when a
scenario specifies a tools list in its setup, only those tools are
available during the benchmark run. Includes two tests for the new
method: one verifying filtering works and one verifying empty input
is a no-op.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): wire identity overrides into workspace before agent start

Add seed_identity() helper that writes identity files (IDENTITY.md,
USER.md, etc.) into the workspace before the agent starts, so that
workspace.system_prompt() picks them up. Wire it into
run_bench_scenario() after workspace seeding. Include a test that
verifies identity files are written and readable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add --parallel and --max-cost CLI flags

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(benchmark): use feature-conditional snapshot names for CLI help tests

Prevents snapshot conflicts between default (no benchmark) and
all-features (with benchmark) builds by using separate snapshot names
per feature set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): parallel execution with JoinSet and budget cap enforcement

Replace sequential loop in run_all_bench() with parallel execution using
JoinSet + semaphore when config.parallel > 1. Add budget cap enforcement
that skips remaining scenarios when max_total_cost_usd is exceeded.
Track skipped count in RunResult.skipped_scenarios and display it in
format_report().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
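The budget-cap rule can be shown independently of the parallel machinery. This sequential sketch is an assumption-laden simplification: the real runner uses a `JoinSet` plus semaphore when `parallel > 1`, and here each scenario's cost is known up front purely for illustration. `run_with_budget` and `BudgetOutcome` are hypothetical; `skipped_scenarios` and `max_total_cost_usd` are the names from the commit message.

```rust
#[derive(Debug, PartialEq)]
pub struct BudgetOutcome {
    pub completed: Vec<String>,
    pub skipped_scenarios: usize,
    pub total_cost_usd: f64,
}

/// Run scenarios in order; once accumulated cost exceeds the cap,
/// remaining scenarios are skipped and counted instead of run.
pub fn run_with_budget(
    scenarios: &[(&str, f64)], // (scenario id, cost the run will incur)
    max_total_cost_usd: Option<f64>,
) -> BudgetOutcome {
    let mut out = BudgetOutcome { completed: Vec::new(), skipped_scenarios: 0, total_cost_usd: 0.0 };
    for (id, cost) in scenarios {
        if matches!(max_total_cost_usd, Some(cap) if out.total_cost_usd > cap) {
            out.skipped_scenarios += 1;
            continue;
        }
        out.total_cost_usd += *cost;
        out.completed.push(id.to_string());
    }
    out
}
```

Because the check happens before a scenario starts, the final total can overshoot the cap by at most one scenario's cost (or up to `parallel` scenarios' cost in a concurrent runner).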

* feat(benchmark): add tool restriction and identity override test scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: fix formatting for Phase 3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add SkillRegistry::retain_only and wire skill filtering in scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add --json flag for machine-readable output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add GitHub Actions benchmark workflow (manual trigger)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(benchmark): remove in-tree benchmark harness, keep retain_only utilities

Move benchmark-specific code out of ironclaw in preparation for the
nearai/benchmarks trajectory adapter. This removes:

- src/benchmark/ (runner, scenarios, metrics, judge, report, etc.)
- src/cli/benchmark.rs and the Benchmark CLI subcommand
- benchmarks/ data directory (scenarios + trajectories)
- .github/workflows/benchmark.yml
- The "benchmark" Cargo feature flag

What remains:
- ToolRegistry::retain_only() and SkillRegistry::retain_only()
- Test support types (TraceMetrics, InstrumentedLlm) inlined into
  tests/support/ instead of re-exporting from the deleted module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add README for LLM trace fixture format

Documents the trajectory JSON format, response types, request hints,
directory structure, and how to write new traces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(test): unify trace format around turns, add multi-turn support

Introduce TraceTurn type that groups user_input with LLM response steps,
making traces self-contained conversation trajectories. Add run_trace()
to TestRig for automatic multi-turn replay. Backward-compatible: flat
"steps" JSON is deserialized as a single turn transparently.

Includes all trace fixtures (spot, coverage, advanced), plan docs, and
new e2e tests for steering, error recovery, long chains, memory, and
prompt injection resilience.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): fix CI failures after merging main

- Fix tool_json fixture: use "data" parameter (not "input") to match
  JsonTool schema
- Fix status_events test: remove assertion for "time" tool that isn't
  in the fixture (only "echo" calls are used)
- Allow dead_code in test support metrics/instrumented_llm modules
  (utilities for future benchmark tests)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Working on recording traces and testing them

* feat(test): add declarative expects to trace fixtures, split infra tests

Add TraceExpects struct with 9 optional assertion fields (response_contains,
tools_used, all_tools_succeeded, etc.) that can be declared in fixture JSON
instead of hand-written Rust. Add verify_expects() and run_recorded_trace()
so recorded trace tests become one-liners.

Split trace infra tests (deserialization, backward compat) into
tests/trace_format.rs which doesn't require the libsql feature gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
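Declarative expects work by making every check optional: a `None` field is simply not verified. A hypothetical sketch modeling three of the nine fields named in the commit (`response_contains`, `tools_used`, `all_tools_succeeded`); the `ToolResult` type and the failure-message format are invented, and the real struct is deserialized from fixture JSON rather than built by hand.

```rust
#[derive(Debug, Default)]
pub struct TraceExpects {
    pub response_contains: Option<Vec<String>>,
    pub tools_used: Option<Vec<String>>,
    pub all_tools_succeeded: Option<bool>,
}

pub struct ToolResult {
    pub name: String,
    pub ok: bool,
}

/// Collect every failed expectation instead of stopping at the first.
pub fn verify_expects(e: &TraceExpects, response: &str, tools: &[ToolResult]) -> Vec<String> {
    let mut failures = Vec::new();
    if let Some(needles) = &e.response_contains {
        for n in needles {
            if !response.contains(n.as_str()) {
                failures.push(format!("response missing {:?}", n));
            }
        }
    }
    if let Some(expected) = &e.tools_used {
        for t in expected {
            if !tools.iter().any(|r| &r.name == t) {
                failures.push(format!("tool {:?} was not used", t));
            }
        }
    }
    if e.all_tools_succeeded == Some(true) {
        for r in tools.iter().filter(|r| !r.ok) {
            failures.push(format!("tool {:?} failed", r.name));
        }
    }
    failures
}
```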

* refactor(test): add expects to all trace fixtures, simplify e2e tests

Add declarative expects blocks to all 19 trace fixture JSONs across
spot/, coverage/, advanced/, and root directories. Update all 8 e2e
test files to use verify_trace_expects() / run_and_verify_trace(),
replacing ~270 lines of hand-written assertions with fixture-driven
verification.

Tests that check things beyond expects (file content on disk, metrics,
event ordering) keep those extra assertions alongside the declarative
ones.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): adapt tests to AppBuilder refactor, fix formatting

Update test files to work with refactored TestRigBuilder that uses
AppBuilder::build_all() (removing with_tools/with_workspace methods).
Update telegram_check fixture to use tool_list instead of echo.
Fix cargo fmt issues in src/llm/mod.rs and src/llm/recording.rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(test): deduplicate support unit tests into single binary

Support modules (assertions, cleanup, test_channel, test_rig, trace_llm)
had #[cfg(test)] mod tests blocks that were compiled and run 12 times —
once per e2e test binary that declares `mod support;`. Extracted all 29
support unit tests into a dedicated `tests/support_unit_tests.rs` so they
run exactly once.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix trailing newlines in support files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(test): unify trace types and fix recorded multi-turn replay

Import shared types (TraceStep, TraceResponse, TraceToolCall, RequestHint,
ExpectedToolResult, MemorySnapshotEntry, HttpExchange*) from
ironclaw::llm::recording instead of redefining them in trace_llm.rs.

Fix the flat-steps deserializer to split at UserInput boundaries into
multiple turns, instead of filtering them out and wrapping everything
into a single turn. This enables recorded multi-turn traces to be
replayed as proper multi-turn conversations via run_trace().

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
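Splitting at `UserInput` boundaries means each user input opens a new turn and the following steps attach to it. A hypothetical sketch with a two-variant `Step` standing in for the much richer `TraceStep` / `TraceResponse` types; the handling of steps that appear before any user input (an empty-input turn) is an assumption made for this sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    UserInput(String),
    Response(String),
}

#[derive(Debug, PartialEq)]
pub struct TraceTurn {
    pub user_input: String,
    pub steps: Vec<Step>,
}

/// Turn a flat step list into turns, starting a new turn at each UserInput.
/// The UserInput itself becomes the turn's `user_input`, not a step.
pub fn split_into_turns(flat: Vec<Step>) -> Vec<TraceTurn> {
    let mut turns: Vec<TraceTurn> = Vec::new();
    for step in flat {
        match step {
            Step::UserInput(text) => turns.push(TraceTurn { user_input: text, steps: Vec::new() }),
            other => {
                if turns.is_empty() {
                    turns.push(TraceTurn { user_input: String::new(), steps: Vec::new() });
                }
                turns.last_mut().unwrap().steps.push(other);
            }
        }
    }
    turns
}
```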

* fix(test): fix CI failures - unused imports and missing struct fields

- Add #[allow(unused_imports)] on pub use re-exports in trace_llm.rs
  (types are re-exported for downstream test files, not used locally)
- Add `..` to ToolCompleted pattern in test_channel.rs to match new
  `error` and `parameters` fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
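The `..` fix relies on a Rust pattern rule: a struct-variant pattern must name every field unless it ends in `..`, so adding fields such as `error` and `parameters` breaks exhaustive patterns but not `..` patterns. A hypothetical sketch; the enum name and the `name` / `success` fields are assumptions, while `error` and `parameters` are the fields named in the commit.

```rust
#[derive(Debug)]
pub enum StatusUpdate {
    ToolStarted { name: String },
    ToolCompleted {
        name: String,
        success: bool,
        error: Option<String>,
        parameters: Option<String>,
    },
}

/// `..` ignores the remaining fields, so this match keeps compiling when
/// new fields are added to ToolCompleted.
pub fn completed_tool_name(ev: &StatusUpdate) -> Option<&str> {
    match ev {
        StatusUpdate::ToolCompleted { name, success: true, .. } => Some(name.as_str()),
        _ => None,
    }
}
```

Struct literals get no such shortcut, which is why the constructors in `support_unit_tests.rs` needed the new fields spelled out.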

* fix(test): fix CI failures after merging main

- Add missing `error` and `parameters` fields to ToolCompleted
  constructors in support_unit_tests.rs
- Add `..` to ToolCompleted pattern match in support_unit_tests.rs
- Add #[allow(dead_code)] to CleanupGuard, LlmTrace impl, and
  TraceLlm impl (only used behind #[cfg(feature = "libsql")])

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Adding coverage running script

* fix(test): address review feedback on E2E test infrastructure

- Switch wait_for_responses polling to exponential backoff (50ms-500ms)
  and raise the default timeout from 15s to 30s to reduce CI flakiness (#1)
- Strengthen prompt_injection_resilience test with positive safety layer
  assertion via has_safety_warnings(), enable injection_check (#2)
- Add assert_tool_order() helper and tools_order field in TraceExpects
  for verifying tool execution ordering in multi-step traces (#3)
- Document TraceLlm sequential-call assumption for concurrency (#6)
- Clean up CleanupGuard with PathKind enum instead of shotgun
  remove_file + remove_dir_all on every path (#8)
- Fix coverage.sh: default to --lib only, fix multi-filter syntax,
  add COV_ALL_TARGETS option
- Add coverage/ to .gitignore
- Remove planning docs from PR

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review - use HashSet in retain_only, improve skill test

- Use HashSet for O(N+M) lookup in SkillRegistry::retain_only and
  ToolRegistry::retain_only instead of linear scan
- Strengthen test_retain_only_empty_is_noop in SkillRegistry to
  pre-populate with a skill before asserting the no-op behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
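Building a `HashSet` from the allowlist once makes each membership test O(1), so filtering N tools against M allowed names costs O(N + M) rather than O(N * M). A hypothetical sketch: the real `ToolRegistry` stores tool objects, not description strings, and `register` / `names` are invented helpers. The empty-allowlist early return mirrors the no-op behavior the skill test asserts.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, String>, // name -> description
}

impl ToolRegistry {
    pub fn register(&mut self, name: &str, description: &str) {
        self.tools.insert(name.to_string(), description.to_string());
    }

    /// Keep only tools named in `allowlist`; an empty allowlist changes nothing.
    pub fn retain_only(&mut self, allowlist: &[&str]) {
        if allowlist.is_empty() {
            return;
        }
        let allowed: HashSet<&str> = allowlist.iter().copied().collect();
        self.tools.retain(|name, _| allowed.contains(name.as_str()));
    }

    /// Sorted names, for deterministic inspection.
    pub fn names(&self) -> Vec<&str> {
        let mut v: Vec<&str> = self.tools.keys().map(String::as_str).collect();
        v.sort();
        v
    }
}
```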

* fix(test): revert incorrect safety layer assertion in injection test

The safety layer sanitizes tool output, not user input. The injection
test sends a malicious user message with no tools called, so the safety
layer never fires. Reverted to the original test which correctly
validates the LLM refuses via trace expects. Also fixed case-sensitive
request hint ("ignore" -> "Ignore") to suppress noisy warning.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: clean stale profdata before coverage run

Adds `cargo llvm-cov clean` before each run to prevent
"mismatched data" warnings from stale instrumentation profiles.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix formatting in retain_only test

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
ilblackdragon added a commit that referenced this pull request Mar 6, 2026
- Switch build script from python3 to jq for JSON parsing, consistent
  with release.yml and avoids python3 dependency (#1, #7)
- Use dirs::home_dir() instead of HOME env var for portability (#2)
- Filter extensions by manifest "kind" field instead of path (#3)
- Replace .flatten() with explicit error handling in dir iteration (#4, #5)
- Split stub_tool_host_functions into stub_shared_host_functions +
  tool-only tool-invoke stub, since tool-invoke is not in channel WIT (#6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
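`fs::read_dir` yields `io::Result<DirEntry>` items, and calling `.flatten()` on that iterator silently discards every `Err`. The sketch below propagates each entry error with `?` instead and also applies an `is_file()` guard like the one described for the catalog loader. It is a hypothetical helper (`manifest_files` is an invented name) that only collects `*.json` paths; reading the manifest `kind` field would need a JSON parser and is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect `*.json` files directly under `dir`, sorted.
/// A failing directory entry is returned as an error, not skipped.
pub fn manifest_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?; // explicit: surface the error instead of dropping it
        let path = entry.path();
        // Skip directories, even ones whose names end in ".json".
        if path.is_file() && path.extension().and_then(|e| e.to_str()) == Some("json") {
            out.push(path);
        }
    }
    out.sort();
    Ok(out)
}
```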
ilblackdragon added a commit that referenced this pull request Mar 6, 2026
* test: add WIT compatibility tests for all WASM tools and channels

Adds CI and integration tests to catch WIT interface breakage across
all 14 WASM extensions (10 tools + 4 channels). Previously, changing
wit/tool.wit or wit/channel.wit could silently break guest-side tools
that weren't rebuilt until release time.

Three new pieces:

1. scripts/build-wasm-extensions.sh — builds all WASM extensions from
   source by reading registry manifests. Used by CI and locally.

2. tests/wit_compat.rs — integration tests that compile and instantiate
   each .wasm binary against the current wasmtime host linker with
   stubbed host functions. Catches added/removed/renamed WIT functions,
   signature mismatches, and missing exports. Skips gracefully when
   artifacts aren't built so `cargo test` still passes standalone.

3. .github/workflows/test.yml — new wasm-wit-compat CI job that builds
   all extensions then runs instantiation tests on every PR. Added to
   the branch protection roll-up.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix rustfmt formatting in wit_compat tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback on WIT compat tests

- Switch build script from python3 to jq for JSON parsing, for consistency
  with release.yml and to avoid a python3 dependency (#1, #7)
- Use dirs::home_dir() instead of HOME env var for portability (#2)
- Filter extensions by manifest "kind" field instead of path (#3)
- Replace .flatten() with explicit error handling in dir iteration (#4, #5)
- Split stub_tool_host_functions into stub_shared_host_functions +
  tool-only tool-invoke stub, since tool-invoke is not in channel WIT (#6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ilblackdragon added a commit that referenced this pull request Mar 7, 2026
…lity

Security fixes:
- Remove SSRF-prone download() from DocumentExtractionMiddleware (#13)
- Sanitize filenames in workspace path to prevent directory traversal (#11)
- Pre-check file size before reading in WASM wrapper to prevent OOM (#2)
- Percent-encode file_id in Telegram source URLs (#7)

Correctness fixes:
- Clear image_content_parts on turn end to prevent memory leak (#1)
- Find first *successful* transcription instead of first overall (#3)
- Enforce data.len() size limit in document extraction (#10)
- Use UTF-8 safe truncation with char_indices() (#12)

Robustness & code quality:
- Add 120s timeout to OpenAI Whisper HTTP client (#5)
- Trim trailing slash from Whisper base_url (#6)
- Allow ~/.ironclaw/ paths in WASM wrapper (#8)
- Return error from on_broadcast in Slack/Discord/WhatsApp (#9)
- Fix doc comment in HTTP tool (#4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ilblackdragon added a commit that referenced this pull request Mar 7, 2026
* feat: add inbound attachment support to WASM channel system

Add attachment record to WIT interface and implement inbound media
parsing across all four channel implementations (Telegram, Slack,
WhatsApp, Discord). Attachments flow from WASM channels through
EmittedMessage to IncomingMessage with validation (size limits,
MIME allowlist, count caps) at the host boundary.

- Add `attachment` record to `emitted-message` in wit/channel.wit
- Add `IncomingAttachment` struct to channel.rs and re-export
- Add host-side validation (20MB total, 10 max, MIME allowlist)
- Telegram: parse photo, document, audio, video, voice, sticker
- Slack: parse file attachments with url_private
- WhatsApp: parse image, audio, video, document with captions
- Discord: backward-compatible empty attachments
- Update FEATURE_PARITY.md section 7
- Add fixture-based tests per channel and host integration tests
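A minimal sketch of the host-side checks listed above. It assumes a simplified `Attachment` type (hypothetical; the real `IncomingAttachment` carries more fields), reads 20MB as 20 * 1024 * 1024 bytes, and uses an illustrative MIME prefix list rather than the actual allowlist. Dropping offending attachments instead of rejecting the whole message is also an assumption.

```rust
/// Simplified stand-in for an inbound attachment (hypothetical shape).
pub struct Attachment {
    pub mime_type: String,
    pub size_bytes: u64,
}

/// Limits from the commit message; 20MB is read as 20 MiB here (assumption).
pub const MAX_ATTACHMENTS: usize = 10;
pub const MAX_TOTAL_BYTES: u64 = 20 * 1024 * 1024;

/// Illustrative allowlist prefixes, not the project's actual list.
const ALLOWED_MIME_PREFIXES: &[&str] = &[
    "image/",
    "audio/",
    "video/",
    "text/",
    "application/pdf",
    "application/octet-stream",
];

fn mime_allowed(mime: &str) -> bool {
    let mime = mime.to_ascii_lowercase();
    ALLOWED_MIME_PREFIXES.iter().any(|p| mime.starts_with(*p))
}

/// Keeps allowed attachments in arrival order until the count cap is reached,
/// skipping any whose size would push the running total over the byte budget.
pub fn validate_attachments(input: Vec<Attachment>) -> Vec<Attachment> {
    let mut kept = Vec::new();
    let mut total: u64 = 0;
    for att in input {
        if kept.len() >= MAX_ATTACHMENTS {
            break;
        }
        if !mime_allowed(&att.mime_type) {
            continue;
        }
        // saturating_add guards against a channel reporting an absurd size.
        let next = total.saturating_add(att.size_bytes);
        if next > MAX_TOTAL_BYTES {
            continue;
        }
        total = next;
        kept.push(att);
    }
    kept
}
```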

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: integrate outbound attachment support and reconcile WIT types (#409)

Reconcile PR #409's outbound attachment work with our inbound attachment
support into a unified design:

WIT type split:
- `inbound-attachment` in channel-host: metadata-only (id, mime_type,
  filename, size_bytes, source_url, storage_key, extracted_text)
- `attachment` in channel: raw bytes (filename, mime_type, data) on
  agent-response for outbound sending

Outbound features (from PR #409):
- `on-broadcast` WIT export for proactive messages without prior inbound
- Telegram: multipart sendPhoto/sendDocument with auto photo→document
  fallback for files >10MB
- wrapper.rs: `call_on_broadcast`, `read_attachments` from disk,
  attachment params threaded through `call_on_respond`
- HTTP tool: `save_to` param for binary downloads to /tmp/ (50MB limit,
  path traversal protection, SSRF-safe redirect following)
- Message tool: allow /tmp/ paths for attachments alongside base_dir
- Credential env var fallback in inject_channel_credentials

Channel updates:
- All 4 channels implement on_broadcast (Telegram full, others stub)
- Telegram: polling_enabled config, adjusted poll timeout
- Inbound attachment types renamed to InboundAttachment in all channels

Tests: 1965 passing (9 new), 0 clippy warnings

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add audio transcription pipeline and extensible WIT attachment design

Add host-side transcription middleware (OpenAI Whisper) that detects audio
attachments with inline data on incoming messages and transcribes them
automatically. Refactor WIT inbound-attachment to use extras-json and a
store-attachment-data host function instead of typed fields, so future
attachment properties (dimensions, codec, etc.) don't require WIT changes
that invalidate all channel plugins.

- Add src/transcription/ module: TranscriptionProvider trait,
  TranscriptionMiddleware, AudioFormat enum, OpenAI Whisper provider
- Add src/config/transcription.rs: TRANSCRIPTION_ENABLED/MODEL/BASE_URL
- Wire middleware into agent message loop via AgentDeps
- WIT: replace data + duration-secs with extras-json + store-attachment-data
- Host: parse extras-json for well-known keys, merge stored binary data
- Telegram: download voice files via store-attachment-data, add duration
  to extras-json, add /file/bot to HTTP allowlist, voice-only placeholder
- Add reqwest multipart feature for Whisper API uploads
- 5 regression tests for transcription middleware

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire attachment processing into LLM pipeline with multimodal image support

Attachments on incoming messages are now augmented into user text via XML tags
before entering the turn system, and images with data are passed as multimodal
content parts (base64 data URIs) to LLM providers. This enables audio transcripts,
document text, and image content to reach the LLM without changes to ChatMessage
serialization or provider interfaces.

- Add src/agent/attachments.rs with augment_with_attachments() and 9 unit tests
- Add ContentPart/ImageUrl types to llm::provider with OpenAI-compatible serde
- Carry image_content_parts transiently on Turn (skipped in serialization)
- Update nearai_chat and rig_adapter to serialize multimodal content
- Add 3 e2e tests verifying attachments flow through the full agent loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, version bumps, and Telegram voice test

- Fix cargo fmt formatting in attachments.rs, nearai_chat.rs, rig_adapter.rs,
  e2e_attachments.rs
- Bump channel registry versions 0.1.0 → 0.2.0 (discord, slack, telegram,
  whatsapp) to satisfy version-bump CI check
- Fix Telegram test_extract_attachments_voice: add missing required `duration`
  field to voice fixture JSON

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: bump WIT channel version to 0.3.0, fix Telegram voice test, add pre-commit hook

- Bump wit/channel.wit package version 0.2.0 → 0.3.0 (interface changed with
  store-attachment-data)
- Update WIT_CHANNEL_VERSION constant and registry wit_version fields to match
- Fix Telegram test_extract_attachments_voice: gate voice download behind
  #[cfg(target_arch = "wasm32")] so host functions aren't called in native tests,
  update assertions for generated filename and extras_json duration
- Add @0.3.0 linker stubs in wit_compat.rs
- Add .githooks/pre-commit hook that runs scripts/check-version-bumps.sh when
  WIT or extension sources are staged
- Symlink commit-msg regression hook into .githooks/

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract voice download from extract_attachments into handle_message

Move download_voice_file + store_attachment_data calls out of
extract_attachments into a separate download_and_store_voice function
called from handle_message. This keeps extract_attachments as a pure
data-mapping function with no host calls, making it fully testable
in native unit tests without #[cfg(target_arch)] gates.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Add path validation to read_attachments (restrict to /tmp/) preventing
  arbitrary file reads from compromised tools
- Escape XML special characters in attachment filenames, MIME types, and
  extracted text to prevent prompt injection via tag spoofing
- Percent-encode file_id in Telegram getFile URL to prevent query injection
- Clone SecretString directly instead of expose_secret().to_string()
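The XML-escaping fix can be sketched as follows. `escape_xml` replaces the five XML special characters; `attachment_tag` is a hypothetical tag layout for illustration and may not match what `augment_with_attachments()` actually emits.

```rust
/// Escapes the five XML special characters so attacker-controlled strings
/// (filenames, MIME types, extracted text) cannot close or spoof tags.
pub fn escape_xml(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical tag layout; every interpolated value is escaped.
pub fn attachment_tag(filename: &str, mime: &str, text: &str) -> String {
    format!(
        "<attachment filename=\"{}\" type=\"{}\">{}</attachment>",
        escape_xml(filename),
        escape_xml(mime),
        escape_xml(text)
    )
}
```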

Correctness fixes:
- Fix store_attachment_data overwrite accounting: subtract old entry size
  before adding new to prevent inflated totals and false rejections
- Use max(reported, stored_size) for attachment size accounting to prevent
  WASM channels from under-reporting size_bytes to bypass limits
- Add application/octet-stream to MIME allowlist (channels default unknown
  types to this)
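A sketch of the overwrite accounting and the `max(reported, stored_size)` rule, assuming a hypothetical in-memory `AttachmentStore` keyed by attachment id; the real host state and limits may differ.

```rust
use std::collections::HashMap;

/// Hypothetical per-message attachment store with a byte budget.
pub struct AttachmentStore {
    entries: HashMap<String, Vec<u8>>,
    total_bytes: usize,
    limit_bytes: usize,
}

impl AttachmentStore {
    pub fn new(limit_bytes: usize) -> Self {
        Self { entries: HashMap::new(), total_bytes: 0, limit_bytes }
    }

    /// Stores `data` under `id`, replacing any previous entry. The old entry's
    /// size is subtracted first so an overwrite is not double-counted.
    pub fn store(&mut self, id: &str, data: Vec<u8>) -> Result<(), String> {
        let old = self.entries.get(id).map_or(0, |d| d.len());
        // Invariant: total_bytes >= old, so this cannot underflow.
        let new_total = self.total_bytes - old + data.len();
        if new_total > self.limit_bytes {
            return Err(format!("attachment budget exceeded: {} > {}", new_total, self.limit_bytes));
        }
        self.total_bytes = new_total;
        self.entries.insert(id.to_string(), data);
        Ok(())
    }

    pub fn total_bytes(&self) -> usize {
        self.total_bytes
    }
}

/// Size charged against limits: never less than what was actually stored,
/// so a channel cannot under-report `size_bytes` to slip past the cap.
pub fn effective_size(reported: Option<u64>, stored: usize) -> u64 {
    reported.unwrap_or(0).max(stored as u64)
}
```

Without the subtraction, overwriting an 8-byte entry with 9 bytes would be counted as 17 bytes in the running total.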

Code quality:
- Extract send_response helper in Telegram, deduplicating on_respond and
  on_broadcast
- Rename misleading Discord test to test_parse_slash_command_interaction
- Fix .githooks/commit-msg to use relative symlink (portable across machines)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tool_upgrade command + fix TOCTOU in save_to path validation

Add `tool_upgrade` — a new extension management tool that automatically
detects and reinstalls WASM extensions with outdated WIT versions.
Preserves authentication secrets during upgrade. Supports upgrading a
single extension by name or all installed WASM tools/channels at once.

Fix TOCTOU in `validate_save_to_path`: validate the path *before*
creating parent directories, so traversal paths like `/tmp/../../etc/`
cannot cause filesystem mutations outside /tmp before being rejected.
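One way to validate before touching the filesystem is to normalize the path lexically, check the prefix, and only then create parent directories. This sketch assumes `save_to` must be an absolute path strictly under `/tmp`; `validate_save_to` is a hypothetical name, not necessarily the project's `validate_save_to_path`.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalizes `path` (resolving `.` and `..` without any filesystem
/// access) and returns it only if it stays under `/tmp`. Nothing is created
/// here, so a traversal path is rejected before any mutation happens.
pub fn validate_save_to(path: &str) -> Result<PathBuf, String> {
    let p = Path::new(path);
    if !p.is_absolute() {
        return Err(format!("save_to must be absolute: {}", path));
    }
    let mut normalized = PathBuf::new();
    for comp in p.components() {
        match comp {
            Component::RootDir | Component::Prefix(_) => normalized.push(comp.as_os_str()),
            Component::CurDir => {}
            // pop() on "/" is a no-op, so ".." cannot climb above the root.
            Component::ParentDir => {
                normalized.pop();
            }
            Component::Normal(seg) => normalized.push(seg),
        }
    }
    // Path::starts_with compares whole components, so "/tmpfoo" does not match.
    if normalized.starts_with("/tmp") && normalized != Path::new("/tmp") {
        Ok(normalized)
    } else {
        Err(format!("save_to must stay under /tmp: {}", path))
    }
}
```

Only after this returns `Ok` would the caller run `std::fs::create_dir_all` on the parent. Lexical normalization does not resolve symlinks, so a link inside `/tmp` pointing elsewhere would still need a separate check.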

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify WIT package version to 0.3.0 across tool.wit and all capabilities

tool.wit and channel.wit share the `near:agent` package namespace, so they
must declare the same version. Bumps tool.wit from 0.2.0 to 0.3.0 and
updates all capabilities files and registry entries to match.

Fixes `cargo component build` failure: "package identifier near:agent@0.2.0
does not match previous package name of near:agent@0.3.0"

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move WIT file comments after package declaration

WIT treats `//` comments before `package` as doc comments. When both
tool.wit and channel.wit had header comments, the parser rejected them
as "doc comments on multiple 'package' items". Move comments after the
package declaration in both files.

Also bumps tool registry versions to 0.2.0 to match the WIT 0.3.0 bump.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: display extension versions in gateway Extensions tab

Add version field to InstalledExtension and RegistryEntry types, pipe
through the web API (ExtensionInfo, RegistryEntryInfo), and render as
a badge in the gateway UI for both installed and available extensions.

For installed WASM extensions, version is read from the capabilities
file with a fallback to the registry entry when the local file has no
version (old installations). Bump all extension Cargo.toml and registry
JSON versions from 0.1.0 to 0.2.0 to keep them in sync.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add document text extraction middleware for PDF, Office, and text files

Extract text from document attachments (PDF, DOCX, PPTX, XLSX, RTF, plain text,
code files) so the LLM can reason about uploaded documents. Uses pdf-extract for
PDFs, zip+XML parsing for Office XML formats, and UTF-8 decode for text files.
Wired into the agent loop after transcription middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: download document files in Telegram channel for text extraction

The DocumentExtractionMiddleware needs file bytes in the attachment `data`
field, but only voice files were being downloaded. Document attachments
(PDFs, DOCX, etc.) had empty `data` and a source_url with a credential
placeholder that only works inside the WASM host's http_request.

Add `download_and_store_documents()` that downloads non-voice, non-image,
non-audio attachments via the existing two-step getFile→download flow and
stores bytes via `store_attachment_data` for host-side extraction.

Also rename `download_voice_file` → `download_telegram_file` since it's
generic for any file_id.
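The two-step flow uses the Telegram Bot API's `getFile` method, which returns a `file_path`, followed by a download from `https://api.telegram.org/file/bot<token>/<file_path>`. The sketch below builds both URLs and percent-encodes `file_id`; the `{TELEGRAM_BOT_TOKEN}` placeholder is a hypothetical stand-in for whatever credential placeholder the WASM host substitutes.

```rust
/// Percent-encodes every byte except RFC 3986 unreserved characters.
pub fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Hypothetical placeholder that the host would replace with the bot token.
const TOKEN_PLACEHOLDER: &str = "{TELEGRAM_BOT_TOKEN}";

/// Step 1: resolve a file_id to a file_path via getFile.
pub fn get_file_url(file_id: &str) -> String {
    format!(
        "https://api.telegram.org/bot{}/getFile?file_id={}",
        TOKEN_PLACEHOLDER,
        percent_encode(file_id)
    )
}

/// Step 2: download the bytes from the /file/bot endpoint.
/// file_path comes from Telegram's own response and may contain '/'.
pub fn download_url(file_path: &str) -> String {
    format!("https://api.telegram.org/file/bot{}/{}", TOKEN_PLACEHOLDER, file_path)
}
```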

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: allow Office MIME types and increase file download limit for Telegram

Two issues preventing document extraction from Telegram:

1. PPTX/DOCX/XLSX MIME types (application/vnd.*) were dropped by the
   WASM host attachment allowlist — add application/vnd., application/msword,
   and application/rtf prefixes.

2. Telegram file downloads over 10 MB failed with "Response body too large" —
   set max_response_bytes to 20 MB in Telegram capabilities.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: report document extraction errors back to user instead of silently skipping

- Bump max_response_bytes to 50 MB for Telegram file downloads
- When document extraction fails (too large, download error, parse error),
  set extracted_text to a user-friendly error message instead of leaving it
  None. This ensures the LLM tells the user what went wrong.
- On Telegram download failure, set extracted_text with the error so the
  user sees feedback even when the file never reaches the extraction middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: store extracted document text in workspace memory for search/recall

After document extraction succeeds, write the extracted text to workspace
memory at `documents/{date}/{filename}`. This enables:
- Full-text and semantic search over past uploaded documents
- Cross-conversation recall ("what did that PDF say?")
- Automatic chunking and embedding via the workspace pipeline

Documents are stored with metadata header (uploader, channel, date, MIME type).
Error messages (extraction failures) are not stored — only successful extractions.
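A sketch of building the `documents/{date}/{filename}` path with a sanitized filename. It assumes the rule is to keep only the last path segment, replace control characters, and fall back to a hypothetical `unnamed` for empty or dot-only names; the project's actual sanitizer may apply different rules.

```rust
/// Reduces an uploaded filename to one safe path segment.
pub fn sanitize_filename(name: &str) -> String {
    // Keep only the final segment, splitting on both Unix and Windows separators.
    let last = name.rsplit(|c| c == '/' || c == '\\').next().unwrap_or("");
    let cleaned: String = last
        .chars()
        .map(|c| if c.is_control() { '_' } else { c })
        .collect();
    let trimmed = cleaned.trim();
    // "", ".", ".." and similar would escape or alias the directory.
    if trimmed.is_empty() || trimmed.chars().all(|c| c == '.') {
        "unnamed".to_string()
    } else {
        trimmed.to_string()
    }
}

pub fn document_path(date: &str, filename: &str) -> String {
    format!("documents/{}/{}", date, sanitize_filename(filename))
}
```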

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, unused assignment warning

- Run cargo fmt on document_extraction and agent_loop modules
- Suppress unused_assignments warning on trace_llm_ref (used only
  behind #[cfg(feature = "libsql")])

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Remove SSRF-prone download() from DocumentExtractionMiddleware (#13)
- Sanitize filenames in workspace path to prevent directory traversal (#11)
- Pre-check file size before reading in WASM wrapper to prevent OOM (#2)
- Percent-encode file_id in Telegram source URLs (#7)

Correctness fixes:
- Clear image_content_parts on turn end to prevent memory leak (#1)
- Find first *successful* transcription instead of first overall (#3)
- Enforce data.len() size limit in document extraction (#10)
- Use UTF-8 safe truncation with char_indices() (#12)
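UTF-8 safe truncation with `char_indices()` can be sketched as follows, assuming the limit is in bytes. Slicing a `&str` at an index that is not a char boundary panics, which is what cutting at the last boundary that fits avoids.

```rust
/// Truncates `s` to at most `max_bytes` bytes without splitting a character.
pub fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // End offset of each char; keep the largest one that still fits.
    let end = s
        .char_indices()
        .map(|(i, c)| i + c.len_utf8())
        .take_while(|&end| end <= max_bytes)
        .last()
        .unwrap_or(0);
    &s[..end]
}
```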

Robustness & code quality:
- Add 120s timeout to OpenAI Whisper HTTP client (#5)
- Trim trailing slash from Whisper base_url (#6)
- Allow ~/.ironclaw/ paths in WASM wrapper (#8)
- Return error from on_broadcast in Slack/Discord/WhatsApp (#9)
- Fix doc comment in HTTP tool (#4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting — cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address latest PR review — doc comments, error messages, version bumps

- Fix DocumentExtractionMiddleware doc comment (no longer downloads from source_url)
- Fix error message: "no inline data" instead of "no download URL"
- Log error + fallback instead of silent unwrap_or_default on Whisper HTTP client
- Bump all capabilities.json versions from 0.1.0 to 0.2.0 to match Cargo.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unsupported profile: minimal from CI workflows [skip-regression-check]

dtolnay/rust-toolchain@stable does not accept the 'profile' input
(it was a parameter for the deprecated actions-rs/toolchain action).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge with latest main — resolve compilation errors and PR review nits

- Add version: None to RegistryEntry/InstalledExtension test constructors
- Fix MessageContent type mismatches in nearai_chat tests (String → MessageContent::Text)
- Fix .contains() calls on MessageContent — use .as_text().unwrap()
- Remove redundant trace_llm_ref = None assignment in test_rig
- Check data size before clone in document extraction to avoid unnecessary allocation

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
zmanian added a commit that referenced this pull request Mar 9, 2026
- Slack: add post-download size check on actual bytes when metadata
  size_bytes is absent, preventing bypass of the 20MB limit
- Telegram: add 20MB download size limit (matching Slack) enforced
  in download_telegram_file() after receiving response bytes
- Dispatcher: skip broadcasting the ImageGenerated SSE event when
  data_url is empty (the unwrap_or_default() fallback); log a warning instead

Closes correctness issues #3, #4, #5 from PR #725 review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ilblackdragon added a commit that referenced this pull request Mar 9, 2026
* feat: full image support across all channels

End-to-end image handling: upload, generation, analysis, editing, and
rendering across web gateway, HTTP webhook, WASM (Telegram/Slack), and
REPL channels. Builds on the attachment infrastructure from #596 and
draws inspiration from PR #641's image pipeline approach — credit to
that PR's author for the sentinel JSON pattern and base64-in-JSON
upload design.

Key changes:
- Image upload in web UI (file picker, paste, preview strip)
- Image generation tool (FLUX/DALL-E via /v1/images/generations)
- Image edit tool (multipart /v1/images/edits with fallback)
- Image analysis tool (vision model for workspace images)
- Model detection utilities (image_models.rs, vision_models.rs)
- Sentinel JSON detection in dispatcher for generated image rendering
- StatusUpdate::ImageGenerated → SSE/WS/REPL/WASM broadcast
- HTTP webhook attachment support (base64, 5MB/file, 10MB total)
- WASM channel image download (Telegram via file API, Slack via host HTTP)
- Tool registration wiring in app.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR #725 review comments (16 issues)

- SecretString for API keys in all image tools (image_gen, image_edit, image_analyze)
- Binary image read via tokio::fs::read instead of DB-backed workspace.read()
- Replace Arc<Workspace> with Option<PathBuf> base_dir (workspace has no filesystem API)
- ApprovalRequirement::UnlessAutoApproved for cost-sensitive image tools
- Scope sentinel detection to image_generate/image_edit tool names only
- Skip ToolResult preview broadcast for image sentinels (avoids multi-MB base64 in SSE)
- Extract shared media_type_from_path() to builtin/mod.rs
- Rename fallback_chat_edit → fallback_generate with tracing::warn
- Increase gateway body limit from 1MB to 10MB for image uploads
- Increase webhook body limit to 15MB (base64 overhead)
- Log warning on invalid base64 in images_to_attachments
- Client-side image size limits (5MB/file, 5 images max) in app.js
- aria-label on attach button for accessibility
- Update body_too_large test for new 10MB limit

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add Slack file size check before download (PR review item #15)

Skip downloading files larger than 20 MB in the Slack WASM channel to
avoid excessive memory use and slow downloads in the WASM runtime.
Logs a warning when a file is skipped. Also bumps channel versions
for Slack and Telegram (prior branch changes).

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): add path validation and approval requirement to image tools

Add sandbox path validation via validate_path() to both ImageAnalyzeTool
and ImageEditTool to prevent path traversal attacks that could exfiltrate
arbitrary files through external vision/edit APIs. Also fix
ImageAnalyzeTool::requires_approval to return UnlessAutoApproved,
consistent with ImageEditTool and ImageGenerateTool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: post-download size guards and empty data_url sentinel check

- Slack: add post-download size check on actual bytes when metadata
  size_bytes is absent, preventing bypass of the 20MB limit
- Telegram: add 20MB download size limit (matching Slack) enforced
  in download_telegram_file() after receiving response bytes
- Dispatcher: skip broadcasting the ImageGenerated SSE event when
  data_url is empty (the unwrap_or_default() fallback); log a warning instead

Closes correctness issues #3, #4, #5 from PR #725 review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use mime_guess for media type detection, add alt attrs and media_type validation

- Replace hardcoded media type mapping with mime_guess crate (already in deps)
- Add alt attributes to img elements in web UI for accessibility
- Validate media_type starts with "image/" in images_to_attachments()
- Update bmp test assertion to match mime_guess behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zaki <zaki@iqlusion.io>
jackeyunjie pushed a commit to jackeyunjie/ironclaw that referenced this pull request Mar 9, 2026
- Add applications page with progress tracking
- Add enterprise page with user profile
- Add WeChat login integration
- Add application filtering by status
- Add statistics display
- Add menu for settings, help, and contact

Tasks: nearai#3
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
Co-authored-by: Firat Sertgoz <firatsertgoz@Firats-Mac-mini.local>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…tion (nearai#238)

* feat: add extension registry with metadata catalog, CLI, and onboarding integration

Adds a central registry that catalogs all 14 available extensions (10 tools,
4 channels) with their capabilities, auth requirements, and artifact references.
The onboarding wizard now shows installable channels from the registry and
offers tool installation as a new Step 7.

- registry/ folder with per-extension JSON manifests and bundle definitions
- src/registry/ module: manifest structs, catalog loader, installer
- `ironclaw registry list|info|install|install-defaults` CLI commands
- Setup wizard enhanced: channels from registry, new extensions step (8 steps)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): resolve workspace errors for tool crates and channels-only onboarding

Tool crates in tools-src/ and channels-src/ failed `cargo metadata` during
onboard install because Cargo resolved them as part of the root workspace.
Add `[workspace]` table to each standalone crate and extend the root
`workspace.exclude` list so they build independently.

Channels-only mode (`onboard --channels-only`) failed with "Secrets not
configured" and "No database connection" because it skipped database and
security setup. Add `reconnect_existing_db()` to establish the DB connection
and load saved settings before running channel configuration.

Also improve the tunnel "already configured" display to show full provider
details (domain, mode, command) instead of just the provider name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): address PR review feedback on installer and catalog

- Use manifest.name (not crate_name) for installed filenames so
  discovery, auth, and CLI commands all agree on the stem (nearai#1)
- Add AlreadyInstalled error variant instead of misleading
  ExtensionNotFound (nearai#2)
- Add DownloadFailed error variant with URL context instead of
  stuffing URLs into PathBuf (nearai#3)
- Validate HTTP status with error_for_status() before reading
  response bytes in artifact downloads (nearai#4)
- Switch build_wasm_component to tokio::process::Command with
  status() so build output streams to the terminal (nearai#6)
- Find WASM artifact by crate_name specifically instead of picking
  the first .wasm file in the release directory (nearai#7)
- Add is_file() guard in catalog loader to skip directories (nearai#8)
- Detect ambiguous bare-name lookups when both tools/<name> and
  channels/<name> exist, with get_strict() returning an error (nearai#9)
- Fix wizard step_extensions to check tool.name for installed
  detection, consistent with the new naming (nearai#11, nearai#12)
- Fix redundant closures and map_or clippy warnings in changed files
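The ambiguity check for bare names might look like this, assuming a hypothetical catalog keyed by `tools/<name>` and `channels/<name>`. Only the `get_strict()` name comes from the commit; the rest is illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical catalog keyed by "<kind>/<name>", e.g. "channels/telegram".
pub struct Catalog {
    pub entries: HashMap<String, String>,
}

impl Catalog {
    /// Qualified keys resolve directly. A bare name checks both kinds and
    /// returns an error when both exist, instead of silently picking one.
    pub fn get_strict(&self, key: &str) -> Result<&String, String> {
        if key.contains('/') {
            return self.entries.get(key).ok_or_else(|| format!("not found: {}", key));
        }
        let tool = self.entries.get(&format!("tools/{}", key));
        let channel = self.entries.get(&format!("channels/{}", key));
        match (tool, channel) {
            (Some(_), Some(_)) => Err(format!(
                "ambiguous: use tools/{} or channels/{}",
                key, key
            )),
            (Some(e), None) | (None, Some(e)) => Ok(e),
            (None, None) => Err(format!("not found: {}", key)),
        }
    }
}
```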

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): restore DB connection fields after settings reload

reconnect_postgres() and reconnect_libsql() called Settings::from_db_map()
which overwrote database_url / libsql_path / libsql_url set from env vars.
Also use get_strict() in cmd_info to surface ambiguous bare-name errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix clippy collapsible_if and print_literal warnings

Collapse nested if-let chains and inline string literals in format
macros to satisfy CI clippy lint checks (deny warnings).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(registry): prefer artifacts for install-defaults and improve dir lookup

- InstallDefaults now defaults to downloading pre-built artifacts
  (matching `registry install` behavior), with --build flag for source builds.
- find_registry_dir() walks up 3 ancestor levels from the exe and adds
  a CARGO_MANIFEST_DIR fallback, matching load_registry_catalog() logic.
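A sketch of the directory lookup, assuming "3 ancestor levels" means the executable's own directory plus three parents, and that the fallback is a crate root captured at compile time. The function signature here is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Looks for `registry/` beside the executable or up to three directories
/// above it, then falls back to `<manifest_dir>/registry` if provided.
pub fn find_registry_dir(exe: &Path, manifest_dir: Option<&Path>) -> Option<PathBuf> {
    let start = exe.parent()?;
    // ancestors() yields `start` itself first, so take(4) = start + 3 parents.
    for dir in start.ancestors().take(4) {
        let candidate = dir.join("registry");
        if candidate.is_dir() {
            return Some(candidate);
        }
    }
    manifest_dir.map(|d| d.join("registry")).filter(|p| p.is_dir())
}
```

A caller would pass the result of `std::env::current_exe()` and `option_env!("CARGO_MANIFEST_DIR")` converted to a `Path`.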

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
* refactor: extract shared assertion helpers to support/assertions.rs

Move 5 assertion helpers from e2e_spot_checks.rs to a shared module.
Add assert_all_tools_succeeded and assert_tool_succeeded for eliminating
false positives in E2E tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tool output capture via tool_results() accessor

Extract (name, preview) from ToolResult status events in TestChannel
and TestRig, enabling content assertions on tool outputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct tool parameters in 3 broken trace fixtures

- tool_time.json: add missing "operation": "now" for time tool
- robust_correct_tool.json: same fix
- memory_full_cycle.json: change "path" to "target" for memory_write

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add tool success and output assertions to eliminate false positives

Every E2E test that exercises tools now calls assert_all_tools_succeeded.
Added tool output content assertions where tool results are predictable
(time year, read_file content, memory_read content).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: capture per-tool timing from ToolStarted/ToolCompleted events

Record Instant on ToolStarted and compute elapsed duration on
ToolCompleted, wiring real timing data into collect_metrics() instead
of hardcoded zeros.
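A sketch of the timing capture, with a hypothetical `ToolEvent` enum standing in for the real status updates. It keys start times by tool name, so two concurrent calls to the same tool would collide; a real implementation might key by call id instead.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical event shape; the real ToolStarted/ToolCompleted carry more data.
pub enum ToolEvent {
    Started(String),
    Completed(String),
}

#[derive(Default)]
pub struct ToolTimer {
    started: HashMap<String, Instant>,
    pub durations: Vec<(String, Duration)>,
}

impl ToolTimer {
    pub fn on_event(&mut self, ev: ToolEvent) {
        match ev {
            // Record when the tool began.
            ToolEvent::Started(name) => {
                self.started.insert(name, Instant::now());
            }
            // Completion without a matching start is ignored.
            ToolEvent::Completed(name) => {
                if let Some(t0) = self.started.remove(&name) {
                    self.durations.push((name, t0.elapsed()));
                }
            }
        }
    }
}
```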

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: add RAII CleanupGuard for temp file/dir cleanup in tests

Replace manual cleanup_test_dir() calls and inline remove_file() with
Drop-based CleanupGuard that ensures cleanup even if a test panics.
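A minimal `CleanupGuard` sketch (method names are hypothetical). Cleanup lives in `Drop`, which also runs while a panic unwinds, assuming the default `panic = "unwind"` profile setting.

```rust
use std::path::PathBuf;

/// Removes every tracked path when the guard goes out of scope.
pub struct CleanupGuard {
    paths: Vec<PathBuf>,
}

impl CleanupGuard {
    pub fn new() -> Self {
        Self { paths: Vec::new() }
    }

    pub fn track(&mut self, p: impl Into<PathBuf>) -> &mut Self {
        self.paths.push(p.into());
        self
    }
}

impl Drop for CleanupGuard {
    fn drop(&mut self) {
        for p in &self.paths {
            // Errors are ignored: the path may already be gone.
            let _ = if p.is_dir() {
                std::fs::remove_dir_all(p)
            } else {
                std::fs::remove_file(p)
            };
        }
    }
}
```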

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add Drop impl and graceful shutdown for TestRig

Wrap agent_handle in Option so Drop can abort leaked tasks. Signal
channel shutdown before aborting, in preparation for future cooperative
shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace agent startup sleep with oneshot ready signal

Use a oneshot channel fired in Channel::start() instead of a fixed
100ms sleep, eliminating the race condition on slow systems.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace fragile string-matching iteration limit with count-based detection

Use tool completion count vs max_tool_iterations instead of scanning
status messages for "iteration"/"limit" substrings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use assert_all_tools_succeeded for memory_full_cycle test

Remove incorrect comment about memory_tree failing with empty path
(it actually succeeds). Omit empty path from fixture and use the
standard assert_all_tools_succeeded instead of per-tool assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: promote benchmark metrics types to library code

Move TraceMetrics, ScenarioResult, RunResult, MetricDelta, and
compare_runs() from tests/support/metrics.rs to src/benchmark/metrics.rs.
Existing tests use re-export for backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add Scenario and Criterion types for agent benchmarking

Scenario defines a task with input, success criteria, and resource
limits. Criterion is an enum of programmatic checks (tool_used,
response_contains, etc.) evaluated without LLM judgment.
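A sketch of programmatic criteria, using a hypothetical subset of variants (including `ResponseNotContains`, added in a later commit) and a hypothetical `Outcome` type for what a scenario produced. The real `Criterion` enum has more variants and may evaluate against different data.

```rust
/// Hypothetical subset of criterion variants.
pub enum Criterion {
    ToolUsed(String),
    ResponseContains(String),
    ResponseNotContains(String),
}

/// What a finished scenario produced (hypothetical shape).
pub struct Outcome<'a> {
    pub tools_called: &'a [&'a str],
    pub response: &'a str,
}

impl Criterion {
    /// Deterministic check; no LLM judgment involved.
    pub fn check(&self, o: &Outcome) -> bool {
        match self {
            Criterion::ToolUsed(t) => o.tools_called.iter().any(|c| *c == t.as_str()),
            Criterion::ResponseContains(s) => o.response.contains(s.as_str()),
            Criterion::ResponseNotContains(s) => !o.response.contains(s.as_str()),
        }
    }
}
```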

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add initial benchmark scenario suite (12 scenarios across 5 categories)

Scenarios cover tool_selection, tool_chaining, error_recovery,
efficiency, and memory_operations. All loaded from JSON with
deserialization validation test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add benchmark runner with BenchChannel and InstrumentedLlm

BenchChannel is a minimal Channel implementation for benchmarks.
InstrumentedLlm wraps any LlmProvider to capture per-call metrics.
Runner creates a fresh agent per scenario, evaluates success criteria,
and produces RunResult with timing, token, and cost metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add baseline management, reports, and benchmark entry point

- baseline.rs: load/save/promote benchmark results
- report.rs: format comparison reports with regression detection
- benchmark_runner.rs: integration test with real LLM (feature-gated)
- Add benchmark feature flag to Cargo.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply cargo fmt to benchmark module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add multi-turn scenario types with setup, judge, ResponseNotContains

Add BenchScenario, Turn, TurnAssertions, JudgeConfig, ScenarioSetup,
WorkspaceSetup, SeedDocument types for multi-turn benchmark scenarios.
Add ResponseNotContains criterion variant. Add TurnAssertions::to_criteria()
converter for backward compat with existing evaluation engine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add JSON scenario loader with recursive discovery and tag filter

Add load_bench_scenarios() for the new BenchScenario format with recursive
directory traversal and tag-based filtering. Create 4 initial trajectory
scenarios across tool-selection, multi-turn, and efficiency categories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): multi-turn runner with workspace seeding and per-turn metrics

Add run_bench_scenario() that loops over BenchScenario turns, seeds workspace
documents, collects per-turn metrics (tokens, tool calls, wall time), and
evaluates per-turn assertions. Add TurnMetrics to metrics.rs and
clear_for_next_turn() to BenchChannel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add LLM-as-judge scoring with prompt formatting and score parsing

Create judge.rs with format_judge_prompt, parse_judge_score, and judge_turn.
Wire into run_bench_scenario for turns with judge config -- scores below
min_score fail the turn.
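
A sketch of score parsing and the `min_score` gate. It assumes, hypothetically, that the judge prompt asks the model to end with a line such as `Score: 7` or `Score: 7/10`. The real prompt format may differ. A missing score is treated as a failure here, which is also an assumption.

```rust
/// Extracts the score from the last "Score: N" or "Score: N/10" line (case-insensitive).
pub fn parse_judge_score(text: &str) -> Option<f64> {
    text.lines().rev().find_map(|line| {
        let lower = line.trim().to_ascii_lowercase();
        let rest = lower.strip_prefix("score:")?;
        // Keep only the numerator when a "/10" denominator is present.
        let numerator = rest.split('/').next()?.trim();
        numerator.parse::<f64>().ok()
    })
}

/// A judged turn passes only when a score was found and it reaches min_score.
pub fn judge_passes(text: &str, min_score: f64) -> bool {
    matches!(parse_judge_score(text), Some(s) if s >= min_score)
}
```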

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add CLI subcommand (ironclaw benchmark)

Add BenchmarkCommand with --tags, --scenario, --no-judge, --timeout,
--update-baseline flags. Wire into Command enum and main.rs dispatch.
Feature-gated behind benchmark flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): per-scenario JSON output with full trajectory

Add save_scenario_results() that writes per-scenario JSON files alongside
the run summary. Each scenario gets its own file with turn_metrics trajectory.
Update CLI to use new output format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add ToolRegistry::retain_only and wire tool filtering in scenarios

Add a retain_only() method to ToolRegistry that filters tools down to a
given allowlist. Wire this into run_bench_scenario() so that when a
scenario specifies a tools list in its setup, only those tools are
available during the benchmark run. Includes two tests for the new
method: one verifying filtering works and one verifying empty input
is a no-op.
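
A minimal sketch of `retain_only` semantics on a registry keyed by tool name. The map value is a placeholder `String` rather than the real tool type. The allowlist goes through a `HashSet` so the pass is O(N + M), matching the later review fix that switched to HashSet. An empty allowlist leaves the registry unchanged.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in registry: tool name -> description (placeholder for the real tool type).
pub struct ToolRegistry {
    tools: HashMap<String, String>,
}

impl ToolRegistry {
    pub fn new(names: &[&str]) -> Self {
        let tools = names.iter().map(|n| (n.to_string(), format!("{n} tool"))).collect();
        Self { tools }
    }

    /// Keeps only tools named in `allowlist`; an empty allowlist is a no-op.
    pub fn retain_only(&mut self, allowlist: &[&str]) {
        if allowlist.is_empty() {
            return;
        }
        let keep: HashSet<&str> = allowlist.iter().copied().collect();
        self.tools.retain(|name, _| keep.contains(name.as_str()));
    }

    /// Sorted tool names, for deterministic inspection.
    pub fn names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.tools.keys().cloned().collect();
        names.sort();
        names
    }
}
```

Allowlist entries that name unknown tools (for example `"missing"`) are simply ignored in this sketch.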

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): wire identity overrides into workspace before agent start

Add seed_identity() helper that writes identity files (IDENTITY.md,
USER.md, etc.) into the workspace before the agent starts, so that
workspace.system_prompt() picks them up. Wire it into
run_bench_scenario() after workspace seeding. Include a test that
verifies identity files are written and readable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add --parallel and --max-cost CLI flags

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(benchmark): use feature-conditional snapshot names for CLI help tests

Prevents snapshot conflicts between default (no benchmark) and
all-features (with benchmark) builds by using separate snapshot names
per feature set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): parallel execution with JoinSet and budget cap enforcement

Replace sequential loop in run_all_bench() with parallel execution using
JoinSet + semaphore when config.parallel > 1. Add budget cap enforcement
that skips remaining scenarios when max_total_cost_usd is exceeded.
Track skipped count in RunResult.skipped_scenarios and display it in
format_report().
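
The budget-cap rule can be sketched sequentially as follows. The JoinSet and semaphore parallelism is deliberately omitted (it needs tokio). `run` is a hypothetical stand-in that executes one scenario and returns its cost in USD. Once spending has exceeded the cap, every remaining scenario is counted as skipped.

```rust
#[derive(Debug)]
pub struct ScenarioResult {
    pub name: String,
    pub cost_usd: f64,
}

pub struct RunSummary {
    pub results: Vec<ScenarioResult>,
    pub skipped_scenarios: usize,
}

/// Runs scenarios in order, skipping the rest once total spend exceeds the cap.
pub fn run_with_budget<F>(scenarios: &[&str], max_total_cost_usd: Option<f64>, mut run: F) -> RunSummary
where
    F: FnMut(&str) -> f64,
{
    let mut results = Vec::new();
    let mut spent = 0.0;
    let mut skipped = 0;
    for &name in scenarios {
        if let Some(cap) = max_total_cost_usd {
            if spent > cap {
                skipped += 1;
                continue;
            }
        }
        let cost = run(name);
        spent += cost;
        results.push(ScenarioResult { name: name.to_string(), cost_usd: cost });
    }
    RunSummary { results, skipped_scenarios: skipped }
}
```

The check runs before each scenario starts, so the scenario that crosses the cap still completes. With parallel execution, scenarios already in flight could push spending further past the cap.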

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add tool restriction and identity override test scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: fix formatting for Phase 3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add SkillRegistry::retain_only and wire skill filtering in scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(benchmark): add --json flag for machine-readable output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add GitHub Actions benchmark workflow (manual trigger)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(benchmark): remove in-tree benchmark harness, keep retain_only utilities

Move benchmark-specific code out of ironclaw in preparation for the
nearai/benchmarks trajectory adapter. This removes:

- src/benchmark/ (runner, scenarios, metrics, judge, report, etc.)
- src/cli/benchmark.rs and the Benchmark CLI subcommand
- benchmarks/ data directory (scenarios + trajectories)
- .github/workflows/benchmark.yml
- The "benchmark" Cargo feature flag

What remains:
- ToolRegistry::retain_only() and SkillRegistry::retain_only()
- Test support types (TraceMetrics, InstrumentedLlm) inlined into
  tests/support/ instead of re-exporting from the deleted module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add README for LLM trace fixture format

Documents the trajectory JSON format, response types, request hints,
directory structure, and how to write new traces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(test): unify trace format around turns, add multi-turn support

Introduce TraceTurn type that groups user_input with LLM response steps,
making traces self-contained conversation trajectories. Add run_trace()
to TestRig for automatic multi-turn replay. Backward-compatible: flat
"steps" JSON is deserialized as a single turn transparently.

Includes all trace fixtures (spot, coverage, advanced), plan docs, and
new e2e tests for steering, error recovery, long chains, memory, and
prompt injection resilience.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): fix CI failures after merging main

- Fix tool_json fixture: use "data" parameter (not "input") to match
  JsonTool schema
- Fix status_events test: remove assertion for "time" tool that isn't
  in the fixture (only "echo" calls are used)
- Allow dead_code in test support metrics/instrumented_llm modules
  (utilities for future benchmark tests)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Working on recording traces and testing them

* feat(test): add declarative expects to trace fixtures, split infra tests

Add TraceExpects struct with 9 optional assertion fields (response_contains,
tools_used, all_tools_succeeded, etc.) that can be declared in fixture JSON
instead of hand-written Rust. Add verify_expects() and run_recorded_trace()
so recorded trace tests become one-liners.

Split trace infra tests (deserialization, backward compat) into
tests/trace_format.rs which doesn't require the libsql feature gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(test): add expects to all trace fixtures, simplify e2e tests

Add declarative expects blocks to all 19 trace fixture JSONs across
spot/, coverage/, advanced/, and root directories. Update all 8 e2e
test files to use verify_trace_expects() / run_and_verify_trace(),
replacing ~270 lines of hand-written assertions with fixture-driven
verification.

Tests that check things beyond expects (file content on disk, metrics,
event ordering) keep those extra assertions alongside the declarative
ones.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): adapt tests to AppBuilder refactor, fix formatting

Update test files to work with refactored TestRigBuilder that uses
AppBuilder::build_all() (removing with_tools/with_workspace methods).
Update telegram_check fixture to use tool_list instead of echo.
Fix cargo fmt issues in src/llm/mod.rs and src/llm/recording.rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(test): deduplicate support unit tests into single binary

Support modules (assertions, cleanup, test_channel, test_rig, trace_llm)
had #[cfg(test)] mod tests blocks that were compiled and run 12 times —
once per e2e test binary that declares `mod support;`. Extracted all 29
support unit tests into a dedicated `tests/support_unit_tests.rs` so they
run exactly once.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix trailing newlines in support files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(test): unify trace types and fix recorded multi-turn replay

Import shared types (TraceStep, TraceResponse, TraceToolCall, RequestHint,
ExpectedToolResult, MemorySnapshotEntry, HttpExchange*) from
ironclaw::llm::recording instead of redefining them in trace_llm.rs.

Fix the flat-steps deserializer to split at UserInput boundaries into
multiple turns, instead of filtering them out and wrapping everything
into a single turn. This enables recorded multi-turn traces to be
replayed as proper multi-turn conversations via run_trace().
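
A sketch of the splitting rule using a simplified two-variant `Step` enum (the real TraceStep carries richer response data). Each `UserInput` opens a new turn. Steps that appear before any `UserInput` are grouped into one turn with no recorded input, which keeps the legacy single-turn case working.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    UserInput(String),
    Response(String),
}

#[derive(Debug, PartialEq)]
pub struct Turn {
    pub user_input: Option<String>,
    pub steps: Vec<Step>,
}

/// Splits a flat step list into turns at every UserInput boundary.
pub fn split_into_turns(steps: Vec<Step>) -> Vec<Turn> {
    let mut turns: Vec<Turn> = Vec::new();
    for step in steps {
        match step {
            Step::UserInput(text) => turns.push(Turn { user_input: Some(text), steps: Vec::new() }),
            other => {
                // Steps before the first UserInput get a turn without input.
                if turns.is_empty() {
                    turns.push(Turn { user_input: None, steps: Vec::new() });
                }
                turns.last_mut().expect("turns is non-empty").steps.push(other);
            }
        }
    }
    turns
}
```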

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): fix CI failures - unused imports and missing struct fields

- Add #[allow(unused_imports)] on pub use re-exports in trace_llm.rs
  (types are re-exported for downstream test files, not used locally)
- Add `..` to ToolCompleted pattern in test_channel.rs to match new
  `error` and `parameters` fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): fix CI failures after merging main

- Add missing `error` and `parameters` fields to ToolCompleted
  constructors in support_unit_tests.rs
- Add `..` to ToolCompleted pattern match in support_unit_tests.rs
- Add #[allow(dead_code)] to CleanupGuard, LlmTrace impl, and
  TraceLlm impl (only used behind #[cfg(feature = "libsql")])

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Adding coverage running script

* fix(test): address review feedback on E2E test infrastructure

- Increase wait_for_responses polling to exponential backoff (50ms-500ms)
  and raise default timeout from 15s to 30s to reduce CI flakiness (nearai#1)
- Strengthen prompt_injection_resilience test with positive safety layer
  assertion via has_safety_warnings(), enable injection_check (nearai#2)
- Add assert_tool_order() helper and tools_order field in TraceExpects
  for verifying tool execution ordering in multi-step traces (nearai#3)
- Document TraceLlm sequential-call assumption for concurrency (nearai#6)
- Refactor CleanupGuard to use a PathKind enum instead of calling both
  remove_file and remove_dir_all on every path (nearai#8)
- Fix coverage.sh: default to --lib only, fix multi-filter syntax,
  add COV_ALL_TARGETS option
- Add coverage/ to .gitignore
- Remove planning docs from PR
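
Two helpers from the list above, sketched under assumptions. The first produces a delay schedule that doubles from 50 ms and caps at 500 ms. The surrounding loop that stops at the 30s timeout is not shown. The second is an ordering check that treats the expected tools as a subsequence of the actual calls, so other tools may appear between them. Whether the real `assert_tool_order()` allows interleaving is an assumption.

```rust
use std::time::Duration;

/// Delay before poll attempt `attempt` (0-based): 50, 100, 200, 400, then 500 ms.
pub fn backoff_delay(attempt: u32) -> Duration {
    let ms = 50u64.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(ms.min(500))
}

/// True when `expected` appears in `actual` in the same relative order.
pub fn tools_in_order(actual: &[&str], expected: &[&str]) -> bool {
    // One shared iterator: each match consumes `actual` up to that point.
    let mut remaining = actual.iter();
    expected.iter().all(|want| remaining.any(|got| got == want))
}
```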

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review - use HashSet in retain_only, improve skill test

- Use HashSet for O(N+M) lookup in SkillRegistry::retain_only and
  ToolRegistry::retain_only instead of linear scan
- Strengthen test_retain_only_empty_is_noop in SkillRegistry to
  pre-populate with a skill before asserting the no-op behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): revert incorrect safety layer assertion in injection test

The safety layer sanitizes tool output, not user input. The injection
test sends a malicious user message with no tools called, so the safety
layer never fires. Reverted to the original test which correctly
validates that the LLM refuses via trace expects. Also fixed the casing of a
request hint ("ignore" -> "Ignore") to suppress a noisy warning.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: clean stale profdata before coverage run

Adds `cargo llvm-cov clean` before each run to prevent
"mismatched data" warnings from stale instrumentation profiles.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix formatting in retain_only test

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
* test: add WIT compatibility tests for all WASM tools and channels

Adds CI and integration tests to catch WIT interface breakage across
all 14 WASM extensions (10 tools + 4 channels). Previously, changing
wit/tool.wit or wit/channel.wit could silently break guest-side tools
that weren't rebuilt until release time.

Three new pieces:

1. scripts/build-wasm-extensions.sh — builds all WASM extensions from
   source by reading registry manifests. Used by CI and locally.

2. tests/wit_compat.rs — integration tests that compile and instantiate
   each .wasm binary against the current wasmtime host linker with
   stubbed host functions. Catches added/removed/renamed WIT functions,
   signature mismatches, and missing exports. Skips gracefully when
   artifacts aren't built so `cargo test` still passes standalone.

3. .github/workflows/test.yml — new wasm-wit-compat CI job that builds
   all extensions then runs instantiation tests on every PR. Added to
   the branch protection roll-up.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix rustfmt formatting in wit_compat tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback on WIT compat tests

- Switch build script from python3 to jq for JSON parsing, consistent
  with release.yml and avoids python3 dependency (nearai#1, nearai#7)
- Use dirs::home_dir() instead of HOME env var for portability (nearai#2)
- Filter extensions by manifest "kind" field instead of path (nearai#3)
- Replace .flatten() with explicit error handling in dir iteration (nearai#4, nearai#5)
- Split stub_tool_host_functions into stub_shared_host_functions +
  tool-only tool-invoke stub, since tool-invoke is not in channel WIT (nearai#6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026

* feat: add inbound attachment support to WASM channel system

Add attachment record to WIT interface and implement inbound media
parsing across all four channel implementations (Telegram, Slack,
WhatsApp, Discord). Attachments flow from WASM channels through
EmittedMessage to IncomingMessage with validation (size limits,
MIME allowlist, count caps) at the host boundary.

- Add `attachment` record to `emitted-message` in wit/channel.wit
- Add `IncomingAttachment` struct to channel.rs and re-export
- Add host-side validation (20MB total, 10 max, MIME allowlist)
- Telegram: parse photo, document, audio, video, voice, sticker
- Slack: parse file attachments with url_private
- WhatsApp: parse image, audio, video, document with captions
- Discord: emit an empty attachment list (backward-compatible)
- Update FEATURE_PARITY.md section 7
- Add fixture-based tests per channel and host integration tests
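
A sketch of the host-side checks: a 10-attachment cap, a 20 MB total, and a MIME allowlist. The allowlist prefixes are illustrative. They include the `application/octet-stream`, `application/vnd.`, `application/msword`, and `application/rtf` entries that later commits added. Disallowed types are dropped rather than failing the whole message. Counting the larger of reported and stored size follows a later review fix. The exact drop and skip behavior is an assumption.

```rust
pub const MAX_ATTACHMENTS: usize = 10;
pub const MAX_TOTAL_BYTES: u64 = 20 * 1024 * 1024;

/// Illustrative MIME prefix allowlist; the real list may differ.
const ALLOWED_MIME_PREFIXES: &[&str] = &[
    "image/", "audio/", "video/", "text/", "application/pdf",
    "application/octet-stream", "application/vnd.", "application/msword", "application/rtf",
];

#[derive(Debug, Clone)]
pub struct Attachment {
    pub mime_type: String,
    /// Size the channel reported.
    pub size_bytes: u64,
    /// Bytes the host actually stored (0 if none).
    pub stored_bytes: u64,
}

/// Drops disallowed types and keeps attachments while the count and total size stay in bounds.
pub fn validate_attachments(input: Vec<Attachment>) -> Vec<Attachment> {
    let mut kept = Vec::new();
    let mut total: u64 = 0;
    for att in input {
        let allowed = ALLOWED_MIME_PREFIXES.iter().any(|p| att.mime_type.starts_with(*p));
        if !allowed || kept.len() >= MAX_ATTACHMENTS {
            continue;
        }
        // Use the larger size so a channel cannot under-report to slip past the limit.
        let size = att.size_bytes.max(att.stored_bytes);
        if total + size > MAX_TOTAL_BYTES {
            continue;
        }
        total += size;
        kept.push(att);
    }
    kept
}
```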

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: integrate outbound attachment support and reconcile WIT types (nearai#409)

Reconcile PR nearai#409's outbound attachment work with our inbound attachment
support into a unified design:

WIT type split:
- `inbound-attachment` in channel-host: metadata-only (id, mime_type,
  filename, size_bytes, source_url, storage_key, extracted_text)
- `attachment` in channel: raw bytes (filename, mime_type, data) on
  agent-response for outbound sending

Outbound features (from PR nearai#409):
- `on-broadcast` WIT export for proactive messages without prior inbound
- Telegram: multipart sendPhoto/sendDocument with auto photo→document
  fallback for files >10MB
- wrapper.rs: `call_on_broadcast`, `read_attachments` from disk,
  attachment params threaded through `call_on_respond`
- HTTP tool: `save_to` param for binary downloads to /tmp/ (50MB limit,
  path traversal protection, SSRF-safe redirect following)
- Message tool: allow /tmp/ paths for attachments alongside base_dir
- Credential env var fallback in inject_channel_credentials

Channel updates:
- All 4 channels implement on_broadcast (Telegram full, others stub)
- Telegram: polling_enabled config, adjusted poll timeout
- Inbound attachment types renamed to InboundAttachment in all channels

Tests: 1965 passing (9 new), 0 clippy warnings

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add audio transcription pipeline and extensible WIT attachment design

Add host-side transcription middleware (OpenAI Whisper) that detects audio
attachments with inline data on incoming messages and transcribes them
automatically. Refactor WIT inbound-attachment to use extras-json and a
store-attachment-data host function instead of typed fields, so future
attachment properties (dimensions, codec, etc.) don't require WIT changes
that invalidate all channel plugins.

- Add src/transcription/ module: TranscriptionProvider trait,
  TranscriptionMiddleware, AudioFormat enum, OpenAI Whisper provider
- Add src/config/transcription.rs: TRANSCRIPTION_ENABLED/MODEL/BASE_URL
- Wire middleware into agent message loop via AgentDeps
- WIT: replace data + duration-secs with extras-json + store-attachment-data
- Host: parse extras-json for well-known keys, merge stored binary data
- Telegram: download voice files via store-attachment-data, add duration
  to extras-json, add /file/bot to HTTP allowlist, voice-only placeholder
- Add reqwest multipart feature for Whisper API uploads
- 5 regression tests for transcription middleware

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire attachment processing into LLM pipeline with multimodal image support

Attachments on incoming messages are now augmented into user text via XML tags
before entering the turn system, and images with data are passed as multimodal
content parts (base64 data URIs) to LLM providers. This enables audio transcripts,
document text, and image content to reach the LLM without changes to ChatMessage
serialization or provider interfaces.

- Add src/agent/attachments.rs with augment_with_attachments() and 9 unit tests
- Add ContentPart/ImageUrl types to llm::provider with OpenAI-compatible serde
- Carry image_content_parts transiently on Turn (skipped in serialization)
- Update nearai_chat and rig_adapter to serialize multimodal content
- Add 3 e2e tests verifying attachments flow through the full agent loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, version bumps, and Telegram voice test

- Fix cargo fmt formatting in attachments.rs, nearai_chat.rs, rig_adapter.rs,
  e2e_attachments.rs
- Bump channel registry versions 0.1.0 → 0.2.0 (discord, slack, telegram,
  whatsapp) to satisfy version-bump CI check
- Fix Telegram test_extract_attachments_voice: add missing required `duration`
  field to voice fixture JSON

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: bump WIT channel version to 0.3.0, fix Telegram voice test, add pre-commit hook

- Bump wit/channel.wit package version 0.2.0 → 0.3.0 (interface changed with
  store-attachment-data)
- Update WIT_CHANNEL_VERSION constant and registry wit_version fields to match
- Fix Telegram test_extract_attachments_voice: gate voice download behind
  #[cfg(target_arch = "wasm32")] so host functions aren't called in native tests,
  update assertions for generated filename and extras_json duration
- Add @0.3.0 linker stubs in wit_compat.rs
- Add .githooks/pre-commit hook that runs scripts/check-version-bumps.sh when
  WIT or extension sources are staged
- Symlink commit-msg regression hook into .githooks/

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract voice download from extract_attachments into handle_message

Move download_voice_file + store_attachment_data calls out of
extract_attachments into a separate download_and_store_voice function
called from handle_message. This keeps extract_attachments as a pure
data-mapping function with no host calls, making it fully testable
in native unit tests without #[cfg(target_arch)] gates.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Add path validation to read_attachments (restrict to /tmp/) preventing
  arbitrary file reads from compromised tools
- Escape XML special characters in attachment filenames, MIME types, and
  extracted text to prevent prompt injection via tag spoofing
- Percent-encode file_id in Telegram getFile URL to prevent query injection
- Clone SecretString directly instead of expose_secret().to_string()
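
The XML-escaping fix can be illustrated as follows. Every attacker-controlled field (filename, MIME type, extracted text) is escaped before it is placed inside the attachment tag, so it cannot close the tag or spoof a new one. The `<attachment>` tag and attribute names are hypothetical.

```rust
/// Escapes the five XML special characters.
pub fn escape_xml(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Wraps extracted text in a tag (hypothetical tag and attribute names).
pub fn attachment_block(filename: &str, mime_type: &str, text: &str) -> String {
    format!(
        "<attachment filename=\"{}\" type=\"{}\">{}</attachment>",
        escape_xml(filename),
        escape_xml(mime_type),
        escape_xml(text)
    )
}
```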

Correctness fixes:
- Fix store_attachment_data overwrite accounting: subtract old entry size
  before adding new to prevent inflated totals and false rejections
- Use max(reported, stored_size) for attachment size accounting to prevent
  WASM channels from under-reporting size_bytes to bypass limits
- Add application/octet-stream to MIME allowlist (channels default unknown
  types to this)
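
The overwrite-accounting fix can be sketched with a hypothetical in-memory store. When the same id is written twice, the old entry's size is subtracted before the new one is added, so the running total reflects what is actually stored. A rejected write leaves the total unchanged.

```rust
use std::collections::HashMap;

/// Hypothetical per-message attachment store with a total byte budget.
pub struct AttachmentStore {
    data: HashMap<String, Vec<u8>>,
    total_bytes: usize,
    max_total_bytes: usize,
}

impl AttachmentStore {
    pub fn new(max_total_bytes: usize) -> Self {
        Self { data: HashMap::new(), total_bytes: 0, max_total_bytes }
    }

    /// Stores `bytes` under `id`, replacing any previous entry.
    pub fn store(&mut self, id: &str, bytes: Vec<u8>) -> Result<(), String> {
        // Subtract the size being replaced (0 for a new id) before adding the new size.
        let old = self.data.get(id).map_or(0, |b| b.len());
        let new_total = self.total_bytes - old + bytes.len();
        if new_total > self.max_total_bytes {
            return Err(format!("attachment budget exceeded: {new_total} > {}", self.max_total_bytes));
        }
        self.total_bytes = new_total;
        self.data.insert(id.to_string(), bytes);
        Ok(())
    }

    pub fn total_bytes(&self) -> usize {
        self.total_bytes
    }
}
```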

Code quality:
- Extract send_response helper in Telegram, deduplicating on_respond and
  on_broadcast
- Rename misleading Discord test to test_parse_slash_command_interaction
- Fix .githooks/commit-msg to use relative symlink (portable across machines)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tool_upgrade command + fix TOCTOU in save_to path validation

Add `tool_upgrade` — a new extension management tool that automatically
detects and reinstalls WASM extensions with outdated WIT versions.
Preserves authentication secrets during upgrade. Supports upgrading a
single extension by name or all installed WASM tools/channels at once.

Fix TOCTOU in `validate_save_to_path`: validate the path *before*
creating parent directories, so traversal paths like `/tmp/../../etc/`
cannot cause filesystem mutations outside /tmp before being rejected.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify WIT package version to 0.3.0 across tool.wit and all capabilities

tool.wit and channel.wit share the `near:agent` package namespace, so they
must declare the same version. Bumps tool.wit from 0.2.0 to 0.3.0 and
updates all capabilities files and registry entries to match.

Fixes `cargo component build` failure: "package identifier near:agent@0.2.0
does not match previous package name of near:agent@0.3.0"

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move WIT file comments after package declaration

WIT treats `//` comments before `package` as doc comments. When both
tool.wit and channel.wit had header comments, the parser rejected them
as "doc comments on multiple 'package' items". Move comments after the
package declaration in both files.

Also bumps tool registry versions to 0.2.0 to match the WIT 0.3.0 bump.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: display extension versions in gateway Extensions tab

Add version field to InstalledExtension and RegistryEntry types, pipe
through the web API (ExtensionInfo, RegistryEntryInfo), and render as
a badge in the gateway UI for both installed and available extensions.

For installed WASM extensions, version is read from the capabilities
file with a fallback to the registry entry when the local file has no
version (old installations). Bump all extension Cargo.toml and registry
JSON versions from 0.1.0 to 0.2.0 to keep them in sync.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add document text extraction middleware for PDF, Office, and text files

Extract text from document attachments (PDF, DOCX, PPTX, XLSX, RTF, plain text,
code files) so the LLM can reason about uploaded documents. Uses pdf-extract for
PDFs, zip+XML parsing for Office XML formats, and UTF-8 decode for text files.
Wired into the agent loop after transcription middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: download document files in Telegram channel for text extraction

The DocumentExtractionMiddleware needs file bytes in the attachment `data`
field, but only voice files were being downloaded. Document attachments
(PDFs, DOCX, etc.) had empty `data` and a source_url with a credential
placeholder that only works inside the WASM host's http_request.

Add `download_and_store_documents()` that downloads non-voice, non-image,
non-audio attachments via the existing two-step getFile→download flow and
stores bytes via `store_attachment_data` for host-side extraction.

Also rename `download_voice_file` → `download_telegram_file` since it's
generic for any file_id.
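
The two-step flow follows the Telegram Bot API. `getFile` returns a `file_path`, and the bytes are then fetched from `https://api.telegram.org/file/bot<token>/<file_path>`. A sketch of the URL construction, including percent-encoding of `file_id` as in the review fix. Passing the token as a plain argument is a simplification. In the channel, credentials are injected by the WASM host through a placeholder.

```rust
/// Percent-encodes everything except RFC 3986 unreserved characters.
pub fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{b:02X}"));
        }
    }
    out
}

/// Step 1: ask the Bot API for the file's server-side path.
pub fn get_file_url(token: &str, file_id: &str) -> String {
    format!("https://api.telegram.org/bot{token}/getFile?file_id={}", percent_encode(file_id))
}

/// Step 2: download the bytes using the `file_path` returned by getFile.
pub fn download_url(token: &str, file_path: &str) -> String {
    format!("https://api.telegram.org/file/bot{token}/{file_path}")
}
```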

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: allow Office MIME types and increase file download limit for Telegram

Two issues preventing document extraction from Telegram:

1. PPTX/DOCX/XLSX MIME types (application/vnd.*) were dropped by the
   WASM host attachment allowlist — add application/vnd., application/msword,
   and application/rtf prefixes.

2. Telegram file downloads over 10 MB failed with "Response body too large" —
   set max_response_bytes to 20 MB in Telegram capabilities.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: report document extraction errors back to user instead of silently skipping

- Bump max_response_bytes to 50 MB for Telegram file downloads
- When document extraction fails (too large, download error, parse error),
  set extracted_text to a user-friendly error message instead of leaving it
  None. This ensures the LLM tells the user what went wrong.
- On Telegram download failure, set extracted_text with the error so the
  user sees feedback even when the file never reaches the extraction middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: store extracted document text in workspace memory for search/recall

After document extraction succeeds, write the extracted text to workspace
memory at `documents/{date}/{filename}`. This enables:
- Full-text and semantic search over past uploaded documents
- Cross-conversation recall ("what did that PDF say?")
- Automatic chunking and embedding via the workspace pipeline

Documents are stored with metadata header (uploader, channel, date, MIME type).
Error messages (extraction failures) are not stored — only successful extractions.
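
A sketch of building the `documents/{date}/{filename}` path with the filename sanitization that a later review fix added. The specific rule used here is an assumption, not the project's confirmed behavior: separators and control characters become `_`, leading dots are stripped, and an empty result falls back to a placeholder name.

```rust
/// Builds a workspace path whose last component cannot traverse directories.
pub fn document_memory_path(date: &str, filename: &str) -> String {
    let cleaned: String = filename
        .chars()
        .map(|c| if c == '/' || c == '\\' || c.is_control() { '_' } else { c })
        .collect();
    // Strip leading dots so "." and ".." cannot survive as the whole name.
    let cleaned = cleaned.trim_start_matches('.');
    let name = if cleaned.is_empty() { "document" } else { cleaned };
    format!("documents/{date}/{name}")
}
```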

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, unused assignment warning

- Run cargo fmt on document_extraction and agent_loop modules
- Suppress unused_assignments warning on trace_llm_ref (used only
  behind #[cfg(feature = "libsql")])

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Remove SSRF-prone download() from DocumentExtractionMiddleware (nearai#13)
- Sanitize filenames in workspace path to prevent directory traversal (nearai#11)
- Pre-check file size before reading in WASM wrapper to prevent OOM (nearai#2)
- Percent-encode file_id in Telegram source URLs (nearai#7)

Correctness fixes:
- Clear image_content_parts on turn end to prevent memory leak (nearai#1)
- Find first *successful* transcription instead of first overall (nearai#3)
- Enforce data.len() size limit in document extraction (nearai#10)
- Use UTF-8 safe truncation with char_indices() (nearai#12)
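
UTF-8-safe truncation with `char_indices()` can be sketched as follows. Slicing a `&str` at an arbitrary byte offset panics if the offset falls inside a multi-byte character. The function therefore cuts at the last character boundary at or below the byte limit.

```rust
/// Truncates `s` to at most `max_bytes` bytes without splitting a character.
pub fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Last char start offset that is <= max_bytes; offset 0 always qualifies.
    let end = s
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= max_bytes)
        .last()
        .unwrap_or(0);
    &s[..end]
}
```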

Robustness & code quality:
- Add 120s timeout to OpenAI Whisper HTTP client (nearai#5)
- Trim trailing slash from Whisper base_url (nearai#6)
- Allow ~/.ironclaw/ paths in WASM wrapper (nearai#8)
- Return error from on_broadcast in Slack/Discord/WhatsApp (nearai#9)
- Fix doc comment in HTTP tool (nearai#4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting — cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address latest PR review — doc comments, error messages, version bumps

- Fix DocumentExtractionMiddleware doc comment (no longer downloads from source_url)
- Fix error message: "no inline data" instead of "no download URL"
- Log error + fallback instead of silent unwrap_or_default on Whisper HTTP client
- Bump all capabilities.json versions from 0.1.0 to 0.2.0 to match Cargo.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unsupported profile: minimal from CI workflows [skip-regression-check]

dtolnay/rust-toolchain@stable does not accept the 'profile' input
(it was a parameter for the deprecated actions-rs/toolchain action).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge with latest main — resolve compilation errors and PR review nits

- Add version: None to RegistryEntry/InstalledExtension test constructors
- Fix MessageContent type mismatches in nearai_chat tests (String → MessageContent::Text)
- Fix .contains() calls on MessageContent — use .as_text().unwrap()
- Remove redundant trace_llm_ref = None assignment in test_rig
- Check data size before clone in document extraction to avoid unnecessary allocation

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
* feat: full image support across all channels

End-to-end image handling: upload, generation, analysis, editing, and
rendering across web gateway, HTTP webhook, WASM (Telegram/Slack), and
REPL channels. Builds on the attachment infrastructure from nearai#596 and
draws inspiration from PR nearai#641's image pipeline approach — credit to
that PR's author for the sentinel JSON pattern and base64-in-JSON
upload design.

Key changes:
- Image upload in web UI (file picker, paste, preview strip)
- Image generation tool (FLUX/DALL-E via /v1/images/generations)
- Image edit tool (multipart /v1/images/edits with fallback)
- Image analysis tool (vision model for workspace images)
- Model detection utilities (image_models.rs, vision_models.rs)
- Sentinel JSON detection in dispatcher for generated image rendering
- StatusUpdate::ImageGenerated → SSE/WS/REPL/WASM broadcast
- HTTP webhook attachment support (base64, 5MB/file, 10MB total)
- WASM channel image download (Telegram via file API, Slack via host HTTP)
- Tool registration wiring in app.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR nearai#725 review comments (16 issues)

- SecretString for API keys in all image tools (image_gen, image_edit, image_analyze)
- Binary image read via tokio::fs::read instead of DB-backed workspace.read()
- Replace Arc<Workspace> with Option<PathBuf> base_dir (workspace has no filesystem API)
- ApprovalRequirement::UnlessAutoApproved for cost-sensitive image tools
- Scope sentinel detection to image_generate/image_edit tool names only
- Skip ToolResult preview broadcast for image sentinels (avoids multi-MB base64 in SSE)
- Extract shared media_type_from_path() to builtin/mod.rs
- Rename fallback_chat_edit → fallback_generate with tracing::warn
- Increase gateway body limit from 1MB to 10MB for image uploads
- Increase webhook body limit to 15MB (base64 overhead)
- Log warning on invalid base64 in images_to_attachments
- Client-side image size limits (5MB/file, 5 images max) in app.js
- aria-label on attach button for accessibility
- Update body_too_large test for new 10MB limit

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add Slack file size check before download (PR review item nearai#15)

Skip downloading files larger than 20 MB in the Slack WASM channel to
avoid excessive memory use and slow downloads in the WASM runtime.
Logs a warning when a file is skipped. Also bumps channel versions
for Slack and Telegram (prior branch changes).

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): add path validation and approval requirement to image tools

Add sandbox path validation via validate_path() to both ImageAnalyzeTool
and ImageEditTool to prevent path traversal attacks that could exfiltrate
arbitrary files through external vision/edit APIs. Also fix
ImageAnalyzeTool::requires_approval to return UnlessAutoApproved,
consistent with ImageEditTool and ImageGenerateTool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: post-download size guards and empty data_url sentinel check

- Slack: add post-download size check on actual bytes when metadata
  size_bytes is absent, preventing bypass of the 20MB limit
- Telegram: add 20MB download size limit (matching Slack) enforced
  in download_telegram_file() after receiving response bytes
- Dispatcher: skip broadcasting ImageGenerated SSE event when
  data_url is empty from unwrap_or_default(), log warning instead
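
A minimal Rust sketch of the two-stage size guard described above. The first stage rejects a file early when its metadata (such as Slack's `size_bytes`) already exceeds the limit. The second stage re-checks the byte count actually received, which also covers files whose metadata omits the size. `MAX_DOWNLOAD_BYTES`, `SizeCheck` and both functions are hypothetical names, and reading "20MB" as 20 MiB is an assumption.

```rust
/// Assumed limit: "20MB" read as 20 MiB; the real constant may differ.
const MAX_DOWNLOAD_BYTES: u64 = 20 * 1024 * 1024;

#[derive(Debug, PartialEq)]
enum SizeCheck {
    Ok,
    TooLarge { size: u64 },
}

/// Pre-download check against a size reported by the platform's file metadata.
/// An unknown size (`None`) cannot be judged yet, so the download is allowed.
fn check_declared_size(declared: Option<u64>) -> SizeCheck {
    match declared {
        Some(size) if size > MAX_DOWNLOAD_BYTES => SizeCheck::TooLarge { size },
        _ => SizeCheck::Ok,
    }
}

/// Post-download check on the bytes actually received. This catches files whose
/// metadata was missing or understated the real size.
fn check_downloaded_size(bytes: &[u8]) -> SizeCheck {
    let size = bytes.len() as u64;
    if size > MAX_DOWNLOAD_BYTES {
        SizeCheck::TooLarge { size }
    } else {
        SizeCheck::Ok
    }
}
```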

Closes correctness issues nearai#3, nearai#4, nearai#5 from PR nearai#725 review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use mime_guess for media type detection, add alt attrs and media_type validation

- Replace hardcoded media type mapping with mime_guess crate (already in deps)
- Add alt attributes to img elements in web UI for accessibility
- Validate media_type starts with "image/" in images_to_attachments()
- Update bmp test assertion to match mime_guess behavior
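
A std-only sketch of the `image/` prefix check applied in `images_to_attachments()`. The `Attachment` struct and the function names are hypothetical. mime_guess is left out so the block stays self-contained, and rejecting a bare `image/` with no subtype is an extra assumption beyond what the commit states.

```rust
/// Hypothetical attachment shape; only the media type matters here.
#[derive(Debug, Clone, PartialEq)]
struct Attachment {
    media_type: String,
    data: Vec<u8>,
}

/// True for `image/<subtype>`; requiring a non-empty subtype is an assumption.
fn is_image_media_type(media_type: &str) -> bool {
    media_type
        .strip_prefix("image/")
        .is_some_and(|subtype| !subtype.is_empty())
}

/// Drops anything that does not declare an image media type.
fn keep_image_attachments(candidates: Vec<Attachment>) -> Vec<Attachment> {
    candidates
        .into_iter()
        .filter(|a| is_image_media_type(&a.media_type))
        .collect()
}
```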

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zaki <zaki@iqlusion.io>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
* fix: restore libSQL vector search with dynamic embedding dimensions (nearai#655)

The V9 migration dropped the libsql_vector_idx and changed
memory_chunks.embedding from F32_BLOB(1536) to BLOB, but the
documented brute-force cosine fallback was never implemented.
hybrid_search silently returned empty vector results — search was
FTS5-only on libSQL.

Add ensure_vector_index() which dynamically creates the vector index
with the correct F32_BLOB(N) dimension, inferred from EMBEDDING_DIMENSION
/ EMBEDDING_MODEL env vars during run_migrations(). Uses _migrations
version=0 as a metadata row to track the current dimension (no-op if
unchanged, rebuilds table on dimension change).
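
The no-op/create/rebuild decision can be sketched as a pure function of the stored dimension (the `_migrations` version=0 row) and the wanted one. `IndexAction`, `plan_vector_index` and the index name are hypothetical. `F32_BLOB(N)` and `libsql_vector_idx(...)` inside `CREATE INDEX` are libSQL's vector syntax, but the exact statements this migration runs are not shown in this log.

```rust
/// What the migration step should do with the vector index.
#[derive(Debug, PartialEq)]
enum IndexAction {
    /// Stored dimension already matches: nothing to do.
    NoOp,
    /// No dimension recorded yet: create the typed column and index.
    Create,
    /// Dimension changed: rebuild the table with the new F32_BLOB(N) type.
    Rebuild { from: u32 },
}

/// `stored` models the dimension kept in the `_migrations` version=0 row.
fn plan_vector_index(stored: Option<u32>, wanted: u32) -> IndexAction {
    match stored {
        Some(dim) if dim == wanted => IndexAction::NoOp,
        Some(dim) => IndexAction::Rebuild { from: dim },
        None => IndexAction::Create,
    }
}

/// Column type for an embedding of `dim` 32-bit floats.
fn embedding_column_type(dim: u32) -> String {
    format!("F32_BLOB({dim})")
}

/// Index DDL; the index name is hypothetical.
fn vector_index_sql() -> &'static str {
    "CREATE INDEX IF NOT EXISTS idx_memory_chunks_embedding \
     ON memory_chunks (libsql_vector_idx(embedding))"
}
```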

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: move safety comments above multi-line assertions for rustfmt stability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove unnecessary safety comments from test code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review comments from PR nearai#1393 [skip-regression-check]

- Share model→dimension mapping via config::embeddings::default_dimension_for_model()
  instead of duplicating the match table (zmanian, Copilot)
- Add dimension bounds check (1..=65536) to prevent overflow (zmanian, Copilot)
- DROP stale memory_chunks_new before CREATE to handle crashed previous attempts
  (zmanian, Copilot)
- Use plain INSERT instead of INSERT OR IGNORE to surface constraint errors
  (Copilot)
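
A sketch of dimension resolution with the 1..=65536 bounds check. It returns `None` (the case a later review asks to log) when nothing usable is configured. Values are passed in rather than read from `EMBEDDING_DIMENSION` / `EMBEDDING_MODEL` so the logic is testable. The explicit-value-wins precedence and the three-entry model table are assumptions, not the contents of `config::embeddings::default_dimension_for_model()`.

```rust
/// Upper bound from the review fix; guards against absurd or overflowing values.
const MAX_EMBEDDING_DIMENSION: u32 = 65_536;

/// Illustrative subset only, NOT the project's mapping table.
fn default_dimension_for_model(model: &str) -> Option<u32> {
    match model {
        "text-embedding-3-small" | "text-embedding-ada-002" => Some(1536),
        "text-embedding-3-large" => Some(3072),
        _ => None,
    }
}

/// `explicit` models EMBEDDING_DIMENSION, `model` models EMBEDDING_MODEL.
/// A parseable explicit value wins (assumed precedence); otherwise the model
/// default is used. Out-of-range results yield `None`.
fn resolve_embedding_dimension(explicit: Option<&str>, model: Option<&str>) -> Option<u32> {
    let dim = match explicit.and_then(|s| s.trim().parse::<u32>().ok()) {
        Some(dim) => dim,
        None => default_dimension_for_model(model?)?,
    };
    (1..=MAX_EMBEDDING_DIMENSION).contains(&dim).then_some(dim)
}
```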

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing builder field to AgentDeps in telegram routing test [skip-regression-check]

The self-repair builder field was added to AgentDeps in nearai#712 but this
test was not updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address zmanian's second review on PR nearai#1393

- Add tracing::info when resolve_embedding_dimension returns None (nearai#2)
- Document connection scoping for transaction safety (nearai#1)
- Document _rowid preservation for FTS5 consistency (nearai#4)
- Document precondition that migrations must run first (nearai#5)
- Note F32_BLOB dimension enforcement in insert_chunk (nearai#3)
- Add unit tests for resolve_embedding_dimension (nearai#6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
* feat: port NPA psychographic profiling system into IronClaw

Port the complete psychographic profiling system from NPA into IronClaw,
including enriched profile schema, conversational onboarding, profile
evolution, and three-tier prompt augmentation.

Personal onboarding moved from wizard Step 9 to first assistant
interaction per maintainer feedback — the First Contact system prompt
block now instructs the LLM to conduct a natural onboarding conversation
that builds the psychographic profile via memory_write.

Changes:
- Enrich profile.rs with 5 new structs, 9-dimension analysis framework,
  custom deserializers for backward compatibility, and rendering methods
- Add conversational onboarding engine with one-step-removed questioning
  technique, personality framework, and confidence-scored profile generation
- Add profile evolution with confidence gating, analysis metadata tracking,
  and weekly update routine
- Replace thin interaction style injection with three-tier system gated on
  confidence > 0.6 and profile recency
- Replace wizard Step 9 with First Contact system prompt block that drives
  conversational onboarding during the user's first interaction
- Add autonomy progression to SOUL.md seed and personality framework to
  AGENTS.md seed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: replace chat-based onboarding with bootstrap greeting and workspace seeds

Remove the interactive onboarding_chat.rs engine in favor of a simpler
bootstrap flow: fresh workspaces get a proactive LLM greeting that
naturally profiles the user. Identity files are now seeded from
src/workspace/seeds/ instead of being hardcoded. Also removes the
identity-file write protection (seeds are now managed), adds routine
advisor integration, and includes an e2e trace for bootstrap greeting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(safety): sanitize identity file writes via Sanitizer to prevent prompt injection

Identity files (SOUL.md, AGENTS.md, USER.md, IDENTITY.md) are injected into
every system prompt. Rather than hard-blocking writes (which broke onboarding),
scan content through the existing Sanitizer and reject writes with High/Critical
severity injection patterns. Medium/Low warnings are logged but allowed.

Also clarifies AGENTS.md identity file roles (USER.md = user info, IDENTITY.md =
agent identity) and adds IDENTITY.md setup as an explicit bootstrap step.
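
The severity gate reduces to "reject if the worst finding is High or Critical, otherwise allow and log". In this sketch, `Severity`, `Finding` and `WriteDecision` are hypothetical stand-ins for the real Sanitizer output, and `eprintln!` stands in for the project's logging.

```rust
/// Ordered so that `max()` yields the worst finding.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

/// Hypothetical scanner output for one matched pattern.
#[derive(Debug)]
struct Finding {
    pattern: &'static str,
    severity: Severity,
}

#[derive(Debug, PartialEq)]
enum WriteDecision {
    Allow { warnings: usize },
    Reject { worst: Severity },
}

/// High/Critical findings reject the write; Medium/Low are logged and allowed.
fn decide_identity_write(findings: &[Finding]) -> WriteDecision {
    match findings.iter().map(|f| f.severity).max() {
        Some(worst) if worst >= Severity::High => WriteDecision::Reject { worst },
        _ => {
            for f in findings {
                // Stand-in for the project's structured logging.
                eprintln!("identity write allowed with {:?} finding: {}", f.severity, f.pattern);
            }
            WriteDecision::Allow { warnings: findings.len() }
        }
    }
}
```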

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update profile_onboarding_completed comment to reflect current wiring

The field is now actively used by the agent loop to suppress BOOTSTRAP.md
injection — remove the stale "not yet wired" TODO.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): use env_or_override for NEARAI_API_KEY in model fetch config

When the user authenticates via NEAR AI Cloud API key (option 4),
api_key_login() stores the key via set_runtime_env(). But
build_nearai_model_fetch_config() was using std::env::var() which
doesn't check the runtime overlay — so model listing fell back to
session-token auth and re-triggered the interactive NEAR AI
authentication menu.

Switch to env_or_override() which checks both real env vars and the
runtime overlay.
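
A sketch of the lookup `env_or_override()` needs: the real process environment, then a runtime overlay populated by something like `set_runtime_env()`. The global `Mutex<HashMap>` overlay and the env-first precedence are assumptions about the real implementation.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Process-wide overlay for values injected at runtime (e.g. a key entered in the wizard).
fn overlay() -> &'static Mutex<HashMap<String, String>> {
    static OVERLAY: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
    OVERLAY.get_or_init(|| Mutex::new(HashMap::new()))
}

fn set_runtime_env(key: &str, value: &str) {
    overlay()
        .lock()
        .expect("overlay lock poisoned")
        .insert(key.to_string(), value.to_string());
}

/// Real environment first, then the overlay (assumed precedence).
/// `std::env::var` on its own never sees overlay values, which was the bug.
fn env_or_override(key: &str) -> Option<String> {
    std::env::var(key).ok().or_else(|| {
        overlay()
            .lock()
            .expect("overlay lock poisoned")
            .get(key)
            .cloned()
    })
}
```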

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(agent): correct channel/user_id in bootstrap greeting persist call

persist_assistant_response was called with channel="default",
user_id="system" but the assistant thread was created via
get_or_create_assistant_conversation("default", "gateway") which owns
the conversation as user_id="default", channel="gateway". The mismatch
caused ensure_writable_conversation to reject the write with:

  WARN Rejected write for unavailable thread id user=system channel=default

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(web): remove all inline event handlers for CSP compliance

The Content-Security-Policy header (added in f48fe95) blocks inline JS
via script-src 'self'. All onclick/onchange attributes in index.html
are replaced with getElementById().addEventListener() calls. Dynamic
inline handlers in app.js (jobs, routines, memory breadcrumb, code
blocks, TEE report) are replaced with data-action attributes and a
single delegated click handler on document.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(agent): align bootstrap message user/channel and update fixture schema field

- Bootstrap IncomingMessage now uses ("default", "gateway") consistently
  with persist and session registration calls
- Update bootstrap_greeting.json fixture: schema_version → version to
  match current PROFILE_JSON_SCHEMA

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(safety): address PR review — expand injection scanning and harden profile sync

- BOOTSTRAP.md: fix target "profile" → "context/profile.json" so the
  write hits the correct path and triggers profile sync
- IDENTITY_FILES: add context/assistant-directives.md to the scanned
  set since it is also injected into the system prompt
- sync_profile_documents(): scan derived USER.md and assistant-directives
  content through Sanitizer before writing, rejecting High/Critical
  injection patterns
- profile_evolution_prompt(): wrap recent_messages_summary in <user_data>
  delimiters with untrusted-data instruction to mitigate indirect
  prompt injection
- routine-advisor skill: update cron examples from 6-field to standard
  5-field format for consistency with routine_create tool docs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): detect env-provided LLM keys during quick-mode onboarding

Quick-mode wizard now checks LLM_BACKEND, NEARAI_API_KEY,
ANTHROPIC_API_KEY, and OPENAI_API_KEY env vars to pre-populate
the provider setting, so users aren't re-prompted for credentials
they already supplied. Also teaches setup_nearai() to recognize
NEARAI_API_KEY from env (previously only checked session tokens).

Includes web UI cleanup (remove duplicate event listeners) and
e2e test response count adjustment.
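
Quick-mode detection can be sketched as a pure function over an environment lookup closure. The sketch also folds in later fixes from this PR: empty values are ignored and `LLM_BACKEND` is trimmed. The key-to-backend precedence and the backend strings are assumptions.

```rust
/// `lookup` abstracts over the environment (e.g. `env_or_override`), which keeps
/// the detection logic testable.
fn detect_backend(lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    // Unset, empty and whitespace-only values are all treated as "not provided".
    let present = |key: &str| {
        lookup(key)
            .map(|v| v.trim().to_string())
            .filter(|v| !v.is_empty())
    };

    // An explicit backend choice wins.
    if let Some(backend) = present("LLM_BACKEND") {
        return Some(backend);
    }

    // Otherwise infer from whichever API key is present (assumed precedence).
    const KEY_TO_BACKEND: [(&str, &str); 3] = [
        ("NEARAI_API_KEY", "nearai"),
        ("ANTHROPIC_API_KEY", "anthropic"),
        ("OPENAI_API_KEY", "openai"),
    ];
    for (key, backend) in KEY_TO_BACKEND {
        if present(key).is_some() {
            return Some(backend.to_string());
        }
    }
    None
}
```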

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(test): update routine_create_list to expect 7-field normalized cron

The cron normalizer now always expands to 7-field format, so the
stored schedule is "0 0 9 * * * *" not "0 0 9 * * *".

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(setup): skip LLM provider prompts when NEARAI_API_KEY is present

In quick mode, if NEARAI_API_KEY is set in the environment and the
backend was auto-detected as nearai, skip the interactive inference
provider and model selection steps. The API key is persisted to the
secrets store and a default model is set automatically.

Also simplify the static fallback model list for nearai to a single
default entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: unify default model, static bootstrap greeting, and web UI cleanup

- Add DEFAULT_MODEL const and default_models() fallback list in
  llm/nearai_chat.rs; use from config, wizard, and .env.example so the
  default model is defined in one place
- Restore multi-model fallback list in setup wizard (was reduced to 1)
- Move BOOTSTRAP_GREETING to module-level const (out of run() body)
- Replace LLM-based bootstrap with static greeting (persist to DB before
  channels start, then broadcast — eliminates startup LLM call and race)
- Fix double env::var read for NEARAI_API_KEY in quick setup path
- Move thread sidebar buttons into threads-section-header (web UI)
- Remove orphaned .thread-sidebar-header CSS and fix double blank line
- Update bootstrap e2e test for static greeting (no LLM trace needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(safety): move prompt injection scanning into Workspace write/append

Addresses PR nearai#927 review comments (nearai#1, nearai#3) — identity file write
protection and unsanitized profile fields in system prompt.

Instead of scanning at the tool layer (memory.rs) or the sync layer
(sync_profile_documents), injection scanning now lives in
Workspace::write() and Workspace::append() for all files that are
injected into the system prompt. This ensures every code path that
writes to these files is protected, including future ones.

- Add SYSTEM_PROMPT_FILES const and reject_if_injected() in workspace
- Add WorkspaceError::InjectionRejected variant
- Add map_write_err() in memory.rs to convert InjectionRejected to
  ToolError::NotAuthorized
- Remove redundant IDENTITY_FILES/Sanitizer from memory.rs
- Remove redundant sanitizer calls from sync_profile_documents()
- Move sanitization tests to workspace::tests
- Existing integration test (test_memory_write_rejects_injection)
  continues to pass through the new path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Copilot review — merge marker order, orphan thread, stale fixture

- merge_profile_section: search for END marker after BEGIN position to
  avoid matching a stray END earlier in the file
- Bootstrap phase 2: use get_or_create_session + Thread::with_id instead
  of resolve_thread(None) to avoid creating an orphan thread
- setup_nearai: use env_or_override for NEARAI_API_KEY consistency with
  runtime overlay
- Delete orphaned bootstrap_greeting.json fixture (no test references it)
- Add test_merge_end_marker_must_follow_begin regression test
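
The END-after-BEGIN rule for `merge_profile_section`, as a short sketch. The marker strings are hypothetical. Treating a BEGIN with no following END as a section that runs to the end of the file is an assumption.

```rust
// Hypothetical marker strings delimiting the generated section.
const BEGIN: &str = "<!-- BEGIN PROFILE -->";
const END: &str = "<!-- END PROFILE -->";

/// Replaces the BEGIN..END section with `section`, or appends one if absent.
fn merge_profile_section(doc: &str, section: &str) -> String {
    let block = format!("{}\n{}\n{}", BEGIN, section, END);
    if let Some(begin) = doc.find(BEGIN) {
        let after_begin = begin + BEGIN.len();
        // Look for END only after BEGIN, so a stray END earlier in the file
        // is never mistaken for the closing marker.
        let end = doc[after_begin..]
            .find(END)
            .map_or(doc.len(), |rel| after_begin + rel + END.len());
        return format!("{}{}{}", &doc[..begin], block, &doc[end..]);
    }
    if doc.is_empty() {
        block
    } else {
        format!("{}\n\n{}", doc.trim_end(), block)
    }
}
```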

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: fmt agent_loop.rs (CI stable rustfmt)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: lazy-init sanitizer, check profile non-empty before skipping bootstrap

Address Copilot review:
- Use LazyLock<Sanitizer> to avoid rebuilding Aho-Corasick + regexes
  on every workspace write
- has_profile check now requires non-empty content, not just file
  existence, to prevent empty profile.json from suppressing onboarding
- Add seed_tests integration tests (libsql-backed) verifying:
  - Empty profile.json does not suppress BOOTSTRAP.md seeding
  - Non-empty profile.json correctly suppresses bootstrap for upgrades
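
With `std::sync::LazyLock` (stable since Rust 1.80), the sanitizer is built once on first use and then shared by every later write. `Sanitizer` here is a hypothetical single-phrase stub, and the `BUILDS` counter exists only to show that construction happens once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts constructions, only to demonstrate the one-time build.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in; the real Sanitizer builds Aho-Corasick automata and regexes.
struct Sanitizer {
    patterns: Vec<String>,
}

impl Sanitizer {
    fn new() -> Self {
        BUILDS.fetch_add(1, Ordering::SeqCst);
        Sanitizer { patterns: vec!["ignore previous instructions".to_string()] }
    }

    fn is_suspicious(&self, text: &str) -> bool {
        let lower = text.to_lowercase();
        self.patterns.iter().any(|p| lower.contains(p.as_str()))
    }
}

/// Built on first access, then shared by every subsequent write.
static SANITIZER: LazyLock<Sanitizer> = LazyLock::new(Sanitizer::new);

fn check_write(content: &str) -> bool {
    !SANITIZER.is_suspicious(content)
}
```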

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: duplicate language handler, empty LLM_BACKEND, test_rig style

Address Copilot review on PR nearai#927:
- Remove duplicate language-option click listeners (delegated
  data-action handler already covers them)
- Guard LLM_BACKEND env prefill against empty string to prevent
  suppressing API-key-based auto-detection
- Use destructured local `keep_bootstrap` instead of `self.keep_bootstrap`
  in test_rig for consistency after destructure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update stale BOOTSTRAP.md write-protection comment [skip-regression-check]

BOOTSTRAP.md is now in SYSTEM_PROMPT_FILES and gets injection scanning
on write. The old comment incorrectly stated it was not write-protected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace debug_assert panics with graceful error returns [skip-regression-check]

debug_assert! in execute_tool_with_safety and JobContext::transition_to
panicked in test builds before the graceful error path could run.
Existing tests (test_cancel_job_completed, test_execute_empty_tool_name_returns_not_found)
already cover these paths — they were the ones failing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Copilot review — schema label, env var check, path normalization, profile validation

1. Label ANALYSIS_FRAMEWORK and PROFILE_JSON_SCHEMA sections separately
   in bootstrap prompt so the LLM knows which blob is the target structure.

2. Wizard quick-mode backend auto-detection now rejects empty env vars
   (std::env::var().is_ok_and(|v| !v.is_empty())) to avoid selecting the
   wrong backend when e.g. NEARAI_API_KEY="" is set.

3. Normalize the target path before comparing with paths::PROFILE in
   memory_write so non-canonical variants like "context//profile.json"
   still trigger profile sync.

4. seed_if_empty now requires valid JSON parse of context/profile.json
   before treating it as a populated profile. Corrupted content no longer
   permanently suppresses bootstrap seeding.
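
A sketch of slash-collapsing normalization applied before comparing against `paths::PROFILE`. It uses a char-by-char loop of the kind a later commit in this PR describes. The function and constant names are hypothetical, and trimming whitespace plus leading/trailing slashes is an assumption about the real `Workspace::normalize_path()`.

```rust
// Hypothetical constant mirroring paths::PROFILE.
const PROFILE: &str = "context/profile.json";

/// Collapses runs of '/' into one, char by char; trimming whitespace and
/// leading/trailing slashes is an assumption.
fn normalize_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    let mut prev_was_slash = false;
    for c in path.trim().chars() {
        if c == '/' {
            if !prev_was_slash {
                out.push(c);
            }
            prev_was_slash = true;
        } else {
            out.push(c);
            prev_was_slash = false;
        }
    }
    out.trim_matches('/').to_string()
}

/// True when a memory_write target should trigger profile sync.
fn is_profile_target(target: &str) -> bool {
    normalize_path(target) == PROFILE
}
```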

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

* fix: address Copilot review — append scan, profile validation, env_or_override

1. Workspace::append() now scans the combined content (existing + new)
   for prompt injection, not just the appended chunk. Prevents split-
   injection evasion across multiple appends.

2. seed_if_empty() now deserializes into PsychographicProfile instead of
   serde_json::Value for profile validation. Stray/legacy JSON that
   doesn't match the expected schema no longer suppresses bootstrap.

3. Wizard quick-mode backend auto-detection now uses env_or_override()
   to honor runtime overlays and injected secrets. LLM_BACKEND value
   is trimmed before storage.
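
This sketch shows why scanning the combined content matters: a phrase split across two appends passes a chunk-only scan. `looks_injected` is a hypothetical one-phrase detector, and `AppendError::InjectionRejected` only mirrors the name of the real `WorkspaceError` variant.

```rust
/// Hypothetical one-phrase detector standing in for the real Sanitizer.
fn looks_injected(text: &str) -> bool {
    text.to_lowercase().contains("ignore previous instructions")
}

#[derive(Debug, PartialEq)]
enum AppendError {
    InjectionRejected,
}

/// Appends `chunk` only if the resulting file content passes the scan.
fn append_checked(existing: &mut String, chunk: &str) -> Result<(), AppendError> {
    // Scan what the file would contain after the append, not just `chunk`,
    // so a phrase split across two appends is still caught.
    let mut combined = existing.clone();
    combined.push_str(chunk);
    if looks_injected(&combined) {
        return Err(AppendError::InjectionRejected);
    }
    *existing = combined;
    Ok(())
}
```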

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add bootstrap_onboarding_clears_bootstrap E2E trace test

Exercises the full onboarding flow end-to-end:
1. Bootstrap greeting fires automatically on fresh workspace
2. User converses for 3 turns (name, tools, work style)
3. Agent writes psychographic profile to context/profile.json
4. Profile sync generates USER.md and assistant-directives.md
5. Agent writes IDENTITY.md (chosen persona)
6. Agent clears BOOTSTRAP.md via memory_write(target: "bootstrap")

Verifies:
- BOOTSTRAP.md is non-empty before onboarding, empty after
- bootstrap_completed flag is set
- Profile contains expected user data (name, profession, interests)
- USER.md contains profile-derived content (name, tone, profession)
- Assistant-directives.md references user and communication style
- IDENTITY.md contains agent's chosen persona name
- All memory_write calls succeed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Copilot review — slash collapse, env_or_override, cron trim [skip-regression-check]

1. memory.rs path normalization now uses the same char-by-char loop as
   Workspace::normalize_path() to fully collapse consecutive slashes
   (e.g. "context///profile.json" → "context/profile.json").

2. Quick-mode NEARAI_API_KEY check (line 239) now uses env_or_override()
   consistently with the backend auto-detection block above it.

3. normalize_cron_expression() trims input before field counting so the
   passthrough branch (7+ fields) also strips whitespace.
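
A sketch of the cron normalizer these commits imply. A 5-field expression gets a leading seconds field and a trailing year field, a 6-field expression gets a trailing year, and 7 or more fields pass through after trimming. Passing any other field count through unchanged (to be rejected by later validation) is an assumption.

```rust
/// Expands cron expressions to the 7-field form: sec min hour dom mon dow year.
fn normalize_cron_expression(expr: &str) -> String {
    // Trim first so every branch, including passthrough, strips whitespace.
    let trimmed = expr.trim();
    let fields: Vec<&str> = trimmed.split_whitespace().collect();
    match fields.len() {
        // min hour dom mon dow -> prepend seconds "0", append year "*"
        5 => format!("0 {} *", fields.join(" ")),
        // sec min hour dom mon dow -> append year "*"
        6 => format!("{} *", fields.join(" ")),
        // 7+ fields (or an unexpected count) pass through, already trimmed
        _ => trimmed.to_string(),
    }
}
```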

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jay Zalowitz <jayzalowitz@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
* feat(agent): queue and merge messages during active turns

Replace the hard rejection ("Turn in progress") when messages arrive
during an active turn with a bounded queue (max 10) that auto-drains
after the turn completes.

Queued messages are merged with newlines into a single turn so the LLM
receives full context from rapid consecutive inputs instead of producing
fragmented responses from partial context.

Key changes:
- Thread.pending_messages (VecDeque) with queue_message/drain_pending_messages
- Drain loop in agent_loop.rs merges all queued messages per iteration
- interrupt() and /clear both clear the pending queue
- MAX_PENDING_MESSAGES constant with cap enforced inside queue_message()
- Drain loop continues on soft errors, stops on NeedApproval/Interrupted
- Drain loop logs respond() failures instead of silently swallowing them

Fixes nearai#259 — debounces rapid inbound messages during processing
Fixes nearai#826 — drain loop is bounded by MAX_PENDING_MESSAGES cap
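
A minimal sketch of the bounded queue and the newline merge. Field and method names follow the commit message, but the signatures (such as `queue_message` returning `bool`) are assumptions. `clear_pending` is a hypothetical helper standing in for what `interrupt()` and `/clear` do.

```rust
use std::collections::VecDeque;

const MAX_PENDING_MESSAGES: usize = 10;

#[derive(Default)]
struct Thread {
    pending_messages: VecDeque<String>,
}

impl Thread {
    /// Enforces the cap inside the queue itself; returns false when full.
    fn queue_message(&mut self, content: String) -> bool {
        if self.pending_messages.len() >= MAX_PENDING_MESSAGES {
            return false;
        }
        self.pending_messages.push_back(content);
        true
    }

    /// Takes every queued message and merges them with newlines into one turn.
    fn drain_pending_messages(&mut self) -> Option<String> {
        if self.pending_messages.is_empty() {
            return None;
        }
        let merged: Vec<String> = self.pending_messages.drain(..).collect();
        Some(merged.join("\n"))
    }

    /// Hypothetical helper: what interrupt() and /clear do to the queue.
    fn clear_pending(&mut self) {
        self.pending_messages.clear();
    }
}
```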

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review — drain loop busy-loop guard and stale state re-check

- Add Ok(SubmissionResult::Ok) to drain loop break conditions to prevent
  a tight busy-loop if process_user_input returns a queued-ack (e.g. from
  a corrupted/hydrated session stuck in Processing state)
- Re-check thread.state under the mutable lock in the Processing arm to
  guard against the turn completing between the snapshot read and the
  queue operation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: clear attachments on drain-loop queued message processing

Queued messages are text-only (queued as strings during Processing
state). The drain loop was reusing the original IncomingMessage
reference which carried the first message's attachments, causing
augment_with_attachments to incorrectly re-apply them to unrelated
queued text. Clone the message with cleared attachments for drain-loop
turns.
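
A sketch of building the drained-turn message without the original attachments. `IncomingMessage` here is a hypothetical reduced struct, with raw bytes standing in for real attachment types. Copying fields explicitly avoids cloning attachment payloads only to discard them.

```rust
/// Hypothetical reduced message; real attachments are richer than raw bytes.
#[derive(Debug, Clone)]
struct IncomingMessage {
    channel: String,
    user_id: String,
    content: String,
    attachments: Vec<Vec<u8>>,
}

/// Same routing identity as the original, merged text, and no attachments.
fn message_for_drained_turn(original: &IncomingMessage, merged_content: String) -> IncomingMessage {
    IncomingMessage {
        channel: original.channel.clone(),
        user_id: original.user_id.clone(),
        content: merged_content,
        attachments: Vec::new(),
    }
}
```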

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review round 2 — stale state fallthrough and thread-not-found guard

- Processing arm: when re-checked state is no longer Processing, fall
  through to normal processing instead of dropping user input
- Processing arm: return error when thread not found instead of false
  "queued" ack
- Document intermediate drain-loop responses as best-effort for one-shot
  channels (HttpChannel)
- Add regression tests for both edge cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review feedback for message queue drain loop

[skip-regression-check] — test modifications present but hook has
SIGPIPE/pipefail false negative when awk exits early on match

- Replace wildcard match in drain loop with explicit `while let
  Ok(Response)` guard — stops on Error variant too, preventing
  confusing interleaved output after soft errors (review issue nearai#1)
- Reject queueing messages with attachments during Processing state
  instead of silently dropping them (review issue nearai#2)
- Document response routing limitation: all drain-loop responses
  route via original message identity (review issue nearai#3)
- Document why SubmissionResult::Ok is correct for queued ack and
  how it interacts with drain loop break condition (review issue nearai#4)
- Rewrite two dead regression tests to assert actual behavior:
  thread-gone returns error, state-changed does not queue (review nearai#5)
- Document MAX_PENDING_MESSAGES=10 as acceptable for personal
  assistant use case (review issue nearai#6)
- Fix misleading one-shot channel comment — HttpChannel consumes
  sender on first call, subsequent calls are dropped (review issue nearai#8)
- Simplify drain loop intermediate response since while-let guard
  guarantees Response variant

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing extension_manager field in webhook EngineContext

The fire_webhook method's EngineContext initializer was missing the
extension_manager field added in staging, causing CI compilation failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gate TestRig::session_manager() behind libsql feature flag

The field is #[cfg(feature = "libsql")] so the accessor must match.
All callers are already inside #[cfg(feature = "libsql")] blocks.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: re-queue drained messages on drain loop failure

If process_user_input fails after drain_pending_messages() removed
all queued content, that user input was permanently lost. Now the
merged content is re-queued at the front of pending_messages on any
non-Response result so it will be processed on the next successful
turn.

Adds Thread::requeue_drained() helper and unit test.
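
A sketch of the drain loop, its break conditions and the re-queue on failure. `process` stands in for `process_user_input` and `respond` for the channel reply. The `SubmissionResult` variants mirror names used in these commits, but the types, the `String` error and the closure-based signature are assumptions.

```rust
use std::collections::VecDeque;

#[derive(Debug)]
enum SubmissionResult {
    Response { content: String },
    /// Queued acknowledgement; continuing on it could busy-loop.
    Ok,
    NeedApproval,
    Interrupted,
}

struct Thread {
    pending_messages: VecDeque<String>,
}

impl Thread {
    fn drain_pending_messages(&mut self) -> Option<String> {
        if self.pending_messages.is_empty() {
            return None;
        }
        let merged: Vec<String> = self.pending_messages.drain(..).collect();
        Some(merged.join("\n"))
    }

    /// Puts merged content back at the front. In the real agent, new messages
    /// may arrive meanwhile, so this can exceed the cap by one entry.
    fn requeue_drained(&mut self, content: String) {
        self.pending_messages.push_front(content);
    }
}

fn drain_loop(
    thread: &mut Thread,
    mut process: impl FnMut(&str) -> Result<SubmissionResult, String>,
    mut respond: impl FnMut(String),
) {
    while let Some(merged) = thread.drain_pending_messages() {
        match process(&merged) {
            Ok(SubmissionResult::Response { content }) => respond(content),
            // Errors, queued acks, approvals and interrupts all stop the loop;
            // the merged input is re-queued so it is not lost.
            other => {
                eprintln!("drain loop stopped on {other:?}; re-queueing input");
                thread.requeue_drained(merged);
                break;
            }
        }
    }
}
```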

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unreachable!() from drain loop, add lock-drop comments

- Extract content binding in `while let` pattern instead of using a
  separate match with unreachable!() — satisfies the no-panic-in-
  production convention (zmanian review item nearai#1)
- Add comment clarifying session lock is dropped at Processing arm
  boundary before fall-through (zmanian review item nearai#5)
- Document bounded cap overshoot on requeue_drained (review item nearai#2)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): validate queued messages and touch updated_at on queue ops

- Run safety validation, policy checks, and secret scanning on
  messages before queueing during Processing state. Previously,
  content with leaked secrets could be stored in pending_messages
  and serialized without hitting the inbound scanner.
- Touch updated_at in queue_message(), drain_pending_messages(),
  and requeue_drained() so thread timestamps reflect queue activity.

[skip-regression-check] — safety validation requires full Agent;
updated_at is a data-level fix on existing tested methods

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
serrrfirat pushed a commit that referenced this pull request Mar 29, 2026
GATEWAY_USER_TOKENS never went to production — replaced entirely by
DB-backed user management via /api/admin/users and /api/tokens.

Removed:
- UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing
- user_tokens field from GatewayConfig
- GatewayChannel::new_multi_auth() constructor
- Env-var user migration block in main.rs (~90 lines)
- multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime
  via db.has_any_users() in app.rs)

Review fixes (zmanian):
- User ID generation: UUID instead of display-name derivation (#1)
- Invitation accept moved to public router (no auth needed) (#3)
- libSQL get_invitation_by_hash aligned with postgres: filters
  status='pending' AND expires_at > now (#4)
- UUID parse: returns DatabaseError::Serialization instead of
  unwrap_or_default (#7)
- PostgreSQL SELECT * replaced with explicit column lists (#8)
- Sort order aligned (both backends use DESC) (#6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DougAnderson444 pushed a commit to DougAnderson444/ironclaw that referenced this pull request Mar 29, 2026
…i-tenant isolation (nearai#1626)

* feat: complete multi-tenant isolation — per-user budgets, model selection, heartbeat cycling

Finishes the remaining isolation work from phases 2–4 of nearai#59:

Phase 2 (DB scoping): Fix /status and /list commands to use _for_user
DB variants instead of global queries that leaked cross-user job data.

Phase 3 (Runtime isolation): Per-user workspace in routine engine's
spawn_fire so lightweight routines run in the correct user context.
Per-user daily cost tracking in CostGuard with configurable budget via
MAX_COST_PER_USER_PER_DAY_CENTS. Multi-user heartbeat that cycles
through all users with routines, auto-detected from GATEWAY_USER_TOKENS.

Phase 4 (Provider/tools): Per-user model selection via preferred_model
setting — looked up from SettingsStore on first iteration, threaded
through ReasoningContext.model_override to CompletionRequest. Works
with providers that support per-request model overrides (NearAI).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use selected_model setting key to match /model command persistence

The dispatcher was reading "preferred_model" but the /model command
(merged from staging) persists to "selected_model". Since set_setting
is already per-user scoped, using the same key makes /model work as
the per-user model override in multi-tenant mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: heartbeat hygiene, /model multi-tenant guard, RigAdapter model override

Three follow-up fixes for multi-tenant isolation:

1. Multi-user heartbeat now runs memory hygiene per user before each
   heartbeat check, matching single-user heartbeat behavior.

2. /model command in multi-tenant mode only persists to per-user
   settings (selected_model) without calling set_model() on the shared
   LlmProvider. The per-request model_override in the dispatcher reads
   from the same setting. Added multi_tenant flag to AgentConfig
   (auto-detected from GATEWAY_USER_TOKENS).

3. RigAdapter now supports per-request model overrides by injecting the
   model name into rig-core's additional_params. OpenAI/Anthropic/Ollama
   API servers use last-key-wins for duplicate JSON keys, so the override
   takes effect via serde's flatten serialization order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review — cost model attribution, heartbeat concurrency, pruning

Fixes from review comments on nearai#1614:

- Cost tracking now uses the override model name (not active_model_name)
  when a per-user model override is active, for accurate attribution.
- Multi-user heartbeat runs per-user checks concurrently via JoinSet
  instead of sequentially, preventing one slow user from blocking others.
- Per-user failure counts tracked independently; users exceeding
  max_failures are skipped (matching single-user semantics).
- per_user_daily_cost HashMap pruned on day rollover to prevent
  unbounded growth in long-lived deployments.
- Doc comment fixed: says "routines" not "active routines".
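
A sketch of per-user daily budgets with pruning on day rollover. Costs are whole cents to match `MAX_COST_PER_USER_PER_DAY_CENTS`. The tracker struct, the caller-supplied day number and treating "at or above the limit" as over budget are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical per-user daily spend tracker.
struct PerUserCostTracker {
    limit_cents: u64,
    /// Day the counters belong to, e.g. days since the Unix epoch.
    day: u64,
    per_user_daily_cost: HashMap<String, u64>,
}

impl PerUserCostTracker {
    fn new(limit_cents: u64, day: u64) -> Self {
        Self { limit_cents, day, per_user_daily_cost: HashMap::new() }
    }

    /// On a new day, drop every counter so entries do not accumulate forever.
    fn roll_over(&mut self, today: u64) {
        if today != self.day {
            self.per_user_daily_cost.clear();
            self.day = today;
        }
    }

    fn record(&mut self, user_id: &str, cents: u64, today: u64) {
        self.roll_over(today);
        *self.per_user_daily_cost.entry(user_id.to_string()).or_insert(0) += cents;
    }

    fn is_over_budget(&mut self, user_id: &str, today: u64) -> bool {
        self.roll_over(today);
        self.per_user_daily_cost.get(user_id).copied().unwrap_or(0) >= self.limit_cents
    }
}
```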

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /status ownership, model persistence scoping, heartbeat robustness

Addresses second round of PR review on nearai#1614:

- /status <job_id> DB path now validates job.user_id == requesting user
  before returning data (was missing ownership check, security fix).

- persist_selected_model takes user_id param instead of owner_id, and
  skips .env/TOML writes in multi-tenant mode (these are shared global
  files). handle_system_command now receives user_id from caller.

- JoinSet collection handles Err(JoinError) explicitly instead of
  silently dropping panicked tasks.

- Notification forwarder extracts owner_id from response metadata in
  multi-tenant mode for per-user routing instead of broadcasting to
  the agent owner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cost pricing, fire_manual workspace, heartbeat concurrency cap

Round 3 review fixes:

- Cost tracking passes None for cost_per_token when model override is
  active, letting CostGuard look up pricing by model name instead of
  using the default provider's rates (serrrfirat).

- fire_manual() now uses per-user workspace, matching spawn_fire()
  pattern (serrrfirat).

- Removed MULTI_TENANT env var — multi-tenant mode is auto-detected
  solely from GATEWAY_USER_TOKENS presence (serrrfirat + Copilot).

- Multi-user heartbeat capped at 8 concurrent tasks to avoid flooding
  the LLM provider (serrrfirat + Copilot).

- Fixed inject_model_override doc comment accuracy (Copilot).

- Added comment explaining multi-tenant notification routing priority
  (Copilot).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: user-scoped webhook endpoint for multi-tenant isolation

Adds POST /api/webhooks/u/{user_id}/{path} — a user-scoped webhook
endpoint that filters the routine lookup by user_id, preventing
cross-user webhook triggering when paths collide.

The existing /api/webhooks/{path} endpoint remains unchanged for
backward compatibility in single-user deployments.

Changes:
- get_webhook_routine_by_path gains user_id: Option<&str> param
- Both postgres and libsql implementations add AND user_id = ? filter
  when user_id is provided
- New webhook_trigger_user_scoped_handler extracts (user_id, path)
  from URL and passes to shared fire_webhook_inner logic
- Route registered on public router (webhooks are called by external
  services that can't send bearer tokens)
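The optional user_id filter can be sketched over an in-memory list. This is not the real method, which queries postgres or libSQL. The `Routine` struct is hypothetical and only mirrors how `AND user_id = ?` narrows the lookup when a user is given.

```rust
/// Hypothetical routine record; only the fields the lookup needs.
#[derive(Debug, Clone, PartialEq)]
struct Routine {
    id: u32,
    user_id: String,
    webhook_path: String,
}

/// With `Some(user)`, only that user's routines can match (the scoped endpoint).
/// With `None`, any user's routine with the path matches (the legacy endpoint).
fn get_webhook_routine_by_path<'a>(
    routines: &'a [Routine],
    path: &str,
    user_id: Option<&str>,
) -> Option<&'a Routine> {
    routines
        .iter()
        .find(|r| r.webhook_path == path && user_id.map_or(true, |u| r.user_id == u))
}
```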

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(db): add UserStore trait with users, api_tokens, invitations tables

Foundation for DB-backed user management (nearai#1605):

- UserRecord, ApiTokenRecord, InvitationRecord types in db/mod.rs
- UserStore sub-trait (17 methods) added to Database supertrait
- PostgreSQL migration V14__users.sql (users, api_tokens, invitations)
- libSQL schema + incremental migration V14
- Full implementations for both PgBackend (via Store delegation) and
  LibSqlBackend (direct SQL in libsql/users.rs)
- authenticate_token JOINs api_tokens+users with active/non-revoked
  checks; has_any_users for bootstrap detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(web): DB-backed auth, user/token/invitation API handlers

Adds the web gateway layer for DB-backed user management (nearai#1605):

Auth refactor:
- CombinedAuthState wraps env-var tokens (MultiAuthState) + optional
  DbAuthenticator for DB-backed token lookup with LRU cache (60s TTL,
  1024 max entries)
- auth_middleware tries env-var tokens first, then DB fallback
- From<MultiAuthState> impl for backward compatibility
- main.rs wires with_db_auth when database is available

API handlers (12 new endpoints):
- /api/admin/users — CRUD: create, list, detail, update, suspend, activate
- /api/tokens — create (returns plaintext once), list, revoke
- /api/invitations — create, list, accept (creates user + first token)

Token creation: 32 random bytes → hex plaintext, SHA-256 hash stored.
Invitation accept: validates hash + pending + not expired, creates
user record and first API token atomically.

All test files updated for CombinedAuthState type change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: startup env-var user migration + UserStore integration tests

Completes the DB-backed user management feature (nearai#1605):

- Startup migration: when GATEWAY_USER_TOKENS is set and the users
  table is empty, inserts env-var users + hashed tokens into DB.
  Logs deprecation notice when DB already has users.
- hash_token made pub for reuse in migration code.
- 10 integration tests for UserStore (libsql file-backed):
  - has_any_users bootstrap detection
  - create/get/get_by_email/list/update user lifecycle
  - token create → authenticate → revoke → reject cycle
  - suspended user tokens rejected
  - wrong-user token revoke returns false
  - invitation create → accept → user created
  - record_login and record_token_usage timestamps
- libSQL migration: removed FK constraints from V14 (incompatible
  with execute_batch inside transactions). The tables are defined in both
  the base SCHEMA and the incremental migration, so fresh and existing
  databases both get them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove GATEWAY_USER_TOKENS, fix review feedback

GATEWAY_USER_TOKENS never went to production — replaced entirely by
DB-backed user management via /api/admin/users and /api/tokens.

Removed:
- UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing
- user_tokens field from GatewayConfig
- GatewayChannel::new_multi_auth() constructor
- Env-var user migration block in main.rs (~90 lines)
- multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime
  via db.has_any_users() in app.rs)

Review fixes (zmanian):
- User ID generation: UUID instead of display-name derivation (nearai#1)
- Invitation accept moved to public router (no auth needed) (nearai#3)
- libSQL get_invitation_by_hash aligned with postgres: filters
  status='pending' AND expires_at > now (nearai#4)
- UUID parse: returns DatabaseError::Serialization instead of
  unwrap_or_default (nearai#7)
- PostgreSQL SELECT * replaced with explicit column lists (nearai#8)
- Sort order aligned (both backends use DESC) (nearai#6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add role-based access control (admin/member)

Adds a `role` field (admin|member) to user management:

Schema:
- `role TEXT NOT NULL DEFAULT 'member'` added to users table in both
  PostgreSQL V14 migration and libSQL schema/incremental migration
- UserRecord gains `role: String` field
- UserIdentity gains `role: String` field, populated from DB in
  DbAuthenticator and defaulting to "admin" for single-user mode

Access control:
- AdminUser extractor: returns 403 Forbidden if role != "admin"
- /api/admin/users/* handlers: require AdminUser (create, list,
  detail, update, suspend, activate)
- POST /api/invitations: requires AdminUser (only admins can invite)
- User creation accepts optional "role" param (defaults to "member")
- Invitation acceptance creates users with "member" role
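Here is a rough sketch of the AdminUser check. It reduces the extractor to a fallible conversion from the authenticated identity. The actual extractor presumably plugs into the web framework's request extraction. `AuthRejection` and `status_for` are hypothetical and only model the 403 path.

```rust
#[derive(Debug, Clone)]
struct UserIdentity {
    user_id: String,
    role: String,
}

/// Hypothetical rejection type; `Forbidden` corresponds to HTTP 403.
#[derive(Debug, PartialEq)]
enum AuthRejection {
    Forbidden,
}

/// Wrapper that can only be constructed for admins.
struct AdminUser(UserIdentity);

impl TryFrom<UserIdentity> for AdminUser {
    type Error = AuthRejection;

    fn try_from(identity: UserIdentity) -> Result<Self, Self::Error> {
        if identity.role == "admin" {
            Ok(AdminUser(identity))
        } else {
            Err(AuthRejection::Forbidden)
        }
    }
}

/// Maps the extraction result to the status code a handler would see.
fn status_for(identity: UserIdentity) -> u16 {
    match AdminUser::try_from(identity) {
        Ok(_) => 200,
        Err(AuthRejection::Forbidden) => 403,
    }
}
```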

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(web): add Users admin tab to web UI

Adds a Users tab to the web gateway UI for managing users, tokens,
and roles without needing direct API calls.

Features:
- User list table with ID, name, email, role, status, created date
- Create user form with display name, email, role selector
- Suspend/activate actions per user
- Create API token for any user (shows plaintext once with copy button)
- Role badges (admin highlighted, member muted)
- Non-admin users see "Admin access required" message
- Keyboard shortcut: Cmd/Ctrl+5 switches to Users tab

CSS:
- Reuses routines-table styles for the user list
- Badge, token-display, btn-small, btn-danger, btn-primary components

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move Users to Settings subtab, bootstrap admin user on first run

- Moved Users from top-level tab to Settings sidebar subtab (under
  Skills, before Theme toggle)
- On first startup with empty users table, automatically creates an
  admin user from GATEWAY_USER_ID config with a corresponding API
  token from GATEWAY_AUTH_TOKEN. This ensures the owner appears in
  the Users panel immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: user creation shows token, + Token works, no password save popup

Three UI/UX fixes:

1. Create user now generates an initial API token and shows it in a
   copy-able banner instead of triggering the browser's password save
   dialog. Uses autocomplete="off" and type="text" for email field.

2. "+ Token" button works: exposed createTokenForUser/suspendUser/
   activateUser on window for inline onclick handlers in dynamically
   generated table rows. Token creation uses showTokenBanner helper.

3. Admin token creation: POST /api/tokens now accepts optional
   "user_id" field when the requesting user is admin, allowing
   token creation for other users from the Users panel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use event delegation for user action buttons (CSP compliance)

Inline onclick handlers are blocked by the Content-Security-Policy
(script-src 'self' without 'unsafe-inline'). Switched to data-action
attributes with a delegated click listener on the users table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add i18n for Users subtab, show login link on user creation

- Added 'settings.users' i18n key for English and Chinese
- Token banner now shows a full login link (domain/?token=xxx)
  with a Copy Link button, plus the raw token below
- Login link works automatically via existing ?token= auto-auth

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: token hash mismatch — hash hex string, not raw bytes

Critical auth bug: token creation hashed the raw 32 bytes
(hasher.update(token_bytes)) but authentication hashed the hex-encoded
string (hash_token(candidate) where candidate is the hex string the
user sends). This meant newly created tokens could never authenticate.

Fixed all 4 token creation sites (users, tokens, invitations create,
invitations accept) to use hash_token(&plaintext_token) which hashes
the hex string consistently with the auth lookup path.

Removed now-unused sha2::Digest imports from handlers.
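The mismatch is easy to reproduce. Hashing the raw 32 bytes and hashing their 64-character hex encoding produce different digests, so a lookup keyed on the hex string never matches. This minimal sketch uses std's `DefaultHasher` as a non-cryptographic stand-in for SHA-256 (the real code uses sha2). `hash_bytes` and `to_hex` are hypothetical helpers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in for SHA-256. DefaultHasher is NOT cryptographic; it only
// demonstrates that the two inputs must match byte for byte.
fn hash_bytes(input: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(input);
    h.finish()
}

/// Hex-encodes the random bytes; this string is what the user receives.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Fixed behavior: creation and authentication both hash the hex string.
fn hash_token(plaintext_hex: &str) -> u64 {
    hash_bytes(plaintext_hex.as_bytes())
}
```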

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove invitation system

The invitation flow is redundant — admin create user already generates
a token and shows a login link. Invitations add complexity without
value until email integration exists.

Removed:
- InvitationRecord struct and 4 UserStore trait methods
- invitations table from V14 migration (postgres + both libsql schemas)
- PostgreSQL Store methods (create/get/accept/list invitations)
- libSQL UserStore invitation methods + row_to_invitation helper
- invitations.rs handler file (212 lines)
- /api/invitations routes (create, list, accept)
- test_invitation_lifecycle test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: user deletion, self-service profile, per-user job limits, usage API

Four multi-tenancy improvements:

1. User deletion cascade (DELETE /api/admin/users/{id}):
   Deletes user and all data across 11 user-scoped tables (settings,
   secrets, routines, memory, jobs, conversations, etc.). Admin only.

2. Self-service profile (GET/PATCH /api/profile):
   Users can read and update their own display_name and metadata
   without admin privileges.

3. Per-user job concurrency (MAX_JOBS_PER_USER env var):
   Scheduler checks active_jobs_for(user_id) before dispatch.
   Prevents one user from exhausting all job slots.

4. Usage reporting (GET /api/admin/usage?user_id=X&period=day|week|month):
   Aggregates LLM costs from llm_calls via agent_jobs.user_id.
   Returns per-user, per-model breakdown of calls, tokens, and cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add TenantCtx for compile-time tenant isolation

Implements zmanian's architectural proposal from nearai#1614 review:
two-tier scoped database access (TenantScope/AdminScope) so handler
code cannot accidentally bypass tenant scoping.

TenantScope (default): wraps user_id + Arc<dyn Database>, auto-binds
user_id on every operation. ID-based lookups return None for cross-
tenant resources. No escape hatch — forgetting to scope is a compile
error.

AdminScope (explicit opt-in): cross-tenant access for system-level
components (heartbeat, routine engine, self-repair, scheduler, worker).

TenantCtx bundles TenantScope + workspace + cost guard + per-user
rate limiting. Constructed once per request in handle_message, threaded
through all command handlers and ChatDelegate.
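Below is a minimal sketch of the TenantScope idea. An in-memory `Db` replaces `Arc<dyn Database>`, and `Job` is a hypothetical record. The point is structural: the user_id is fixed at construction, and cross-tenant IDs come back as `None` rather than as another user's data.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
struct Job {
    id: u64,
    user_id: String,
    title: String,
}

/// Hypothetical in-memory stand-in for `Arc<dyn Database>`.
#[derive(Default)]
struct Db {
    jobs: HashMap<u64, Job>,
}

/// Binds one user_id at construction. No method accepts a different user_id,
/// so callers cannot forget to scope a query.
struct TenantScope {
    user_id: String,
    db: Arc<Db>,
}

impl TenantScope {
    fn new(user_id: impl Into<String>, db: Arc<Db>) -> Self {
        Self { user_id: user_id.into(), db }
    }

    /// ID-based lookup: a job owned by another tenant looks like a missing job.
    fn get_job(&self, id: u64) -> Option<&Job> {
        self.db.jobs.get(&id).filter(|j| j.user_id == self.user_id)
    }

    /// Lists only this tenant's jobs, ordered by id.
    fn list_jobs(&self) -> Vec<&Job> {
        let mut jobs: Vec<&Job> = self
            .db
            .jobs
            .values()
            .filter(|j| j.user_id == self.user_id)
            .collect();
        jobs.sort_by_key(|j| j.id);
        jobs
    }
}
```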

Key changes:
- New src/tenant.rs (~920 lines): TenantScope, AdminScope, TenantCtx,
  TenantRateState, TenantRateRegistry
- All command handlers: user_id: &str → ctx: &TenantCtx
- ChatDelegate: cost check/record/settings via self.tenant
- System components: store field changed to AdminScope
- Config: TENANT_MAX_LLM_CONCURRENT, TENANT_MAX_JOBS_CONCURRENT env vars
- Fixes bug: /status <job_id> cross-tenant leak (now auto-filtered)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR nearai#1626 review feedback — bounded LRU cache, admin auth, FK cleanup

- Replace HashMap with lru::LruCache in DbAuthenticator so the token
  cache is hard-bounded at 1024 entries (evicts LRU, not just expired)
- Gate admin user endpoints (list/detail/update/suspend/activate) with
  AdminUser extractor so members get 403 instead of full access
- Add api_tokens to libSQL delete_user cleanup list to prevent orphaned
  tokens (libSQL has no FK cascade)
- Add regression tests for all three fixes
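The bounded cache can be approximated without the lru crate. This std-only sketch, using a hypothetical `TokenCache`, keeps a hard cap and evicts the least-recently-used entry. Because it finds that entry with a linear scan, eviction is O(n), unlike `lru::LruCache`. On read, it also drops entries older than the TTL.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical token cache: token_hash -> (user_id, inserted_at, last_used tick).
struct TokenCache {
    cap: usize,
    ttl: Duration,
    tick: u64,
    entries: HashMap<String, (String, Instant, u64)>,
}

impl TokenCache {
    fn new(cap: usize, ttl: Duration) -> Self {
        Self { cap, ttl, tick: 0, entries: HashMap::new() }
    }

    /// Returns the cached user_id, refreshing recency; expired entries are removed.
    fn get(&mut self, key: &str) -> Option<String> {
        self.tick += 1;
        let expired = match self.entries.get(key) {
            None => return None,
            Some((_, inserted_at, _)) => inserted_at.elapsed() >= self.ttl,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        let entry = self.entries.get_mut(key)?;
        entry.2 = self.tick;
        Some(entry.0.clone())
    }

    /// Inserts an entry, evicting the least-recently-used one if at capacity.
    fn insert(&mut self, key: String, user_id: String) {
        self.tick += 1;
        if !self.entries.contains_key(&key) && self.entries.len() >= self.cap {
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, v)| v.2)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                self.entries.remove(&k);
            }
        }
        self.entries.insert(key, (user_id, Instant::now(), self.tick));
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```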

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update CA certificates in runtime Docker image

Ensures the root certificate bundle is current so TLS handshakes
to services like Supabase succeed on Railway.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CI failures — formatting, no-panics check

- Run cargo fmt on test code
- Replace .expect() with const NonZeroUsize in DbAuthenticator
- Add // safety: comments for test-only code in multi_tenant.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: switch PostgreSQL TLS from rustls to native-tls

rustls with rustls-native-certs fails TLS handshake on Railway's
slim container (empty or stale root cert store). native-tls delegates
to OpenSSL on Linux which handles system certs more reliably.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Adding user management API

* feat: admin secrets provisioning API + API documentation

- Add PUT/GET/DELETE /api/admin/users/{id}/secrets/{name} endpoints for
  application backends to provision per-user secrets (AES-256-GCM encrypted)
- Add secrets_store field to GatewayState with builder wiring
- Create docs/USER_MANAGEMENT_API.md with full API spec covering users,
  secrets, tokens, profile, and usage endpoints
- Update web gateway CLAUDE.md route table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add CatchPanicLayer to capture handler panics

Without this, panics in async handlers silently drop the connection
and the edge proxy returns a generic 503. Now panics are caught,
logged, and returned as 500 with the panic message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address second-round review — transactional delete, overflow, error logging

- C1: Wrap PostgreSQL delete_user() in a transaction so partial cleanup
  can't leave users in a half-deleted state
- M2: Add job_events to delete cleanup (both backends) — FK to
  agent_jobs without CASCADE would cause FK violation
- H1/M4: Cap expires_in_days to 36500 before i64 cast (tokens + secrets)
- H2: Validate target user exists before creating admin token to prevent
  orphan tokens on libSQL
- H3: Log DB errors in DbAuthenticator::authenticate() instead of
  silently swallowing them as 401

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert to rustls with webpki-roots fallback for PostgreSQL TLS

native-tls/OpenSSL caused silent crashes (segfaults in C code) during
DB writes on Railway containers. Switch back to rustls but add
webpki-roots as a fallback when system certs are missing, which was
the original TLS handshake failure on slim container images.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update Cargo.lock for rustls + webpki-roots

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* debug: add /api/debug/db-write endpoint to diagnose user insert failure

Temporary diagnostic endpoint that tests DB INSERT to users table
with full error logging. No auth required. Will be removed after
debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: use cargo-chef in Dockerfile for dependency caching

Splits the build into planner/deps/builder stages. Dependencies are
only recompiled when Cargo.toml or Cargo.lock change. Source-only
changes skip straight to the final build stage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* debug: add tracing to users_create_handler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: guard created_by FK in user creation handler

The auth identity user_id (from owner_id scope) may not match any
user row in the DB, causing a FK violation on the created_by column.
Check that the referenced user exists before setting created_by.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: collapse GATEWAY_USER_ID into IRONCLAW_OWNER_ID

Remove the separate GATEWAY_USER_ID config. The gateway now uses
IRONCLAW_OWNER_ID (config.owner_id) directly for auth identity,
bootstrap user creation, and workspace scoping.

Previously, with_owner_scope() rebound the auth identity to owner_id
while keeping default_sender_id as the gateway user_id. This caused
an FK constraint violation when creating users because the auth
identity ("default") didn't match any user in the DB ("nearai").

Changes:
- Remove GATEWAY_USER_ID env var and gateway_user_id from settings
- Remove user_id field from GatewayConfig
- Add owner_id parameter to GatewayChannel::new()
- Remove with_owner_scope() method
- Remove default_sender_id from GatewayState
- Remove sender override logic in chat/approval handlers
- Remove debug endpoint and tracing from prior debugging
- Update all tests and E2E fixtures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: hide Users tab for non-admins, remove auth hint text

- Fetch /api/profile after login and hide the Users settings tab
  when the user's role is not admin
- Remove the "Enter the GATEWAY_AUTH_TOKEN" hint from the login page
  since tokens are now managed via the admin panel, not .env files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review feedback (auth 503, token expiry, CORS PATCH)

- DB auth errors now return 503 instead of 401 so outages are
  distinguishable from invalid tokens (serrrfirat H3)
- Cap expires_in_days to 36500 before i64 cast to prevent negative
  duration from u64 overflow (serrrfirat H1)
- Add PATCH to CORS allowed methods for profile/user update
  endpoints (Copilot)
- Stop leaking panic details in CatchPanicLayer response body

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden multi-tenant isolation — review fixes from nearai#1614

- Add conversation ownership checks in TenantScope: add_conversation_message,
  touch_conversation, list_conversation_messages (+ paginated),
  update_conversation_metadata_field, get_conversation_metadata now return
  NotFound for conversations not owned by the tenant (cross-tenant data leak)
- Fix multi-user heartbeat: clear notify_user_id per runner so notifications
  persist to the correct user, not the shared config target
- Move hygiene tasks into bounded JoinSet instead of unbounded tokio::spawn
- Revert send_notification to private visibility (only used within module)
- Use effective_model_name() for cost attribution in dispatcher so providers
  that ignore per-request model overrides report the actual model used
- Fix inject_model_override doc comment; add 3 unit tests
- Fix heartbeat doc comment ("routines" not "active routines")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Jobs, Cost, Last Active columns to admin Users table

Add UserSummaryStats struct and user_summary_stats() batch query to the
UserStore trait (both PostgreSQL and libSQL backends). The admin users
list endpoint now fetches per-user aggregates (job count, total LLM
spend, most recent activity) in a single query and includes them inline
in the response. The frontend Users table displays three new columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review comments and CI formatting failures

CI fixes:
- cargo fmt fixes in cli/mod.rs and db/tls.rs

Security/correctness (from Copilot + serrrfirat + pranavraja99 reviews):
- Token create: reject expires_in_days > 36500 with 400 instead of silent clamp
- Token create: return 404 when admin targets non-existent user
- User create: map duplicate email constraint violations to 409 Conflict
- User create: remove unnecessary DB roundtrip for created_by (use AdminUser directly)
- DB auth: log warn on DB lookup failures instead of silently swallowing errors
- libSQL: add FK constraints on users.created_by and api_tokens.user_id

Config fixes:
- agent.multi_tenant: resolve from AGENT_MULTI_TENANT env var instead of hardcoding false
- heartbeat.multi_tenant: fix doc comment to match actual env-var-based behavior

UI fix:
- showTokenBanner: pass correct title ("Token created!" vs "User created!")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review comments (round 2)

- Secrets handlers: normalize name to lowercase before store operations,
  validate target user_id exists (returns 404 if not found)
- libSQL: propagate cost parsing errors instead of unwrap_or_default()
  in both user_usage_stats and user_summary_stats
- users_list_handler: propagate user_summary_stats DB errors (was
  silently swallowed with unwrap_or_default)
- loadUsers: distinguish 401/403 (admin required) from other errors
- Docs: fix users.id type (TEXT not UUID), remove "invitation flow"
  from V14 migration comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: i18n for Users tab, atomic user+token creation, transactional delete_user

i18n:
- Add 31 translation keys for all Users tab strings (en + zh-CN)
- Wire data-i18n attributes on HTML elements (headings, buttons, inputs,
  table headers, empty state)
- Replace all hard-coded strings in app.js with I18n.t() calls

Atomic user+token creation:
- Add create_user_with_token() to UserStore trait
- PostgreSQL: wraps both INSERTs in conn.transaction() with auto-rollback
- libSQL: wraps in explicit BEGIN/COMMIT with ROLLBACK on error
- Handler uses single atomic call instead of two separate operations

Transactional delete_user for libSQL:
- Wrap multi-table DELETE cascade in BEGIN/COMMIT transaction
- ROLLBACK on any error to prevent partial cleanup / inconsistent state
- Matches the PostgreSQL implementation which already used transactions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert V14 migration to match deployed checksum [skip-regression-check]

Refinery checksums applied migrations — editing V14__users.sql after
it was already applied causes deployment failures. Revert the cosmetic
comment changes (added in df40b22) to restore the original checksum.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: bootstrap onboarding flow for multi-tenant users

The bootstrap greeting and workspace seeding only ran for the owner
workspace at startup, so new users created via the admin API never
received the welcome message or identity files (BOOTSTRAP.md, SOUL.md,
AGENTS.md, USER.md, etc.).

Three fixes:
- tenant_ctx(): seed per-user workspace on first creation via
  seed_if_empty(), which writes identity files and sets
  bootstrap_pending when the workspace is truly fresh
- handle_message(): check take_bootstrap_pending() on the tenant
  workspace (not the owner workspace) and persist the greeting to
  the user's own assistant conversation + broadcast via SSE
- WorkspacePool: seed new per-user workspaces in the web gateway
  so memory tools also see identity files immediately

The existing single-user bootstrap in Agent::run() is preserved for
non-multi-tenant deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining PR review comments (round 3)

- Docs: fix metadata description from "merge patch" to "full replacement"
- Secrets: reject expires_in_days > 36500 with 400 (was silently clamped)
- libSQL: CAST(SUM(cost) AS TEXT) in user_usage_stats and user_summary_stats
  to prevent SQLite numeric coercion from crashing get_text() — this was
  the root cause of the Copilot "SUM returns numeric type" comments
- Add 3 regression tests: user_summary_stats (empty + with data) and
  user_usage_stats (multi-model aggregation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add role change support for users (admin/member toggle)

- Add update_user_role() to UserStore trait + both backends (PostgreSQL
  and libSQL)
- Extend PATCH /api/admin/users/{id} to accept optional "role" field
  with validation (must be "admin" or "member")
- Add "Make Admin" / "Make Member" toggle button in Users table actions
- Add i18n keys for role change (en + zh-CN)
- Update API docs to document the role field on PATCH
- Fix test helpers to use fmt_ts() for timestamps (was using SQLite
  datetime('now') which produces incompatible format for string comparison)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: show live LLM spend in Users table instead of only DB-recorded costs [skip-regression-check]

Chat turns record LLM cost in CostGuard (in-memory) but don't create
agent_jobs/llm_calls DB rows — those are only written for background
jobs. The Users table was querying only from DB, so it showed $0.00
for users who only chatted.

Now supplements DB stats with CostGuard.daily_spend_for_user() —
the same source displayed in the status bar token counter. Shows
whichever is larger (DB historical total vs live daily spend).

Also falls back to last_login_at for "Last Active" when no DB job
activity exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: persist chat LLM calls to DB and fix usage stats query

Two root causes for zero usage stats:

1. ChatDelegate only recorded LLM costs to CostGuard (in-memory) —
   never to the llm_calls DB table. Added DB persistence via
   TenantScope.record_llm_call() after each chat LLM call, with
   job_id=NULL and conversation_id=thread_id.

2. user_summary_stats query only joined agent_jobs→llm_calls, missing
   chat calls (which have job_id=NULL). Redesigned query to start from
   llm_calls and resolve user_id via COALESCE(agent_jobs.user_id,
   conversations.user_id) — covers both job and chat LLM calls.

Both PostgreSQL and libSQL queries updated. TenantScope gets
record_llm_call() method. Tests updated for new query semantics.
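The COALESCE attribution can be mirrored in Rust with `Option::or_else`. Use the job's owner when the call has a job_id; otherwise, use the conversation's owner. The shapes here are assumptions: `LlmCall`, the two owner maps, and the f64 cost are hypothetical, and f64 is used only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical llm_calls row: chat calls have job_id = None.
struct LlmCall {
    job_id: Option<u64>,
    conversation_id: Option<u64>,
    cost: f64,
}

/// Analogue of COALESCE(agent_jobs.user_id, conversations.user_id).
fn resolve_user<'a>(
    call: &LlmCall,
    job_owner: &'a HashMap<u64, String>,
    conv_owner: &'a HashMap<u64, String>,
) -> Option<&'a str> {
    call.job_id
        .and_then(|j| job_owner.get(&j))
        .or_else(|| call.conversation_id.and_then(|c| conv_owner.get(&c)))
        .map(String::as_str)
}

/// Sums cost per resolved user, covering both job and chat calls.
fn cost_by_user(
    calls: &[LlmCall],
    jobs: &HashMap<u64, String>,
    convs: &HashMap<u64, String>,
) -> HashMap<String, f64> {
    let mut totals = HashMap::new();
    for call in calls {
        if let Some(user) = resolve_user(call, jobs, convs) {
            *totals.entry(user.to_string()).or_insert(0.0) += call.cost;
        }
    }
    totals
}
```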

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review comments — input validation, cost semantics, panic safety [skip-regression-check]

- Validate display_name: trim whitespace, reject empty strings (create + update)
- Validate metadata: must be a JSON object, return 400 if not (admin + profile)
- secrets_list_handler: verify target user_id exists before listing
- Cost display: use DB total directly (chat calls now persist to DB),
  remove confusing max(db,live) CostGuard fallback
- CatchPanicLayer: truncate panic payload to 200 chars in log to limit
  potential sensitive data exposure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Copilot round 5 — docs, secrets consistency, token name, provider field [skip-regression-check]

- Docs: users.id note updated to "typically UUID v4 strings (bootstrap
  admin may use a custom ID)"
- secrets_list_handler: return 503 when DB store is None (was falling
  through to list secrets without user validation)
- tokens_create: trim + reject empty token name (matching display_name
  pattern)
- LlmCallRecord.provider: use llm_backend ("nearai","openai") instead
  of model_name() which returns the model identifier
- user_summary_stats zero-LLM users: acceptable — handler already falls
  back to 0 cost and last_login_at for missing entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: DB auth returns 503 on outage, scheduler counts only blocking jobs

From serrrfirat review:
- DB auth: return Err(()) on database errors so middleware returns 503
  instead of silently returning Ok(None) → 401 (auth miss)
- Scheduler: add parallel_blocking_count_for() that uses
  is_parallel_blocking() (Pending/InProgress/Stuck) instead of
  is_active() for per-user concurrency — Completed/Submitted jobs
  no longer count against MAX_JOBS_PER_USER
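Here is a sketch of the blocking-state filter. It uses a simplified job-state enum containing only the states named above; the real enum may have more. `parallel_blocking_count_for` mirrors the described name, and `can_dispatch` is a hypothetical scheduler check.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobState {
    Pending,
    InProgress,
    Stuck,
    Submitted,
    Completed,
}

impl JobState {
    /// States that still occupy a per-user concurrency slot.
    fn is_parallel_blocking(self) -> bool {
        matches!(self, JobState::Pending | JobState::InProgress | JobState::Stuck)
    }
}

struct Job {
    user_id: String,
    state: JobState,
}

fn parallel_blocking_count_for(jobs: &[Job], user_id: &str) -> usize {
    jobs.iter()
        .filter(|j| j.user_id == user_id && j.state.is_parallel_blocking())
        .count()
}

/// Dispatch only while the user is below the per-user limit.
fn can_dispatch(jobs: &[Job], user_id: &str, max_jobs_per_user: usize) -> bool {
    parallel_blocking_count_for(jobs, user_id) < max_jobs_per_user
}
```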

From Copilot:
- CLAUDE.md: fix secrets route paths from {id} to {user_id}
- token_hash: use .as_slice() instead of .to_vec() to avoid
  heap allocation on every token auth/creation call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: immediate auth cache invalidation on security-critical actions (zmanian review nearai#6)

Add DbAuthenticator::invalidate_user() that evicts all cached entries
for a user. Called after:
- Suspend user (immediate lockout, was 60s delay)
- Activate user (immediate access restoration)
- Role change (admin↔member takes effect immediately)
- Token revocation (revoked token can't be reused from cache)

The DbAuthenticator is shared (via Clone, which Arc-clones the cache)
between the auth middleware and GatewayState, so handlers can evict
entries from the same cache the middleware reads.
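The shared-cache pattern can be sketched with `Arc<Mutex<HashMap>>`. Every clone of the authenticator points at the same map, so an eviction made by a handler is immediately visible to the middleware. In this minimal sketch, the token_hash-to-user_id map and every method name other than `invalidate_user` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical cache of token_hash -> user_id. Clone shares the same map.
#[derive(Clone, Default)]
struct DbAuthenticator {
    cache: Arc<Mutex<HashMap<String, String>>>,
}

impl DbAuthenticator {
    fn cache_token(&self, token_hash: &str, user_id: &str) {
        self.cache
            .lock()
            .unwrap()
            .insert(token_hash.to_string(), user_id.to_string());
    }

    fn cached_user(&self, token_hash: &str) -> Option<String> {
        self.cache.lock().unwrap().get(token_hash).cloned()
    }

    /// Evicts every cached entry belonging to `user_id`.
    fn invalidate_user(&self, user_id: &str) {
        self.cache.lock().unwrap().retain(|_, uid| uid.as_str() != user_id);
    }
}
```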

Also from zmanian's review:
- Items 1-5, 7-11 were already resolved in prior commits
- Item 12 (String→enum for status/role) is deferred as a broader refactor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: last-admin protection, usage stats for chat calls, UTF-8 safe panic truncation

Last-admin protection:
- Suspend, delete, and role-demotion of the last active admin now
  return 409 Conflict instead of succeeding and locking out the admin API
- Helper is_last_admin() checks active admin count before destructive ops
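Below is a sketch of the last-admin guard over an in-memory user list. It treats role and status as plain strings, since the String-to-enum refactor is deferred. `suspend_status` is a hypothetical wrapper that maps the guard to the 409 response.

```rust
#[derive(Debug, Clone)]
struct User {
    id: String,
    role: String,
    status: String,
}

/// True if `user_id` is the only active admin.
fn is_last_admin(users: &[User], user_id: &str) -> bool {
    let active_admins: Vec<&User> = users
        .iter()
        .filter(|u| u.role == "admin" && u.status == "active")
        .collect();
    active_admins.len() == 1 && active_admins[0].id == user_id
}

/// Status a suspend request would get: 409 Conflict for the last admin.
fn suspend_status(users: &[User], user_id: &str) -> u16 {
    if is_last_admin(users, user_id) { 409 } else { 200 }
}
```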

Usage stats:
- user_usage_stats() now includes chat LLM calls (job_id=NULL) by
  joining via conversations.user_id, matching user_summary_stats()
- Both PostgreSQL and libSQL queries updated

Panic handler:
- Use floor_char_boundary(200) instead of byte-index [..200] to
  prevent panic on multi-byte UTF-8 characters in panic messages
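Slicing with `&s[..200]` panics when byte 200 falls inside a multi-byte character. This std-only sketch walks back to the nearest boundary with `str::is_char_boundary`, which is the effect `floor_char_boundary` provides, without depending on that method being available on the toolchain in use. `truncate_utf8` is a hypothetical name.

```rust
/// Truncates to at most `max` bytes without splitting a UTF-8 character.
fn truncate_utf8(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```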

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: workspace seed race, bootstrap atomicity, email trim, secrets upsert response [skip-regression-check]

- WorkspacePool: await seed_if_empty() synchronously after inserting
  into cache (drop lock first to avoid blocking), so callers see
  identity files immediately instead of racing a background task
- Bootstrap admin: use create_user_with_token() for atomic user+token
  creation, matching the admin create endpoint
- Email: trim whitespace, treat empty as None to prevent " " being
  stored and breaking uniqueness
- Secrets PUT: report "updated" vs "created" based on prior existence
- Replace the last remaining token_hash.to_vec() with .as_slice() in
  authenticate_token

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: disable unscoped webhook endpoint in multi-tenant mode [skip-regression-check]

The original /api/webhooks/{path} endpoint looks up routines across all
users. In multi-tenant mode, anyone who knows the webhook path + secret
could trigger another user's routine. Now returns 410 Gone with a
message pointing to the scoped endpoint /api/webhooks/u/{user_id}/{path}.

Detection uses state.db_auth.is_some() — present only when DB-backed
auth is enabled (multi-tenant). Single-user deployments are unaffected.

From: standardtoaster review comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: webhook multi-tenant check, secrets error propagation, stale doc comment [skip-regression-check]

- Webhook: use workspace_pool.is_some() instead of db_auth.is_some()
  for multi-tenant detection — db_auth is set for any DB deployment,
  workspace_pool is only set when has_any_users() was true at startup
- Secrets: propagate exists() errors instead of unwrap_or(false) so
  backend outages surface as 500 rather than incorrect "created" status
- Config: fix stale workspace_read_scopes comment referencing user_id

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ilblackdragon pushed a commit that referenced this pull request Mar 30, 2026
This commit addresses all security concerns raised in PR review:

1. Revert JobContext::default() to approval_context: None
   - Previously set ApprovalContext::autonomous() which was too permissive
   - Secure default requires explicit opt-in for autonomous execution
   - Any code using JobContext::default() now correctly blocks non-Never tools

2. Fix check_approval_in_context() to match worker behavior
   - Previously returned Ok(()) when approval_context was None (insecure)
   - Now uses ApprovalContext::is_blocked_or_default() for consistency
   - Prevents privilege escalation through sub-tool execution paths

3. Remove "http" from builder's allowed tools
   - Building software doesn't require direct http tool access
   - Shell commands (cargo, npm, pip) handle dependency fetching
   - Reduces attack surface for builder tool execution

4. Update tests to reflect new secure defaults
   - Tests now verify JobContext::default() blocks non-Never tools
   - New test added for secure default behavior

Security review references:
- Issue #1: JobContext::default() behavioral change
- Issue #3: check_approval_in_context more permissive than worker check
- Issue #4: Builder allows http without justification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
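
The secure default in points 1 and 2 can be sketched as follows, assuming a three-level requirement enum and an allowlist-based context. The `check_approval_in_context` name comes from this message, but its signature here, along with the struct and enum shapes, is illustrative only.

```rust
use std::collections::HashSet;

/// Assumed three-level requirement, mirroring the names used in this PR.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum ApprovalRequirement {
    Never,
    UnlessAutoApproved,
    Always,
}

/// Minimal stand-in for an autonomous approval context with an allowlist.
struct ApprovalContext {
    allowed: HashSet<&'static str>,
}

impl ApprovalContext {
    /// Assumed rule: only `Always` tools need an explicit allowlist entry.
    fn is_blocked(&self, tool: &str, req: ApprovalRequirement) -> bool {
        req == ApprovalRequirement::Always && !self.allowed.contains(tool)
    }
}

/// With no context attached (e.g. a default JobContext), every tool except
/// `Never` ones is blocked instead of silently allowed.
fn check_approval_in_context(
    ctx: Option<&ApprovalContext>,
    tool: &str,
    req: ApprovalRequirement,
) -> Result<(), String> {
    let blocked = match ctx {
        Some(c) => c.is_blocked(tool, req),
        None => req != ApprovalRequirement::Never,
    };
    if blocked {
        Err(format!("tool '{tool}' requires approval"))
    } else {
        Ok(())
    }
}
```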
ilblackdragon pushed a commit that referenced this pull request Mar 30, 2026
…hecks

This addresses the remaining security review concern from PR #1125.

Previously, the worker used "precedence" semantics where job-level approval
context would completely bypass worker-level checks. This meant a tool's
job-level context could potentially override worker-level restrictions.

Changes:
- Worker now checks BOTH job-level AND worker-level approval contexts
- Tool is blocked if EITHER level blocks it (additive/intersection semantics)
- Maintains defense in depth: job-level cannot bypass worker-level restrictions

Tests added:
- test_additive_approval_semantics_both_levels_must_approve: verifies job-level
  blocks take effect even when worker-level allows
- test_additive_approval_worker_block_overrides_job_allow: verifies worker-level
  blocks take effect even when job-level allows
- test_additive_approval_both_levels_allow: verifies tool is allowed only when
  both levels approve

Security review reference:
- Issue #3 from @G7CNF: "document or enforce additive semantics for job + worker
  approval checks"
- Issue #2 from @zmanian: "Job-level context bypasses worker-level entirely"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
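
Additive (intersection) semantics means the final decision is a logical OR of the two "blocked" checks. This is a minimal sketch with a simplified allowlist context; the types and `is_blocked_additive` are hypothetical, not the worker's actual code.

```rust
use std::collections::HashSet;

#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq)]
enum ApprovalRequirement {
    Never,
    UnlessAutoApproved,
    Always,
}

/// Simplified approval context: tools explicitly allowed to run autonomously.
struct ApprovalContext {
    allowed: HashSet<&'static str>,
}

impl ApprovalContext {
    fn is_blocked(&self, tool: &str, req: ApprovalRequirement) -> bool {
        req == ApprovalRequirement::Always && !self.allowed.contains(tool)
    }
}

/// A tool runs only if NEITHER level blocks it, so a permissive job-level
/// context can no longer override a restrictive worker-level one.
fn is_blocked_additive(
    job: &ApprovalContext,
    worker: &ApprovalContext,
    tool: &str,
    req: ApprovalRequirement,
) -> bool {
    job.is_blocked(tool, req) || worker.is_blocked(tool, req)
}
```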
zmanian added a commit that referenced this pull request Mar 31, 2026
…ction logic

Replace the weak regression test that only validated HashMap semantics
with tests that exercise the actual combined auto-approve + state-transition
pattern from process_approval(). Extract the single-lock logic into a helper
function mirroring lines 1035-1064, and add three focused tests:

- thread disappearance triggers rollback of auto-approve
- present thread keeps auto-approve and transitions to Processing
- always=false never adds to auto-approved set

Addresses review feedback from ilblackdragon on PR #1591 (must-fix #3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
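
The three cases above can be illustrated with a helper that performs the auto-approve insert and the state transition under one lock. This is a sketch under assumed types (`Session`, `ThreadState`, `apply_approval` are hypothetical), not the actual `process_approval()` code.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum ThreadState {
    AwaitingApproval,
    Processing,
}

/// Hypothetical session state; both fields live behind a single lock.
struct Session {
    threads: HashMap<u64, ThreadState>,
    auto_approved: HashSet<String>,
}

/// Apply an approval under one lock acquisition. Returns true if the thread
/// still existed and was moved to Processing.
fn apply_approval(session: &Mutex<Session>, thread_id: u64, tool: &str, always: bool) -> bool {
    let mut guard = session.lock().unwrap();
    let s = &mut *guard; // reborrow so the two fields can be borrowed separately
    // Only `always = true` may add to the auto-approved set.
    let newly_added = always && s.auto_approved.insert(tool.to_string());
    match s.threads.get_mut(&thread_id) {
        Some(state) => {
            *state = ThreadState::Processing;
            true
        }
        None => {
            // Thread vanished: undo the auto-approve added above, but keep an
            // entry that was already present before this call.
            if newly_added {
                s.auto_approved.remove(tool);
            }
            false
        }
    }
}
```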
ilblackdragon added a commit that referenced this pull request Mar 31, 2026
…#1125)

* feat(context): add approval_context field to JobContext

Add approval_context to JobContext so tools can propagate approval
information when executing sub-tools. This enables tools like
build_software to properly check approvals for shell, write_file, etc.

- Add approval_context: Option<ApprovalContext> field to JobContext
- Add with_approval_context() builder method
- Add check_approval_in_context() helper for tools to verify permissions
- Default JobContext now includes autonomous approval context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(worker): check job-level approval context before executing tools

Move job context fetch before approval check and add job-level
approval context checking. Job-level context takes precedence over
worker-level, allowing tools like build_software to set specific
allowed sub-tools while maintaining the fallback to worker-level
approval for normal operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(scheduler): propagate approval_context to JobContext

Store approval_context from dispatch into JobContext so it's
available to tools during execution. This completes the chain:
scheduler -> job context -> tools -> sub-tools.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(builder): use approval context for sub-tool execution

Update build_software to create a JobContext with build-specific
approval permissions and check approval before executing sub-tools.
This allows the builder to work in autonomous contexts (web UI, routines)
while maintaining security by only allowing specific build-related tools.

Allowed tools: shell, read_file, write_file, list_dir, apply_patch, http

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(db): initialize approval_context as None in job restoration

When restoring jobs from database, set approval_context to None.
The context will be populated by the scheduler on next dispatch if needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add comprehensive approval context tests

Add tests for:
- JobContext default includes approval_context
- with_approval_context() builder method
- Autonomous context blocks Always-approved tools unless explicitly allowed
- autonomous_with_tools allows specific tools
- Builder tool approval context configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): address critical approval context security issues

This commit addresses all security concerns raised in PR review:

1. Revert JobContext::default() to approval_context: None
   - Previously set ApprovalContext::autonomous() which was too permissive
   - Secure default requires explicit opt-in for autonomous execution
   - Any code using JobContext::default() now correctly blocks non-Never tools

2. Fix check_approval_in_context() to match worker behavior
   - Previously returned Ok(()) when approval_context was None (insecure)
   - Now uses ApprovalContext::is_blocked_or_default() for consistency
   - Prevents privilege escalation through sub-tool execution paths

3. Remove "http" from builder's allowed tools
   - Building software doesn't require direct http tool access
   - Shell commands (cargo, npm, pip) handle dependency fetching
   - Reduces attack surface for builder tool execution

4. Update tests to reflect new secure defaults
   - Tests now verify JobContext::default() blocks non-Never tools
   - New test added for secure default behavior

Security review references:
- Issue #1: JobContext::default() behavioral change
- Issue #3: check_approval_in_context more permissive than worker check
- Issue #4: Builder allows http without justification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(worker): implement additive approval semantics for job + worker checks

This addresses the remaining security review concern from PR #1125.

Previously, the worker used "precedence" semantics where job-level approval
context would completely bypass worker-level checks. This meant a tool's
job-level context could potentially override worker-level restrictions.

Changes:
- Worker now checks BOTH job-level AND worker-level approval contexts
- Tool is blocked if EITHER level blocks it (additive/intersection semantics)
- Maintains defense in depth: job-level cannot bypass worker-level restrictions

Tests added:
- test_additive_approval_semantics_both_levels_must_approve: verifies job-level
  blocks take effect even when worker-level allows
- test_additive_approval_worker_block_overrides_job_allow: verifies worker-level
  blocks take effect even when job-level allows
- test_additive_approval_both_levels_allow: verifies tool is allowed only when
  both levels approve

Security review reference:
- Issue #3 from @G7CNF: "document or enforce additive semantics for job + worker
  approval checks"
- Issue #2 from @zmanian: "Job-level context bypasses worker-level entirely"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): address PR #1125 review feedback

- Restore requirement-aware is_blocked() semantics: Never and
  UnlessAutoApproved tools pass in an autonomous context; only Always
  tools require an explicit allowlist entry
- Use AutonomousUnavailable error (with descriptive reason) instead
  of generic AuthRequired for approval blocking in worker
- Deduplicate approval_context propagation in scheduler dispatch
  (single update_context_and_get call instead of duplicated blocks)
- Remove http from builder tool allowlist (shell handles network)
- Add TODO comments for serde(skip) losing approval_context on DB
  restore in both libsql and postgres backends
- Add tests: Never tools in additive model, builder unlisted tool
  blocking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(worker): remove duplicate approval check and use normalized params

- Remove pre-existing worker-level-only approval check (lines 561-567)
  that duplicated the new additive check, using a different error type
  and missing job-level context
- Use normalized_params (not raw params) for requires_approval() so
  parameter-dependent approval (e.g. shell destructive detection) works
  correctly with coerced values
- Remove unused autonomous_unavailable_error import
- Add comment documenting unreachable else branch in scheduler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
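
Why parameter-dependent approval should see normalized params can be shown with a small sketch. `RawCommand`, `normalize`, and the list of destructive commands are all hypothetical; the shell tool's real coercion and detection rules are not described in this message.

```rust
/// Hypothetical raw parameter shape: the model may send a command either as
/// a single string or as a list of argv tokens.
enum RawCommand {
    Text(String),
    Argv(Vec<String>),
}

/// Normalization: coerce every accepted shape into one canonical string.
fn normalize(raw: &RawCommand) -> String {
    match raw {
        RawCommand::Text(s) => s.trim().to_string(),
        RawCommand::Argv(v) => v.join(" "),
    }
}

/// Parameter-dependent approval. It only understands the canonical form, so
/// feeding it raw params would let an argv-shaped `rm` slip past detection.
fn requires_approval(normalized_command: &str) -> bool {
    let program = normalized_command.split_whitespace().next().unwrap_or("");
    matches!(program, "rm" | "dd" | "mkfs") // illustrative list only
}
```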

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ilblackdragon@gmail.com <ilblackdragon@gmail.com>
JZKK720 pushed a commit to JZKK720/ironclaw that referenced this pull request Apr 1, 2026
…nearai#1125)

henrypark133 added a commit that referenced this pull request Apr 2, 2026
…, runtime assert in Signal, remove default fallback, warn on noop pairing codes

Addresses zmanian's review:
- #1: pairing_list_handler requires AuthenticatedUser
- #2: OwnershipCache.evict_user() evicts all entries for a user on suspension
- #3: debug_assert! for multi-thread runtime in Signal block_in_place
- #9: Noop PairingStore warns when generating unredeemable codes
- #10: cli/mcp.rs default fallback replaced with <unset>
zmanian added a commit that referenced this pull request Apr 3, 2026
…ction logic

Replace the weak regression test that only validated HashMap semantics
with tests that exercise the actual combined auto-approve + state-transition
pattern from process_approval(). Extract the single-lock logic into a helper
function mirroring lines 1035-1064, and add three focused tests:

- thread disappearance triggers rollback of auto-approve
- present thread keeps auto-approve and transitions to Processing
- always=false never adds to auto-approved set

Addresses review feedback from ilblackdragon on PR #1591 (must-fix #3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
henrypark133 added a commit that referenced this pull request Apr 4, 2026
…B-backed pairing, and OwnershipCache (#1898)

* feat(ownership): add OwnerId, Identity, UserRole, can_act_on types

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): private OwnerId field, ResourceScope serde derives, fix doc comment

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* refactor(tenant): replace SystemScope::db() escape hatch with typed workspace_for_user(), fix stale variable names

- Add SystemScope::workspace_for_user() that wraps Workspace::new_with_db
- Remove SystemScope::db() which exposed the raw Arc<dyn Database>
- Update 3 callers (routine_engine.rs x2, heartbeat.rs x1) to use the new method
- Fix stale comment: "admin context" -> "system context" in SystemScope
- Rename `admin` bindings to `system` in agent_loop.rs for clarity

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(tenant): rename stale admin binding to system_store in heartbeat.rs

* refactor(tenant): TenantScope/TenantCtx carry Identity, add with_identity() constructor and bridge new()

- TenantScope: replace `user_id: String` field with `identity: Identity`; add `with_identity()` preferred constructor; keep `new(user_id, db)` as Member-role bridge; add `identity()` accessor; all internal method bodies use `identity.owner_id.as_str()` in place of `&self.user_id`
- TenantCtx: replace `user_id: String` field with `identity: Identity`; update constructor signature; add `identity()` accessor; `user_id()` delegates to `identity.owner_id.as_str()`; cost/rate methods updated accordingly
- agent_loop: split `tenant_ctx(&str)` into bridge + new `tenant_ctx_with_identity(Identity)` which holds the full body; bridge delegates to avoid duplication

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(db): add V16 tool scope, V17 channel_identities, V18 pairing_requests migrations

- PostgreSQL: V16__tool_scope.sql adds scope column to wasm_tools/dynamic_tools
- PostgreSQL: V17__channel_identities.sql creates channel identity resolution table
- PostgreSQL: V18__pairing_requests.sql creates pairing request table replacing file-based store
- libSQL SCHEMA: adds scope column to wasm_tools/dynamic_tools, channel_identities, pairing_requests tables
- libSQL INCREMENTAL_MIGRATIONS: versions 17-19 for existing databases
- IDEMPOTENT_ADD_COLUMN_MIGRATIONS: handles fresh-install/upgrade dual path for scope columns
- Runner updated to check ALL idempotent columns per version before skipping SQL
- Test: test_ownership_model_tables_created verifies all new tables/columns exist after migrations

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
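
The runner change ("check ALL idempotent columns per version before skipping SQL") can be sketched as a single predicate. `IdempotentColumn` and `can_skip_version` are hypothetical; the table and column names in the test are taken from this commit message.

```rust
use std::collections::HashSet;

/// Hypothetical description of one idempotent ADD COLUMN migration.
struct IdempotentColumn {
    table: &'static str,
    column: &'static str,
}

/// A version's SQL may be skipped only if EVERY column it would add already
/// exists. Checking just one column could skip a version on a partially
/// upgraded database and leave the others missing. An empty list never
/// justifies a skip.
fn can_skip_version(required: &[IdempotentColumn], existing: &HashSet<(&str, &str)>) -> bool {
    !required.is_empty()
        && required.iter().all(|c| existing.contains(&(c.table, c.column)))
}
```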

* fix(db): use correct RFC3339 timestamp default in libSQL, document version sequence offset

Replace datetime('now') with strftime('%Y-%m-%dT%H:%M:%fZ', 'now') in the
channel_identities and pairing_requests table definitions (both in SCHEMA and
INCREMENTAL_MIGRATIONS) to match the project-standard RFC 3339 timestamp format
with millisecond precision. Also add a comment clarifying that libSQL incremental
migration version numbers are independent of PostgreSQL V<N> migration numbers.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(ownership): bootstrap_ownership(), migrate_default_owner, V19 FK migration, replace hardcoded 'default' user IDs

- Add V19__ownership_fk.sql (programmatic-only, not in auto-migration sweep)
- Add `migrate_default_owner` to Database trait + both PgBackend and LibSqlBackend
- Add `get_or_create_user` default method to UserStore trait
- Add `bootstrap_ownership()` to app.rs, called in init_database() after connect_with_handles
- Replace hardcoded "default" owner_id in cli/config.rs, cli/mcp.rs, cli/mod.rs, orchestrator/mod.rs
- Add TODO(ownership) comments in llm/session.rs and tools/mcp/client.rs for deferred constructors

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): atomic get_or_create_user, transactional migrate_default_owner, V19 FK inline constant, fix remaining 'default' user IDs

- Delete migrations/V19__ownership_fk.sql so refinery no longer auto-applies FK constraints before bootstrap_ownership runs; add OWNERSHIP_FK_SQL constant with TODO for future programmatic application
- Remove racy SELECT+INSERT default in UserStore::get_or_create_user; both PostgreSQL (ON CONFLICT DO NOTHING) and libSQL (INSERT OR IGNORE) now use atomic upserts
- Wrap migrate_default_owner in explicit transactions on both backends for atomicity
- Make bootstrap_ownership failure fatal (propagate error instead of warn-and-continue)
- Fix mcp auth/test --user: change from default_value="default" to Option<String> resolved from configured owner_id
- Replace hardcoded "default" user IDs in channels/wasm/setup.rs with config.owner_id
- Replace "default" sentinel in OrchestratorState test helper with "<unset>" to make the test-only nature explicit

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): remove default user_id from create_job(), change sentinel strings to <unset>

- Gate ContextManager::create_job() behind #[cfg(test)]; production code must
  use create_job_for_user() with an explicit user_id to prevent DB rows with
  user_id = 'default' being silently created on the production write path.
- Change the placeholder user_id in McpClient::new(), new_with_name(), and
  new_with_config() from "default" to "<unset>" so accidental secrets/settings
  lookups surface immediately rather than silently touching the wrong DB partition.
- Same sentinel change for SessionManager::new() and new_async() in session.rs;
  these are overwritten by attach_store() at startup with the real owner_id.
- Update tests that asserted the old "default" sentinel to expect "<unset>", and
  switch test_list_jobs_tool / test_job_status_tool to create_job_for_user("default")
  to keep ownership alignment with JobContext::default().

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(db): add ChannelPairingStore sub-trait with resolve_channel_identity, upsert/approve pairing, PostgreSQL + libSQL implementations

Adds PairingRequestRecord, ChannelPairingStore trait (5 methods), and
generate_pairing_code() to src/db/mod.rs; implements for PgBackend in
postgres.rs and LibSqlBackend in libsql/pairing.rs; wires ChannelPairingStore
into the Database supertrait bound; all 6 libSQL unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(db): atomic libSQL approve_pairing with BEGIN IMMEDIATE, add case-insensitive/expired/double-approve tests

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
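
The behaviors the new tests cover (case-insensitive codes, expiry, double-approve) can be sketched against an in-memory map. `Pairings`, `PairingRequest`, and `ApproveError` are hypothetical, and storing codes upper-cased is an assumption about how case-insensitivity is achieved.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum ApproveError {
    NotFound,
    Expired,
    AlreadyApproved,
}

/// Hypothetical in-memory pairing request; the real store is a DB table.
struct PairingRequest {
    expires_at: u64,
    approved_by: Option<String>,
}

/// Codes are stored upper-cased so lookups are case-insensitive.
struct Pairings {
    by_code: HashMap<String, PairingRequest>,
}

impl Pairings {
    /// `&mut self` gives exclusive access for the whole check-then-update,
    /// the in-memory analogue of running it inside one BEGIN IMMEDIATE
    /// transaction.
    fn approve(&mut self, code: &str, owner_id: &str, now: u64) -> Result<(), ApproveError> {
        let req = self
            .by_code
            .get_mut(&code.to_ascii_uppercase())
            .ok_or(ApproveError::NotFound)?;
        if now >= req.expires_at {
            return Err(ApproveError::Expired);
        }
        if req.approved_by.is_some() {
            return Err(ApproveError::AlreadyApproved);
        }
        req.approved_by = Some(owner_id.to_string());
        Ok(())
    }
}
```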

* feat(ownership): add OwnershipCache for zero-DB-read identity resolution on warm path

Converts src/ownership.rs to src/ownership/ module directory and adds
src/ownership/cache.rs with a write-through in-process cache mapping
(channel, external_id) -> Identity. Wired as Arc<OwnershipCache> on
AppComponents for Task 8 pairing integration. All 7 cache unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* test(e2e): add ownership model E2E tests and extend pairing tests for DB-backed store

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): remove unused asyncio import, add fallback assertion in test_pairing_response_structure

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* test(tenant): unit tests for TenantScope::with_identity and AdminScope construction

Adds 5 focused unit tests verifying TenantScope::with_identity stores the
full Identity (owner_id + role), TenantScope::new creates a Member-role
identity, and AdminScope::new returns Some for Admin and None for Member.
Uses LibSqlBackend::new_memory() as the test DB stub.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): recover from RwLock poison instead of expect() in OwnershipCache

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
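
A write-through cache keyed by (channel, external_id), with RwLock poison recovery instead of `expect()`, might look like the sketch below. The field layout is an assumption (the real `Identity` also carries a role), and `evict_user` follows the later review note about evicting all entries for a suspended user.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Simplified identity value; hypothetical shape.
#[derive(Clone, Debug, PartialEq)]
struct Identity {
    owner_id: String,
}

/// In-process cache: (channel, external_id) -> Identity.
#[derive(Default)]
struct OwnershipCache {
    map: RwLock<HashMap<(String, String), Identity>>,
}

impl OwnershipCache {
    fn get(&self, channel: &str, external_id: &str) -> Option<Identity> {
        // Poisoning only means another thread panicked while holding the lock;
        // recover the guard rather than propagating the panic.
        let map = self.map.read().unwrap_or_else(|e| e.into_inner());
        map.get(&(channel.to_string(), external_id.to_string())).cloned()
    }

    fn insert(&self, channel: &str, external_id: &str, identity: Identity) {
        let mut map = self.map.write().unwrap_or_else(|e| e.into_inner());
        map.insert((channel.to_string(), external_id.to_string()), identity);
    }

    /// Drop every cached mapping that resolves to `owner_id`.
    fn evict_user(&self, owner_id: &str) {
        let mut map = self.map.write().unwrap_or_else(|e| e.into_inner());
        map.retain(|_, id| id.owner_id != owner_id);
    }
}
```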

* test(ownership): integration tests for bootstrap, tenant isolation, and ChannelPairingStore

Adds tests/ownership_integration.rs covering migrate_default_owner idempotency,
TenantScope per-user setting isolation (including Admin role bypass check),
and the full ChannelPairingStore lifecycle (upsert, approve, remove, multi-channel isolation).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove duplicate pairing tests and flaky random-code assertion from integration suite

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(pairing): rewrite PairingStore to DB-backed async with OwnershipCache

Replaces the file-based pairing store (~/.ironclaw/*-pairing.json,
*-allowFrom.json) with a DB-backed async implementation that delegates
to ChannelPairingStore and writes through to OwnershipCache on reads.

- PairingStore::new(db, cache) uses the DB; new_noop() for test/no-DB
- resolve_identity() cache-first lookup via OwnershipCache
- approve(code, owner_id) removes channel arg (DB looks up by code)
- All WASM host functions updated: pairing_upsert_request uses block_in_place,
  pairing-is-allowed renamed to pairing-resolve-identity returning Option<String>,
  pairing-read-allow-from deprecated (returns empty list)
- Signal channel receives PairingStore via new(config, db) constructor
- Web gateway pairing handlers read from state.store (DB) directly
- extensions.rs derive_activation_status drops PairingStore dependency;
  derives status from extension.active and owner_binding flag instead
- All test call sites updated to use new_noop()

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(pairing): add missing pairing_store field to all GatewayState initializers, fix disk-full post-edit compile

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(channels): remove owner_id from IncomingMessage, user_id is the canonical resolved OwnerId

`owner_id` on `IncomingMessage` was always a duplicate of `user_id` —
both fields held the same value at every call site. Remove the field and
`with_owner_id()` builder, update the four WASM-wrapper and HTTP test
assertions to use `user_id`, and drop the redundant struct literal field
in the routine_engine test helper.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(channels): remove stale owner_id param from make_message test helper

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* test(e2e): add browser/Playwright tests for ownership model — auth screen, chat UI, owner login

Adds five Playwright-based browser tests to the ownership model E2E suite
verifying the web UI experience: authenticated owner sees chat input, unauthenticated
browser sees auth screen, owner can send a message and receive a response, settings
tab renders without errors, and basic page structure is correct after login.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(settings): migrate channel credentials from plaintext settings to encrypted secrets store

Moves nearai.session_token from the plaintext DB settings table to the
AES-256-GCM encrypted secrets store (key: nearai_session_token).

- SessionManager gains an `attach_secrets()` method that wires in the
  secrets store; `save_session` writes to it when available and
  `load_session_from_secrets` is called preferentially over settings
- `migrate_session_credential()` runs idempotently on each startup in
  `init_secrets()`, reading the JSON session from settings, writing it
  to secrets, then deleting the plaintext copy
- Wizard's `persist_session_to_db` now writes to secrets first, falling
  back to plaintext settings only when secrets store is unavailable
- Plaintext settings path is preserved as fallback for installs without
  a secrets store (no master key configured)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(settings): settings fallback only when no secrets store, verify decryption before deleting plaintext

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
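
The idempotent migration order (copy, verify, then delete the plaintext) can be sketched with two in-memory maps. Encryption is not modeled; `Stores` and this `migrate_session_credential` signature are hypothetical, while the key names come from the commit message.

```rust
use std::collections::HashMap;

const SETTINGS_KEY: &str = "nearai.session_token";
const SECRET_KEY: &str = "nearai_session_token";

/// In-memory stand-ins for the plaintext settings table and the secrets store.
struct Stores {
    settings: HashMap<String, String>,
    secrets: HashMap<String, String>,
}

/// Returns true if something was migrated on this call; safe to run at every
/// startup because a second call finds no plaintext value.
fn migrate_session_credential(s: &mut Stores) -> bool {
    let Some(plain) = s.settings.get(SETTINGS_KEY).cloned() else {
        return false; // already migrated, or never set
    };
    s.secrets.insert(SECRET_KEY.to_string(), plain.clone());
    // Stand-in for "verify decryption before deleting plaintext".
    if s.secrets.get(SECRET_KEY) != Some(&plain) {
        return false; // keep the plaintext copy rather than risk losing it
    }
    s.settings.remove(SETTINGS_KEY);
    true
}
```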

* fix(ownership): ROLLBACK in libSQL migrate_default_owner, shared OwnershipCache across channels, add dynamic_tools to migration, fix doc comment

- libSQL migrate_default_owner: wrap UPDATE loop in async closure + match to emit ROLLBACK on any mid-transaction failure (mirroring approve_pairing pattern)
- Both backends: add dynamic_tools to the migrate_default_owner table list so agent-built tools are migrated on first pairing
- setup_wasm_channels: accept Arc<OwnershipCache> parameter instead of allocating a fresh cache, share the AppComponents cache
- SignalChannel::new: accept Arc<OwnershipCache> parameter and pass it to PairingStore instead of allocating a new cache
- PairingStore: fix module-level and struct-level doc comments to accurately describe lazy cache population after approve()

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
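
The "emit ROLLBACK on any mid-transaction failure" pattern is sketched below in synchronous form against a hypothetical `Conn` trait. The SQL text, including the `user_id` column name, is illustrative and not taken from either backend.

```rust
/// Hypothetical minimal connection: executes one statement with bound params.
trait Conn {
    fn execute(&mut self, sql: &str, params: &[&str]) -> Result<(), String>;
}

/// Reassign rows owned by the legacy "default" user in every listed table.
fn run_updates(conn: &mut dyn Conn, tables: &[&str], new_owner: &str) -> Result<(), String> {
    for t in tables {
        // Table names come from a fixed internal list; the owner id is bound.
        let sql = format!("UPDATE {t} SET user_id = ?1 WHERE user_id = 'default'");
        conn.execute(&sql, &[new_owner])?;
    }
    Ok(())
}

/// All-or-nothing: COMMIT if every UPDATE succeeds, otherwise issue an
/// explicit ROLLBACK before returning the original error.
fn migrate_default_owner(conn: &mut dyn Conn, tables: &[&str], new_owner: &str) -> Result<(), String> {
    conn.execute("BEGIN", &[])?;
    match run_updates(conn, tables, new_owner) {
        Ok(()) => conn.execute("COMMIT", &[]),
        Err(e) => {
            // Best effort: the caller needs the original error, not a rollback error.
            let _ = conn.execute("ROLLBACK", &[]);
            Err(e)
        }
    }
}

/// Test double that records statements and fails on a chosen table.
struct FakeConn {
    log: Vec<String>,
    fail_on: Option<&'static str>,
}

impl Conn for FakeConn {
    fn execute(&mut self, sql: &str, _params: &[&str]) -> Result<(), String> {
        self.log.push(sql.to_string());
        match self.fail_on {
            Some(t) if sql.contains(t) => Err(format!("failed on {t}")),
            _ => Ok(()),
        }
    }
}
```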

* fix(web): use can_act_on for authorization in job/routine handlers instead of raw string comparisons

Replace 12 raw `user_id != user.user_id` / `user_id == user.user_id` string comparisons
in jobs.rs and 4 in routines.rs with calls through the canonical `can_act_on` function
from `crate::ownership`, which is the spec-mandated authorization mechanism.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
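
Centralizing the check in one predicate might look like the sketch below. The admin-bypass rule is an assumption for illustration; the message only says `can_act_on` is the canonical authorization function, not what its exact rule is.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum UserRole {
    Admin,
    Member,
}

/// Simplified identity; hypothetical shape.
struct Identity {
    owner_id: String,
    role: UserRole,
}

/// Assumed rule: users may act on their own resources, admins on anyone's.
/// Handlers call this instead of comparing user_id strings directly.
fn can_act_on(actor: &Identity, resource_owner: &str) -> bool {
    actor.role == UserRole::Admin || actor.owner_id == resource_owner
}
```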

* chore: include remaining modified files in ownership model branch

* fix: add pairing_store field to test GatewayState initializers, update PairingStore API calls in integration tests

Add missing `pairing_store: None` to all GatewayState struct initializers
in test files. Migrate old file-based PairingStore API calls
(PairingStore::new(), PairingStore::with_base_dir()) to the new DB-backed
API (PairingStore::new_noop()). Rewrite pairing_integration.rs to use
LibSqlBackend with the new async DB-backed PairingStore API.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* chore: cargo fmt

* fix(pairing): truly no-op PairingStore noop mode, ensure owner user in CLI, fix signal safety comments

- PairingStore::upsert_request now returns a dummy record in noop mode instead of
  erroring, and approve silently succeeds (matching the doc promise of "writes
  are silently discarded").
- PairingStore::approve now accepts a channel parameter, matching the updated
  DB trait signature and propagated to all call sites (CLI, web server, tests).
- CLI run_pairing_command ensures the owner user row exists before approval to
  satisfy the FK constraint on channel_identities.owner_id.
- Signal channel block_in_place safety comments corrected from "WASM channel
  callbacks" to "Signal channel message processing".

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(pairing): thread channel through approve_pairing, add created flag, retry on code collision, remove redundant indexes

Addresses PR review comments:
- approve_pairing validates code belongs to the given channel
- PairingRequestRecord.created replaces timing heuristic
- upsert retries on UNIQUE violation (up to 3 attempts)
- redundant indexes removed (UNIQUE creates implicit index)
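The retry-on-collision behavior can be sketched as a bounded loop. The `InsertError` type and the closure-based shape are hypothetical; the real backends surface UNIQUE violations through their own driver errors, and `gen_code` stands in for `generate_pairing_code()`.

```rust
#[derive(Debug, PartialEq)]
pub enum InsertError {
    UniqueViolation,
    Other(String),
}

const MAX_ATTEMPTS: usize = 3;

/// Insert a freshly generated pairing code, retrying only on UNIQUE collisions.
pub fn upsert_with_retry<G, I>(mut gen_code: G, mut insert: I) -> Result<String, InsertError>
where
    G: FnMut() -> String,
    I: FnMut(&str) -> Result<(), InsertError>,
{
    let mut last_err = InsertError::Other("no attempts made".to_string());
    for _ in 0..MAX_ATTEMPTS {
        let code = gen_code();
        match insert(&code) {
            Ok(()) => return Ok(code),
            // Collided with an existing code: draw a new one and try again.
            Err(InsertError::UniqueViolation) => last_err = InsertError::UniqueViolation,
            // Any other database error is not retryable.
            Err(e) => return Err(e),
        }
    }
    Err(last_err)
}
```

Only UNIQUE violations are retried; any other database error is returned on the first attempt.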

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): migrate api_tokens, serialize PG approvals, propagate resolved owner_id

Addresses PR review P1/P2 regressions:

- api_tokens included in migrate_default_owner (both backends)
- PostgreSQL approve_pairing uses FOR UPDATE to prevent concurrent approvals
- Signal resolve_sender_identity returns owner_id, set as IncomingMessage.user_id
  with raw phone number preserved as sender_id for reply routing
- Feishu uses resolved owner_id from pairing_resolve_identity in emitted message
- PairingStore noop mode logs warning when pairing admission is impossible

[skip-regression-check]

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-review): sanitize DB errors in pairing handlers, fix doc comments, add TODO for derive_activation_status

- Pairing list/approve handlers no longer leak DB error details to clients
- NotFound errors return user-friendly 'Invalid or expired pairing code' message
- Module doc in pairing/store.rs corrected (remove -> evict, no insert method)
- wit_compat.rs stub comment corrected to match actual Val shape
- TODO added for derive_activation_status has_paired approximation

* fix(pr-review): propagate libSQL query errors in approve_pairing, round-trip validate session credential migration, fix test doc comment

- libSQL approve_pairing: .ok().flatten() replaced with .map_err() to propagate DB errors
- migrate_session_credential: round-trip compares decrypted secret against plaintext before deleting
- ownership_integration.rs: doc comment corrected to match actual test coverage

* fix(pairing): store meta, wrap upserts in transactions, case-insensitive role/channel, log Signal DB errors, use auth role in handlers

- Store meta JSONB/TEXT column in pairing_requests (PG migration V18, libSQL schema + incremental migration 19)
- Wrap upsert_pairing_request in transactions (PG: client.transaction(), libSQL: BEGIN IMMEDIATE/COMMIT/ROLLBACK)
- Case-insensitive role parsing: eq_ignore_ascii_case("admin") in both backends
- Case-insensitive channel matching in approve_pairing: LOWER(channel) = LOWER($2)
- Log DB errors in Signal resolve_sender_identity instead of silently discarding
- Use auth role from UserIdentity in web handlers (jobs.rs, routines.rs) via identity_from_auth helper
- Fix variable shadowing: rename `let channel` to `let req_channel` in libsql approve_pairing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): add auth to pairing list, cache eviction on deactivate, runtime assert in Signal, remove default fallback, warn on noop pairing codes

Addresses zmanian's review:
- #1: pairing_list_handler requires AuthenticatedUser
- #2: OwnershipCache.evict_user() evicts all entries for a user on suspension
- #3: debug_assert! for multi-thread runtime in Signal block_in_place
- #9: Noop PairingStore warns when generating unredeemable codes
- #10: cli/mcp.rs default fallback replaced with <unset>

* fix(pairing): consistent LOWER() channel matching in resolve_channel_identity, fix wizard doc comment, fix E2E test assertion for ActionResponse convention

* fix(pairing): apply LOWER() consistently across all ChannelPairingStore queries (upsert, list_pending, remove)

All channel matching now uses LOWER() in both PostgreSQL and libSQL backends:
- upsert_pairing_request: WHERE LOWER(channel) = LOWER($1)
- list_pending_pairings: WHERE LOWER(channel) = LOWER($1)
- remove_channel_identity: WHERE LOWER(channel) = LOWER($1)

Previously only resolve_channel_identity and approve_pairing used LOWER(),
causing inconsistent matching when channel names differed by case.

* fix(pairing): unify code challenge flow and harden web pairing

* test: harden pairing review follow-ups

* fix: guard wasm pairing callbacks by runtime flavor

* fix(pairing): normalize channel keys and serialize pg upserts

* chore(web): clean up ownership review follow-ups

* Preserve WASM pairing allowlist compatibility

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
serrrfirat pushed a commit that referenced this pull request Apr 5, 2026
…#1125)

* feat(context): add approval_context field to JobContext

Add approval_context to JobContext so tools can propagate approval
information when executing sub-tools. This enables tools like
build_software to properly check approvals for shell, write_file, etc.

- Add approval_context: Option<ApprovalContext> field to JobContext
- Add with_approval_context() builder method
- Add check_approval_in_context() helper for tools to verify permissions
- Default JobContext now includes autonomous approval context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(worker): check job-level approval context before executing tools

Move job context fetch before approval check and add job-level
approval context checking. Job-level context takes precedence over
worker-level, allowing tools like build_software to set specific
allowed sub-tools while maintaining the fallback to worker-level
approval for normal operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(scheduler): propagate approval_context to JobContext

Store approval_context from dispatch into JobContext so it's
available to tools during execution. This completes the chain:
scheduler -> job context -> tools -> sub-tools.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(builder): use approval context for sub-tool execution

Update build_software to create a JobContext with build-specific
approval permissions and check approval before executing sub-tools.
This allows the builder to work in autonomous contexts (web UI, routines)
while maintaining security by only allowing specific build-related tools.

Allowed tools: shell, read_file, write_file, list_dir, apply_patch, http

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(db): initialize approval_context as None in job restoration

When restoring jobs from database, set approval_context to None.
The context will be populated by the scheduler on next dispatch if needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add comprehensive approval context tests

Add tests for:
- JobContext default includes approval_context
- with_approval_context() builder method
- Autonomous context blocks Always-approved tools unless explicitly allowed
- autonomous_with_tools allows specific tools
- Builder tool approval context configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): address critical approval context security issues

This commit addresses all security concerns raised in PR review:

1. Revert JobContext::default() to approval_context: None
   - Previously set ApprovalContext::autonomous() which was too permissive
   - Secure default requires explicit opt-in for autonomous execution
   - Any code using JobContext::default() now correctly blocks non-Never tools

2. Fix check_approval_in_context() to match worker behavior
   - Previously returned Ok(()) when approval_context was None (insecure)
   - Now uses ApprovalContext::is_blocked_or_default() for consistency
   - Prevents privilege escalation through sub-tool execution paths

3. Remove "http" from builder's allowed tools
   - Building software doesn't require direct http tool access
   - Shell commands (cargo, npm, pip) handle dependency fetching
   - Reduces attack surface for builder tool execution

4. Update tests to reflect new secure defaults
   - Tests now verify JobContext::default() blocks non-Never tools
   - New test added for secure default behavior

Security review references:
- Issue #1: JobContext::default() behavioral change
- Issue #3: check_approval_in_context more permissive than worker check
- Issue #4: Builder allows http without justification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(worker): implement additive approval semantics for job + worker checks

This addresses the remaining security review concern from PR #1125.

Previously, the worker used "precedence" semantics where job-level approval
context would completely bypass worker-level checks. This meant a tool's
job-level context could potentially override worker-level restrictions.

Changes:
- Worker now checks BOTH job-level AND worker-level approval contexts
- Tool is blocked if EITHER level blocks it (additive/intersection semantics)
- Maintains defense in depth: job-level cannot bypass worker-level restrictions
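Stripped to its core, the additive rule is: a tool runs only if no level blocks it. A sketch under simplifying assumptions: each level is modeled as a hypothetical `Allowlist`, and a job without its own context defers to the worker level. The real `ApprovalContext` also weighs each tool's approval requirement.

```rust
use std::collections::HashSet;

/// Hypothetical approval level: tools allowed to run without a human.
pub struct Allowlist(pub HashSet<&'static str>);

impl Allowlist {
    pub fn new(tools: &[&'static str]) -> Self {
        Allowlist(tools.iter().copied().collect())
    }

    pub fn blocks(&self, tool: &str) -> bool {
        !self.0.contains(tool)
    }
}

/// Additive (intersection) semantics: blocked if EITHER level blocks.
/// A job with no context of its own adds no restriction beyond the worker's.
pub fn is_blocked(job: Option<&Allowlist>, worker: &Allowlist, tool: &str) -> bool {
    let job_blocks = job.map_or(false, |j| j.blocks(tool));
    job_blocks || worker.blocks(tool)
}
```

Under the earlier precedence rule, a job-level entry for `write_file` would have been enough to run it; with the intersection, the worker-level list must allow it too.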

Tests added:
- test_additive_approval_semantics_both_levels_must_approve: verifies job-level
  blocks take effect even when worker-level allows
- test_additive_approval_worker_block_overrides_job_allow: verifies worker-level
  blocks take effect even when job-level allows
- test_additive_approval_both_levels_allow: verifies tool is allowed only when
  both levels approve

Security review reference:
- Issue #3 from @G7CNF: "document or enforce additive semantics for job + worker
  approval checks"
- Issue #2 from @zmanian: "Job-level context bypasses worker-level entirely"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): address PR #1125 review feedback

- Restore requirement-aware is_blocked() semantics: Never and
  UnlessAutoApproved tools pass in autonomous context; only Always
  tools require an explicit allowlist entry
- Use AutonomousUnavailable error (with descriptive reason) instead
  of generic AuthRequired for approval blocking in worker
- Deduplicate approval_context propagation in scheduler dispatch
  (single update_context_and_get call instead of duplicated blocks)
- Remove http from builder tool allowlist (shell handles network)
- Add TODO comments for serde(skip) losing approval_context on DB
  restore in both libsql and postgres backends
- Add tests: Never tools in additive model, builder unlisted tool
  blocking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(worker): remove duplicate approval check and use normalized params

- Remove pre-existing worker-level-only approval check (lines 561-567)
  that duplicated the new additive check, using a different error type
  and missing job-level context
- Use normalized_params (not raw params) for requires_approval() so
  parameter-dependent approval (e.g. shell destructive detection) works
  correctly with coerced values
- Remove unused autonomous_unavailable_error import
- Add comment documenting unreachable else branch in scheduler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ilblackdragon@gmail.com <ilblackdragon@gmail.com>
serrrfirat pushed a commit that referenced this pull request Apr 5, 2026
…B-backed pairing, and OwnershipCache (#1898)

* feat(ownership): add OwnerId, Identity, UserRole, can_act_on types

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): private OwnerId field, ResourceScope serde derives, fix doc comment

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* refactor(tenant): replace SystemScope::db() escape hatch with typed workspace_for_user(), fix stale variable names

- Add SystemScope::workspace_for_user() that wraps Workspace::new_with_db
- Remove SystemScope::db() which exposed the raw Arc<dyn Database>
- Update 3 callers (routine_engine.rs x2, heartbeat.rs x1) to use the new method
- Fix stale comment: "admin context" -> "system context" in SystemScope
- Rename `admin` bindings to `system` in agent_loop.rs for clarity

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(tenant): rename stale admin binding to system_store in heartbeat.rs

* refactor(tenant): TenantScope/TenantCtx carry Identity, add with_identity() constructor and bridge new()

- TenantScope: replace `user_id: String` field with `identity: Identity`; add `with_identity()` preferred constructor; keep `new(user_id, db)` as Member-role bridge; add `identity()` accessor; all internal method bodies use `identity.owner_id.as_str()` in place of `&self.user_id`
- TenantCtx: replace `user_id: String` field with `identity: Identity`; update constructor signature; add `identity()` accessor; `user_id()` delegates to `identity.owner_id.as_str()`; cost/rate methods updated accordingly
- agent_loop: split `tenant_ctx(&str)` into bridge + new `tenant_ctx_with_identity(Identity)` which holds the full body; bridge delegates to avoid duplication

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(db): add V16 tool scope, V17 channel_identities, V18 pairing_requests migrations

- PostgreSQL: V16__tool_scope.sql adds scope column to wasm_tools/dynamic_tools
- PostgreSQL: V17__channel_identities.sql creates channel identity resolution table
- PostgreSQL: V18__pairing_requests.sql creates pairing request table replacing file-based store
- libSQL SCHEMA: adds scope column to wasm_tools/dynamic_tools, channel_identities, pairing_requests tables
- libSQL INCREMENTAL_MIGRATIONS: versions 17-19 for existing databases
- IDEMPOTENT_ADD_COLUMN_MIGRATIONS: handles fresh-install/upgrade dual path for scope columns
- Runner updated to check ALL idempotent columns per version before skipping SQL
- Test: test_ownership_model_tables_created verifies all new tables/columns exist after migrations

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(db): use correct RFC3339 timestamp default in libSQL, document version sequence offset

Replace datetime('now') with strftime('%Y-%m-%dT%H:%M:%fZ', 'now') in the
channel_identities and pairing_requests table definitions (both in SCHEMA and
INCREMENTAL_MIGRATIONS) to match the project-standard RFC 3339 timestamp format
with millisecond precision. Also add a comment clarifying that libSQL incremental
migration version numbers are independent from PostgreSQL VN migration numbers.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(ownership): bootstrap_ownership(), migrate_default_owner, V19 FK migration, replace hardcoded 'default' user IDs

- Add V19__ownership_fk.sql (programmatic-only, not in auto-migration sweep)
- Add `migrate_default_owner` to Database trait + both PgBackend and LibSqlBackend
- Add `get_or_create_user` default method to UserStore trait
- Add `bootstrap_ownership()` to app.rs, called in init_database() after connect_with_handles
- Replace hardcoded "default" owner_id in cli/config.rs, cli/mcp.rs, cli/mod.rs, orchestrator/mod.rs
- Add TODO(ownership) comments in llm/session.rs and tools/mcp/client.rs for deferred constructors

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): atomic get_or_create_user, transactional migrate_default_owner, V19 FK inline constant, fix remaining 'default' user IDs

- Delete migrations/V19__ownership_fk.sql so refinery no longer auto-applies FK constraints before bootstrap_ownership runs; add OWNERSHIP_FK_SQL constant with TODO for future programmatic application
- Remove racy SELECT+INSERT default in UserStore::get_or_create_user; both PostgreSQL (ON CONFLICT DO NOTHING) and libSQL (INSERT OR IGNORE) now use atomic upserts
- Wrap migrate_default_owner in explicit transactions on both backends for atomicity
- Make bootstrap_ownership failure fatal (propagate error instead of warn-and-continue)
- Fix mcp auth/test --user: change from default_value="default" to Option<String> resolved from configured owner_id
- Replace hardcoded "default" user IDs in channels/wasm/setup.rs with config.owner_id
- Replace "default" sentinel in OrchestratorState test helper with "<unset>" to make the test-only nature explicit

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): remove default user_id from create_job(), change sentinel strings to <unset>

- Gate ContextManager::create_job() behind #[cfg(test)]; production code must
  use create_job_for_user() with an explicit user_id to prevent DB rows with
  user_id = 'default' being silently created on the production write path.
- Change the placeholder user_id in McpClient::new(), new_with_name(), and
  new_with_config() from "default" to "<unset>" so accidental secrets/settings
  lookups surface immediately rather than silently touching the wrong DB partition.
- Same sentinel change for SessionManager::new() and new_async() in session.rs;
  these are overwritten by attach_store() at startup with the real owner_id.
- Update tests that asserted the old "default" sentinel to expect "<unset>", and
  switch test_list_jobs_tool / test_job_status_tool to create_job_for_user("default")
  to keep ownership alignment with JobContext::default().

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(db): add ChannelPairingStore sub-trait with resolve_channel_identity, upsert/approve pairing, PostgreSQL + libSQL implementations

Adds PairingRequestRecord, ChannelPairingStore trait (5 methods), and
generate_pairing_code() to src/db/mod.rs; implements for PgBackend in
postgres.rs and LibSqlBackend in libsql/pairing.rs; wires ChannelPairingStore
into the Database supertrait bound; all 6 libSQL unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(db): atomic libSQL approve_pairing with BEGIN IMMEDIATE, add case-insensitive/expired/double-approve tests

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(ownership): add OwnershipCache for zero-DB-read identity resolution on warm path

Converts src/ownership.rs to src/ownership/ module directory and adds
src/ownership/cache.rs with a write-through in-process cache mapping
(channel, external_id) -> Identity. Wired as Arc<OwnershipCache> on
AppComponents for Task 8 pairing integration. All 7 cache unit tests pass.
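A minimal sketch of such a cache, assuming a `std::sync::RwLock<HashMap>` keyed by `(channel, external_id)`. It also illustrates two behaviors from later commits: recovering from a poisoned lock instead of calling `expect()`, and `evict_user()` dropping every entry for one owner. Lower-casing the channel key is an assumption based on the later "normalize channel keys" commit; the value type and method shapes are illustrative, not the project's API.

```rust
use std::collections::HashMap;
use std::sync::{PoisonError, RwLock};

/// Simplified value type (hypothetical; the real Identity also carries a role).
pub type OwnerId = String;

#[derive(Default)]
pub struct OwnershipCache {
    // (normalized channel, external_id) -> owner
    entries: RwLock<HashMap<(String, String), OwnerId>>,
}

impl OwnershipCache {
    fn key(channel: &str, external_id: &str) -> (String, String) {
        (channel.to_ascii_lowercase(), external_id.to_string())
    }

    pub fn get(&self, channel: &str, external_id: &str) -> Option<OwnerId> {
        // A panic elsewhere poisons the lock, but the map is still usable.
        let map = self.entries.read().unwrap_or_else(PoisonError::into_inner);
        map.get(&Self::key(channel, external_id)).cloned()
    }

    pub fn insert(&self, channel: &str, external_id: &str, owner: OwnerId) {
        let mut map = self.entries.write().unwrap_or_else(PoisonError::into_inner);
        map.insert(Self::key(channel, external_id), owner);
    }

    /// Remove every (channel, external_id) entry that maps to `owner`.
    pub fn evict_user(&self, owner: &str) {
        let mut map = self.entries.write().unwrap_or_else(PoisonError::into_inner);
        map.retain(|_, v| v.as_str() != owner);
    }
}
```

On the warm path a hit costs one read lock and no database round trip; a miss falls through to the DB, and the result is written back with `insert`.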

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* test(e2e): add ownership model E2E tests and extend pairing tests for DB-backed store

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): remove unused asyncio import, add fallback assertion in test_pairing_response_structure

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* test(tenant): unit tests for TenantScope::with_identity and AdminScope construction

Adds 5 focused unit tests verifying TenantScope::with_identity stores the
full Identity (owner_id + role), TenantScope::new creates a Member-role
identity, and AdminScope::new returns Some for Admin and None for Member.
Uses LibSqlBackend::new_memory() as the test DB stub.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): recover from RwLock poison instead of expect() in OwnershipCache

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* test(ownership): integration tests for bootstrap, tenant isolation, and ChannelPairingStore

Adds tests/ownership_integration.rs covering migrate_default_owner idempotency,
TenantScope per-user setting isolation (including Admin role bypass check),
and the full ChannelPairingStore lifecycle (upsert, approve, remove, multi-channel isolation).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove duplicate pairing tests and flaky random-code assertion from integration suite

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(pairing): rewrite PairingStore to DB-backed async with OwnershipCache

Replaces the file-based pairing store (~/.ironclaw/*-pairing.json,
*-allowFrom.json) with a DB-backed async implementation that delegates
to ChannelPairingStore and writes through to OwnershipCache on reads.

- PairingStore::new(db, cache) uses the DB; new_noop() for test/no-DB
- resolve_identity() cache-first lookup via OwnershipCache
- approve(code, owner_id) removes channel arg (DB looks up by code)
- All WASM host functions updated: pairing_upsert_request uses block_in_place,
  pairing-is-allowed renamed to pairing-resolve-identity returning Option<String>,
  pairing-read-allow-from deprecated (returns empty list)
- Signal channel receives PairingStore via new(config, db) constructor
- Web gateway pairing handlers read from state.store (DB) directly
- extensions.rs derive_activation_status drops PairingStore dependency;
  derives status from extension.active and owner_binding flag instead
- All test call sites updated to use new_noop()

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(pairing): add missing pairing_store field to all GatewayState initializers, fix disk-full post-edit compile

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(channels): remove owner_id from IncomingMessage, user_id is the canonical resolved OwnerId

`owner_id` on `IncomingMessage` was always a duplicate of `user_id` —
both fields held the same value at every call site. Remove the field and
`with_owner_id()` builder, update the four WASM-wrapper and HTTP test
assertions to use `user_id`, and drop the redundant struct literal field
in the routine_engine test helper.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(channels): remove stale owner_id param from make_message test helper

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* test(e2e): add browser/Playwright tests for ownership model — auth screen, chat UI, owner login

Adds five Playwright-based browser tests to the ownership model E2E suite
verifying the web UI experience: authenticated owner sees chat input, unauthenticated
browser sees auth screen, owner can send a message and receive a response, settings
tab renders without errors, and basic page structure is correct after login.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(settings): migrate channel credentials from plaintext settings to encrypted secrets store

Moves nearai.session_token from the plaintext DB settings table to the
AES-256-GCM encrypted secrets store (key: nearai_session_token).

- SessionManager gains an `attach_secrets()` method that wires in the
  secrets store; `save_session` writes to it when available and
  `load_session_from_secrets` is called preferentially over settings
- `migrate_session_credential()` runs idempotently on each startup in
  `init_secrets()`, reading the JSON session from settings, writing it
  to secrets, then deleting the plaintext copy
- Wizard's `persist_session_to_db` now writes to secrets first, falling
  back to plaintext settings only when secrets store is unavailable
- Plaintext settings path is preserved as fallback for installs without
  a secrets store (no master key configured)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(settings): settings fallback only when no secrets store, verify decryption before deleting plaintext

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): ROLLBACK in libSQL migrate_default_owner, shared OwnershipCache across channels, add dynamic_tools to migration, fix doc comment

- libSQL migrate_default_owner: wrap UPDATE loop in async closure + match to emit ROLLBACK on any mid-transaction failure (mirroring approve_pairing pattern)
- Both backends: add dynamic_tools to the migrate_default_owner table list so agent-built tools are migrated on first pairing
- setup_wasm_channels: accept Arc<OwnershipCache> parameter instead of allocating a fresh cache, share the AppComponents cache
- SignalChannel::new: accept Arc<OwnershipCache> parameter and pass it to PairingStore instead of allocating a new cache
- PairingStore: fix module-level and struct-level doc comments to accurately describe lazy cache population after approve()

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(web): use can_act_on for authorization in job/routine handlers instead of raw string comparisons

Replace 12 raw `user_id != user.user_id` / `user_id == user.user_id` string comparisons
in jobs.rs and 4 in routines.rs with calls through the canonical `can_act_on` function
from `crate::ownership`, which is the spec-mandated authorization mechanism.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* chore: include remaining modified files in ownership model branch

* fix: add pairing_store field to test GatewayState initializers, update PairingStore API calls in integration tests

Add missing `pairing_store: None` to all GatewayState struct initializers
in test files. Migrate old file-based PairingStore API calls
(PairingStore::new(), PairingStore::with_base_dir()) to the new DB-backed
API (PairingStore::new_noop()). Rewrite pairing_integration.rs to use
LibSqlBackend with the new async DB-backed PairingStore API.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* chore: cargo fmt

* fix(pairing): truly no-op PairingStore noop mode, ensure owner user in CLI, fix signal safety comments

- PairingStore::upsert_request now returns a dummy record in noop mode instead of
  erroring, and approve silently succeeds (matching the doc promise of "writes
  are silently discarded").
- PairingStore::approve now accepts a channel parameter, matching the updated
  DB trait signature and propagated to all call sites (CLI, web server, tests).
- CLI run_pairing_command ensures the owner user row exists before approval to
  satisfy the FK constraint on channel_identities.owner_id.
- Signal channel block_in_place safety comments corrected from "WASM channel
  callbacks" to "Signal channel message processing".

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(pairing): thread channel through approve_pairing, add created flag, retry on code collision, remove redundant indexes

Addresses PR review comments:
- approve_pairing validates code belongs to the given channel
- PairingRequestRecord.created replaces timing heuristic
- upsert retries on UNIQUE violation (up to 3 attempts)
- redundant indexes removed (UNIQUE creates implicit index)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): migrate api_tokens, serialize PG approvals, propagate resolved owner_id

Addresses PR review P1/P2 regressions:

- api_tokens included in migrate_default_owner (both backends)
- PostgreSQL approve_pairing uses FOR UPDATE to prevent concurrent approvals
- Signal resolve_sender_identity returns owner_id, set as IncomingMessage.user_id
  with raw phone number preserved as sender_id for reply routing
- Feishu uses resolved owner_id from pairing_resolve_identity in emitted message
- PairingStore noop mode logs warning when pairing admission is impossible

[skip-regression-check]

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-review): sanitize DB errors in pairing handlers, fix doc comments, add TODO for derive_activation_status

- Pairing list/approve handlers no longer leak DB error details to clients
- NotFound errors return user-friendly 'Invalid or expired pairing code' message
- Module doc in pairing/store.rs corrected (remove -> evict, no insert method)
- wit_compat.rs stub comment corrected to match actual Val shape
- TODO added for derive_activation_status has_paired approximation

* fix(pr-review): propagate libSQL query errors in approve_pairing, round-trip validate session credential migration, fix test doc comment

- libSQL approve_pairing: .ok().flatten() replaced with .map_err() to propagate DB errors
- migrate_session_credential: round-trip compares decrypted secret against plaintext before deleting
- ownership_integration.rs: doc comment corrected to match actual test coverage

* fix(pairing): store meta, wrap upserts in transactions, case-insensitive role/channel, log Signal DB errors, use auth role in handlers

- Store meta JSONB/TEXT column in pairing_requests (PG migration V18, libSQL schema + incremental migration 19)
- Wrap upsert_pairing_request in transactions (PG: client.transaction(), libSQL: BEGIN IMMEDIATE/COMMIT/ROLLBACK)
- Case-insensitive role parsing: eq_ignore_ascii_case("admin") in both backends
- Case-insensitive channel matching in approve_pairing: LOWER(channel) = LOWER($2)
- Log DB errors in Signal resolve_sender_identity instead of silently discarding
- Use auth role from UserIdentity in web handlers (jobs.rs, routines.rs) via identity_from_auth helper
- Fix variable shadowing: rename `let channel` to `let req_channel` in libsql approve_pairing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): add auth to pairing list, cache eviction on deactivate, runtime assert in Signal, remove default fallback, warn on noop pairing codes

Addresses zmanian's review:
- #1: pairing_list_handler requires AuthenticatedUser
- #2: OwnershipCache.evict_user() evicts all entries for a user on suspension
- #3: debug_assert! for multi-thread runtime in Signal block_in_place
- #9: Noop PairingStore warns when generating unredeemable codes
- #10: cli/mcp.rs default fallback replaced with <unset>

* fix(pairing): consistent LOWER() channel matching in resolve_channel_identity, fix wizard doc comment, fix E2E test assertion for ActionResponse convention

* fix(pairing): apply LOWER() consistently across all ChannelPairingStore queries (upsert, list_pending, remove)

All channel matching now uses LOWER() in both PostgreSQL and libSQL backends:
- upsert_pairing_request: WHERE LOWER(channel) = LOWER($1)
- list_pending_pairings: WHERE LOWER(channel) = LOWER($1)
- remove_channel_identity: WHERE LOWER(channel) = LOWER($1)

Previously only resolve_channel_identity and approve_pairing used LOWER(),
causing inconsistent matching when channel names differed by case.

* fix(pairing): unify code challenge flow and harden web pairing

* test: harden pairing review follow-ups

* fix: guard wasm pairing callbacks by runtime flavor

* fix(pairing): normalize channel keys and serialize pg upserts

* chore(web): clean up ownership review follow-ups

* Preserve WASM pairing allowlist compatibility

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
serrrfirat added a commit that referenced this pull request Apr 6, 2026
- Replace fragile time.time()-1 fallback with explicit SmokeError in
  run_smoke.py attachment case (reviewer finding #1)
- Add OnceLock<Mutex> guard around env var mutation in wrapper.rs unit
  test to prevent parallel test races (reviewer finding #2)
- Extract duplicated git-worktree discovery into find_project_file()
  helper in slack_auth_integration.rs (reviewer finding #3)
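The env-var guard in the second bullet follows a common Rust test pattern: a process-wide `Mutex` created lazily through `std::sync::OnceLock` and held for the whole duration of any test that mutates the environment. A sketch; the helper names and the variable used below are hypothetical, not taken from wrapper.rs.

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

/// Process-wide lock serializing tests that touch environment variables.
fn env_lock() -> MutexGuard<'static, ()> {
    static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
    LOCK.get_or_init(|| Mutex::new(()))
        .lock()
        // A test that panicked while holding the lock must not wedge the others.
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Run `f` with `key` set to `value`, then restore the previous state.
/// (If `f` panics the variable is not restored; a drop guard would fix that.)
pub fn with_env_var<T>(key: &str, value: &str, f: impl FnOnce() -> T) -> T {
    let _guard = env_lock();
    let previous = std::env::var(key).ok();
    // set_var/remove_var are `unsafe` in the 2024 edition because other threads
    // may read the environment concurrently; the lock only serializes callers
    // of this helper.
    unsafe { std::env::set_var(key, value) };
    let result = f();
    match previous {
        Some(v) => unsafe { std::env::set_var(key, v) },
        None => unsafe { std::env::remove_var(key) },
    }
    result
}
```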

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>