arch(reborn): consolidate composition root, narrow public surface, ship live binary#3695
Conversation
The Reborn composition root needs to build an LlmProvider from a RegistryProviderConfig without dragging in v1 SessionManager / OAuth plumbing (which only the NEAR AI provider chain needs). create_llm_provider() routes via SessionManager; for registry-based providers (OpenAI, Anthropic, Ollama, DeepSeek, Gemini, OpenRouter, GitHub Copilot) it ultimately calls create_registry_provider() with no SessionManager use. Exposing that internal entry point lets ironclaw_reborn_composition::build_reborn_runtime construct an LlmProvider directly from a composition-owned RebornLlmConfig. No behavior change for existing callers.
The composition crate's existing docstring promised "Product/AppBuilder
integration belongs in later slices." This is that slice.
Adds:
- RebornRuntimeInput { services, llm, runner }
- RebornRuntime task-level handle (new_conversation,
send_user_message, shutdown) -- callers never see TurnCoordinator,
SessionThreadService, HostManagedModelGateway, or any other
substrate trait
- build_reborn_runtime(input) -- substrate + driver registry +
LLM model gateway bridge (behind root-llm-provider feature) +
text-only loop driver + ThreadCheckpointLoopExitEvidencePort +
TurnRunnerWorker, all wired and spawned as one unit
- RebornLlmConfig composition-owned LLM config DTO with
openai_compat() convenience constructor
Feature flag:
ironclaw_reborn_composition gains a root-llm-provider feature that
pulls ironclaw_llm (optional, default-features = false) and turns on
ironclaw_reborn/root-llm-provider. Mirrors the discipline already
pinned by reborn_loop_support_llm_wiring_stays_out_of_root_src for
ironclaw_reborn.
Profile coverage:
Only RebornCompositionProfile::LocalDev is wired end-to-end here.
Production and migration-dry-run still return their substrate-only
RebornServices; calling build_reborn_runtime with them yields a
clear "not yet wired" error rather than partially starting an agent.
Without an LLM provided, a stub gateway is wired so the worker boots
cleanly and the failure surfaces at first send_user_message
(status=RecoveryRequired) rather than at runtime construction. That
keeps `ironclaw-reborn run --dry-run` and substrate-only smoke paths
green without API credentials.
… modules
ironclaw_reborn previously exposed ~25 types as a wall of `pub use`
re-exports at the crate root: capability resolvers and surface profile
filters from ironclaw_loop_support, milestone scope/sink, model route
policies, planned-driver factory helpers, the loop-driver-host factory,
text-only driver, model gateway adapters, and more.
Trace audit: zero external workspace crates ever named any of those
items. The wall was pure speculative public API (Illia's concern on
PR review: "this leaves opportunity to use internals"). Composition
imports through submodule paths (ironclaw_reborn::driver_registry::*,
::loop_driver_host::*, ::model_gateway::*) and doesn't need the
flattened surface at all.
This commit:
- lib.rs becomes a directory of `pub mod` declarations and a
docstring. No more `pub use` flattening.
- app_loop_family, milestone_events, model_routes, planned_driver,
planned_driver_factory promoted to `pub mod` so existing internal
integration tests in crates/ironclaw_reborn/tests/ can still reach
them via paths instead of flattened re-exports. The submodules
were previously private and lived only behind the noisy re-export
wall.
- Internal tests updated to use module-path imports.
- One `use crate::build_loop_family_registry` (in src/) updated to
use the new module path.
- Drive-by fix: removed a pre-existing duplicate
`impl LoopCancellationPort for ResumePayloadHost` in
planned_driver.rs tests that broke `cargo test --no-run -p
ironclaw_reborn` on baseline.
Behavior change for downstream consumers: zero (the boundary tests
already forbid any non-composition crate from importing
ironclaw_reborn, so nothing outside the crate could have observed
the old surface).
The next commit (CLI rewire) drops the CLI's direct
`ironclaw_reborn` dep entirely; the commit after that pins the
narrowed surface in boundary tests.
The `ironclaw-reborn run` command previously initialized a side-effect-
free "runtime shell" snapshot and exited. With the composition root
now exposing a full RebornRuntime, `run` becomes a real agent entry
point:
- tokio multi-thread runtime spun up in fn run()
- reads LLM provider config from env (OPENAI_API_KEY,
ANTHROPIC_API_KEY, OLLAMA_BASE_URL) into RebornLlmConfig
- calls build_reborn_runtime() to spawn the worker
- `run -m "hello"` -- single-shot mode: submit, wait, print
assistant reply, shutdown
- `run` without -m -- stdin REPL until EOF or Ctrl-C
- `run --dry-run` preserves the legacy diagnostic-only banner so
existing smoke tests (and any tooling that scrapes the snapshot)
keep working without modification beyond appending the flag
CLI dependency graph -- only Reborn-prefixed crates now allowed:
- ironclaw_reborn_composition (new; replaces ironclaw_reborn)
- ironclaw_reborn_config (unchanged)
- secrecy, tokio, tracing-subscriber (added)
Dropped: direct `ironclaw_reborn` import. The two CLI files that
reached `ironclaw_reborn::driver_registry::DriverRegistry::new()` and
`ironclaw_reborn::ModelSlot` were rewritten: doctor.rs no longer
constructs a DriverRegistry (it was a no-op diagnostic anyway), and
models.rs ships a CLI-local mirror of the ModelSlot label taxonomy so
the diagnostic `models list` / `models status` commands keep their
existing output without crossing the narrowed boundary.
Behavior:
IRONCLAW_REBORN_HOME=\ ironclaw-reborn run -m hello # ~2s round
trip without LLM env vars surfaces:
"(no assistant text; status=RecoveryRequired, run_id=...)"
which is the documented stub-gateway path. With a real
OPENAI_API_KEY exported, the same command would print the assistant
reply.
Smoke tests:
Two existing diagnostic-shape tests now invoke `run --dry-run` to
exercise the same snapshot they previously asserted on. All 35 CLI
smoke tests pass.
Locks the new dependency shape in place so a future contributor can't
accidentally re-open the speculative public API surface or shortcut
through ironclaw_reborn directly from the CLI.
Changes:
reborn_loop_support_llm_wiring_stays_out_of_root_src:
Extended to also assert that ironclaw_reborn_composition gates
ironclaw_llm behind the same root-llm-provider feature with
`optional = true, default-features = false`. The composition root
is the new sanctioned LLM-gateway construction site, so it
inherits the same discipline ironclaw_reborn already had to
follow.
reborn_internal_crate_keeps_directory_of_modules_lib_rs (new):
Forbids re-introducing the wall of `pub use` re-exports in
crates/ironclaw_reborn/src/lib.rs (covers all 9 internal modules
whose items previously leaked: loop_support, loop_driver_host,
milestone_events, model_gateway, model_routes, planned_driver,
planned_driver_factory, text_loop_driver, app_loop_family).
Also asserts that the runtime assembly (build_reborn_runtime,
RebornRuntime) actually lives in ironclaw_reborn_composition
and reaches the underlying types via module paths, so the assembly
slice cannot drift to another crate.
BoundaryRule for ironclaw_reborn_cli:
Now forbids ironclaw_reborn, ironclaw_llm, ironclaw_loop_support,
ironclaw_threads, ironclaw_turns (on top of the existing
ironclaw / ironclaw_engine / ironclaw_gateway / ironclaw_skills /
ironclaw_tui forbidden set). The CLI must enter Reborn only
through ironclaw_reborn_composition.
reborn_cli_binary_crate_stays_separate_from_v1_root:
Updated the exact-deps assertion to expect
{ironclaw_reborn_composition, ironclaw_reborn_config} instead of
the old {ironclaw_reborn, ironclaw_reborn_config}.
There was a problem hiding this comment.
Code Review
This pull request refactors the Reborn architecture by narrowing the public API of ironclaw_reborn and introducing ironclaw_reborn_composition as the central assembly point for the runtime. The ironclaw_reborn_cli is updated to leverage this new composition layer, enabling a live REPL and single-shot message execution, supported by expanded architectural tests to enforce dependency boundaries. Review feedback highlights critical improvements for the runtime assembly, including handling JoinError during worker shutdown to prevent swallowing panics, ensuring the wake channel remains open to avoid CPU spinning, and utilizing exhaustive matching for protocol mapping to improve maintainability.
| /// returning. | ||
| pub async fn shutdown(self) -> Result<(), RebornRuntimeError> { | ||
| self.worker_cancel.cancel(); | ||
| let _ = self.worker_handle.await; |
There was a problem hiding this comment.
The result of self.worker_handle.await is ignored, which swallows potential panics or join errors from the background worker task during shutdown. Per repository rules, when handling JoinError from tokio tasks, we should log the error and distinguish between panics and cancellations to aid debugging.
match self.worker_handle.await {
Ok(_) => {}
Err(e) => {
if e.is_panic() {
tracing::error!("reborn worker task panicked: {e}");
} else {
tracing::warn!("reborn worker task cancelled: {e}");
}
}
}References
- When handling errors from tokio tasks, log the JoinError and distinguish between panics and cancellations in the error message to capture debugging information.
| evidence_port, | ||
| )); | ||
|
|
||
| let (_wake_sender, wake_receiver) = TurnRunnerWakeReceiver::new(); |
There was a problem hiding this comment.
Dropping the _wake_sender immediately closes the wake channel. If the TurnRunnerWorker implementation uses a select! loop that doesn't explicitly handle a closed channel (e.g., by checking for None from recv()), this can cause a CPU spin as the receiver branch will continuously trigger. Additionally, this disables the event-driven wake mechanism, forcing the worker to rely solely on the poll_interval for processing runs.
| let protocol = match cfg.protocol.as_str() { | ||
| "openai_completions" | "openai" => ProviderProtocol::OpenAiCompletions, | ||
| "anthropic" => ProviderProtocol::Anthropic, | ||
| "ollama" => ProviderProtocol::Ollama, | ||
| "deepseek" => ProviderProtocol::DeepSeek, | ||
| "gemini" => ProviderProtocol::Gemini, | ||
| "openrouter" => ProviderProtocol::OpenRouter, | ||
| "github_copilot" => ProviderProtocol::GithubCopilot, | ||
| other => { | ||
| return Err(RebornRuntimeError::LlmProvider(format!( | ||
| "unsupported llm protocol: {other}" | ||
| ))); | ||
| } | ||
| }; |
There was a problem hiding this comment.
This match statement maps strings to ProviderProtocol variants using a wildcard other arm. Per the general rules, mapping between types should ideally be exhaustive to ensure that new variants added to the target enum are explicitly handled. Using a string-based match here bypasses compile-time checks for new protocols added to ironclaw_llm::ProviderProtocol.
References
- Mapping between enums or state-representing types should use exhaustive match statements instead of wildcard arms to ensure all cases are explicitly handled and to catch new variants at compile time.
zmanian
left a comment
There was a problem hiding this comment.
Code Review
Overview
Three coordinated changes land together:
- Composition root absorbs the runtime assembly slice.
ironclaw_reborn_compositiongainsbuild_reborn_runtime+RebornRuntime— substrate facades plus driver registry, host factory, model gateway adapter, and a spawnedTurnRunnerWorker. Newruntime.rs(~630 lines) andruntime_input.rs(~120 lines). ironclaw_reborn's public surface is narrowed from a ~25-itempub usewall to a directory ofpub moddeclarations. Five internal tests + two internal call sites updated to module-path imports. One drive-by fix removes a duplicateimpl LoopCancellationPortthat brokecargo test --no-runon baseline.- CLI boots a live agent.
ironclaw-reborn run -m "..."andironclaw-reborn run(REPL) compose through the new entry point. The existing diagnostic-only behavior is preserved behind a new--dry-runflag so smoke tests keep working.
Boundary tests (ironclaw_architecture) pin all of the above: new forbidden deps on ironclaw_reborn_cli, a root-llm-provider-gating assertion against ironclaw_reborn_composition/Cargo.toml, and a new reborn_internal_crate_keeps_directory_of_modules_lib_rs test that bans the old pub use wall in crates/ironclaw_reborn/src/lib.rs.
What's strong
- Boundaries-as-tests: the cleanup is locked in by
ironclaw_architectureassertions, not just convention. A future contributor re-adding the noisypub usewall fails CI. - Docstrings carry the architectural intent.
crates/ironclaw_reborn/src/lib.rsand theruntime.rsheader both clearly state who the sanctioned consumer is and why. --dry-runpreservation keeps the existing smoke-test signal alive without forking the binary's behavior.RebornRuntimeexposes only task-level methods (new_conversation/send_user_message/shutdown). Internal types (TurnCoordinator,SessionThreadService,LoopExitApplier, etc.) are not leaked through the runtime handle — only the diagnosticRebornServicessnapshot is exposed and labeled as such.- Stub gateway pattern is right (
build_stub_gateway). Returning aHostManagedModelError::safe(Unavailable, ...)instead of failing at startup means an operator can verify the binary boots before wiring credentials — and the PR description's manual run output confirms this. - The duplicate
impl LoopCancellationPortfix is correctly called out separately in the commit message and reduced to a clarifying comment.
Concerns
1. Wake channel is dropped on construction (medium — agree with the bot review)
In runtime.rs:
let (_wake_sender, wake_receiver) = TurnRunnerWakeReceiver::new();The leading underscore drops the sender immediately. This:
- Disables the event-driven wake mechanism — the worker only ever fires on
poll_interval. - If
TurnRunnerWorker's receive branch in aselect!doesn't guardNone, this could spin.
Either store the sender on RebornRuntime (and expose wake() for send_user_message to call after submit_turn returns Accepted), or add a comment explicitly stating "intentionally drop — worker tolerates closed channel and we rely on poll-only for the CLI slice." The former is the correct fix; the latter at minimum needs to be verified against the worker's actual select! behavior. As-is this is silently degrading event-driven throughput.
2. JoinError is swallowed on shutdown (medium — agree with the bot review)
pub async fn shutdown(self) -> Result<(), RebornRuntimeError> {
self.worker_cancel.cancel();
let _ = self.worker_handle.await;
Ok(())
}If the worker panics, the panic is silently dropped. Project rules emphasize root-cause diagnosis. At minimum log the JoinError with panic-vs-cancellation discrimination:
if let Err(e) = self.worker_handle.await {
if e.is_panic() {
tracing::error!("reborn worker task panicked: {e}");
} else {
tracing::warn!("reborn worker task cancelled: {e}");
}
}3. CLI-specific identifiers baked into composition root (medium — coupling)
runtime.rs hardcodes "reborn-cli" as the tenant id, agent id suffix, and binding ids. The PR's own framing says the composition crate is the assembly point for "the CLI and any future Reborn ingress," but a web channel or e2e harness would inherit tenant_id = "reborn-cli" from the substrate, which is wrong.
Better: take a RuntimeIdentity { tenant, agent, source_binding } (or just three strings) on RebornRuntimeInput with a "reborn-cli" default that the CLI sets explicitly. This keeps the composition crate channel-agnostic, which the docstring already promises.
4. ModelSlot duplicated in the CLI without enforcement (low)
commands/models.rs mirrors ironclaw_reborn::ModelSlot as a CLI-local enum to satisfy the boundary rule, with a comment explaining why. The boundary tests forbid the import but don't catch drift. If a third slot lands in ironclaw_reborn::ModelSlot, the CLI silently keeps reporting two.
Two reasonable options:
- Re-export
ModelSlot(or just its&'static [&'static str]slot-name list) fromironclaw_reborn_composition, which is on the CLI's import-allowed list. Composition can keep the enum private and surface only the names. - Add an architecture test that asserts the slot-name list in
commands/models.rsmatches the one inironclaw_reborn::model_routes::ModelSlot.
5. Dead local in build_runtime_input (nit)
commands/run.rs:
let composition_profile = match config.profile() {
RebornProfile::LocalDev => RebornCompositionProfile::LocalDev,
other => anyhow::bail!(...),
};
let _ = composition_profile; // currently always LocalDev when reached.The CLI then calls RebornBuildInput::local_dev(...) and never passes composition_profile through. The same gate is also enforced inside build_reborn_runtime, so the CLI check is duplicative defense-in-depth. Either:
- Drop the CLI-side check entirely and rely on the inner one (
anyhow::bail!becomes a propagatedRebornRuntimeError::InvalidArgument), or - Pass the variant through if the substrate input ever takes one (it doesn't today —
local_devis the only profile-specific factory exposed).
The let _ = composition_profile; is a strong code smell pointing at this.
6. protocol: String in RebornLlmConfig (low — disagree with the bot review)
The bot flagged the wildcard arm in the string→ProviderProtocol match. I'd push back: keeping protocol as String in RebornLlmConfig is the right call here — it prevents ironclaw_llm::ProviderProtocol from leaking into the composition crate's public surface, which is the whole point of this PR.
However, the validation could be more rigorous: today an unknown protocol fails at runtime on first send_user_message. Consider either:
- Validating the string in
RebornLlmConfig::newso misconfiguration fails at startup, or - Documenting on the field that valid values are enumerated and adding a const slice the docstring can reference.
7. Minor cosmetic / style
tests/loop_driver_host.rshas two non-contiguoususe ironclaw_loop_supportblocks. Merge them.init_tracing()incommands/run.rsreturnsResult<(), ()>and the caller drops it withlet _ = init_tracing();. Idiomatically this should befn init_tracing()returning()(since the error is already discarded), or the caller should at least log "tracing already initialized" to stderr.runtime.rsexposesservices()for "diagnostic readiness snapshot only" — consider renaming toreadiness_snapshot()to discourage callers from reading it as a typed handle to substrate facades.external_event_id: Some(format!("reborn-cli:{}", Uuid::new_v4()))— fresh UUID per send defeats the idempotency-key purpose ofexternal_event_id. Fine for the CLI slice but worth a comment noting the trade-off.
Tests / coverage
- 5/5 in
ironclaw_reborn_composition, 225/225 inironclaw_reborn, 35/35 in CLI, 17/17 inironclaw_architecture(per the PR body). - Gap: no test actually drives
build_reborn_runtimeend-to-end with the stub gateway and asserts the resultingAssistantReply { status: RecoveryRequired, text: None }. The manual run output in the PR description is the only evidence the assembly works. A#[tokio::test]inironclaw_reborn_composition/tests/that asserts the stub gateway path producesRecoveryRequiredwould lock in the contractRebornRuntime::send_user_messageadvertises in its docstring. - The boundary tests are excellent and pin exactly the right invariants.
mergeable_state: dirtyon the PR will need rebasing againstreborn-integrationbefore merge.
Security / safety
- Nothing here introduces a new I/O surface beyond stdin/stderr/stdout and the existing LLM provider HTTP flow.
- API keys are wrapped in
secrecy::SecretStringend-to-end, including throughRebornLlmConfig. Good. #[forbid(unsafe_code)]is preserved onironclaw_reborn_composition.
Performance
RebornRuntime::wait_for_terminalpolls at 100ms with a 180s ceiling (PollSettings::default). Reasonable for CLI; not exposed to callers. If the wake-channel fix in concern 1 lands, this poll becomes a fallback rather than the primary signal.read_latest_assistant_textre-reads full thread history per send; fine for an interactive CLI, would not scale to long conversations or a server channel.
Bottom line
Solid architectural cleanup. The boundary-tests-as-enforcement story is exemplary. Three concerns are worth resolving before merge:
- Wake channel (concern 1) — verify whether dropping the sender silently degrades throughput or risks a spin; either keep the sender or document the deliberate choice.
JoinErrorhandling on shutdown (concern 2) — log panic vs. cancellation rather than swallow.- CLI identifiers in composition root (concern 3) — parameterize
tenant_id/agent_id/ source binding so the composition crate stays channel-agnostic per its own docstring.
Concerns 4–7 are quality-of-life and can be follow-ups. The duplicate impl LoopCancellationPort removal is a worthwhile drive-by and correctly disclosed.
Generated by Claude Code
|
Addressed the review feedback in
Validation run locally:
|
|
Addressed the follow-up review findings from the codex-review pass:
Verified with:
|
Code ReviewOverviewThree coupled changes that ship together:
+1683 / −205, 25 files. Locked in with new boundary tests. Strengths
Issues / Suggestions1.
|
…rness test(reborn): add product-live planned AgentLoop harness
|
Addressed the review feedback from #3695 (comment) in
Validation:
Note: |
zmanian
left a comment
There was a problem hiding this comment.
Overall direction is right — collapsing the speculative pub use wall, pinning the new shape with boundary tests, and shipping a binary that actually exercises the assembled runtime are all things I have been asking for. A few things I want addressed or at least acknowledged before this lands.
1. Substrate/binary boundary — composition crate has grown a personality
The architectural premise I care about is: Reborn substrate (ironclaw_agent_loop, ironclaw_reborn, ironclaw_loop_support, ironclaw_threads, ironclaw_turns) must not depend on the current binary, and the substrate must remain reusable independent of any specific app.
This PR is fine on the binary side — ironclaw_reborn_cli's deps are pinned to {ironclaw_reborn_composition, ironclaw_reborn_config} and the boundary test forbids the substrate names. Good.
What I want flagged is what happened to ironclaw_reborn_composition. Before this PR, its docstring promised "substrate services only — Product/AppBuilder integration belongs in later slices." After this PR, the same crate hard-deps on ironclaw_reborn, ironclaw_threads, ironclaw_loop_support, and (gated) ironclaw_llm, plus tokio/tokio-util/uuid, and ships an InMemorySessionThreadService-bound RebornRuntime with a turn-runner worker spawned inside it. That is not a composition root, that is the live agent runtime. The crate is doing two jobs now:
- substrate-only facades for callers that wire their own loop (
build_reborn_services) - assembled-runtime ingress with in-memory thread store + worker (
build_reborn_runtime/RebornRuntime)
These should be two crates, or at minimum the assembled-runtime piece should be feature-gated so a downstream consumer can take ironclaw_reborn_composition for facades without pulling ironclaw_reborn and the worker machinery. Otherwise "composition root" becomes the new wall of speculative API. I would accept the merge with a tracking issue, but please open it before this lands.
2. Public surface — what's gone, what should also have gone
Stripping the ~25 pub use re-exports from ironclaw_reborn/src/lib.rs is the right call and the boundary test pinning it is exactly the discipline I want. Two follow-up concerns:
- Promoting modules to
pub mod(loop_driver_host, model_routes, planned_driver_factory, runtime, app_loop_family, text_loop_driver, milestone_events, model_gateway) is functionally equivalent to keeping the re-exports if the items inside staypub. The follow-up in the PR body (move integration tests to colocated#[cfg(test)]so these can becomepub(crate) mod) needs to be tracked as a real ticket, not a parenthetical. ironclaw_llm::create_registry_providerflipped fromfntopub fnwith a one-line doc. Composition's needs are real, butpubon the registry-provider factory leaks the rig-core dispatch shape to anyone who depends onironclaw_llm. Either gate behind a feature, move into apub(crate)-with-helper-module, or commit to the API stability.
3. Live binary smoke — what is actually verified
run --message "ping" returning nonzero RecoveryRequired without an API key is good negative coverage, and the two new smoke tests pin that. What is not pinned anywhere I can find:
- a happy-path turn that produces an assistant reply through the assembled runtime (the PR body shows it works with a real
OPENAI_API_KEYbut there is no recorded fixture or stub-gateway-with-success test) RebornRuntime::send_user_messageagainst aCompletedrun — the planned-loop harness inironclaw_product_workflow/tests/covers the worker, but not the publicRebornRuntimeAPI the CLI imports- the auto-cancel-on-
RecoveryRequiredbranch inwait_for_terminal(a real product runtime will have a recovery worker; baking "cancel because no recovery worker" into the public runtime API is a footgun)
Add at least one composition-level test that drives build_reborn_runtime → send_user_message → assistant-text-returned with a recording gateway. The harness in support/planned_agent_loop.rs already has the pieces.
4. Profile coverage
build_reborn_runtime only wires LocalDev. Production and migration-dry-run return a clear error, which is correct, but the CLI's RebornProfile::LocalDev check is duplicated in run.rs — once the composition errors clearly, the CLI doesn't need its own match. Minor.
Verdict
Approve once (1) the composition-crate identity question gets a tracking issue, (2) create_registry_provider's public exposure is justified or scoped, and (3) one happy-path test against RebornRuntime::send_user_message lands. Everything else is follow-up.
— Zaki
|
Addressed Zaki's review feedback from #3695 (review) in
Validation:
Note: |
zmanian
left a comment
There was a problem hiding this comment.
Re-review after d80c5e18f (and the prior runtime hardening passes).
Prior findings — status
1. Substrate/binary boundary (composition crate character change) — Resolved-as-tracked. #3726 is open and captures both the composition crate split / feature-gate decision and the follow-up narrowing of ironclaw_reborn public modules. I'm OK deferring the actual split given the live binary is downstream of ironclaw_reborn_composition and the assembled-runtime path is what ironclaw-reborn consumes today; what mattered was that we don't pretend this crate is still "substrate-only." The tracking issue makes the future cost explicit.
2. ironclaw_llm::create_registry_provider leaking rig-core — Resolved. Now gated behind opt-in registry-provider-factory feature; internal call sites use a private create_registry_provider_inner; ironclaw_reborn_composition's root-llm-provider feature transitively opts in. Default builds of ironclaw_llm no longer expose the registry factory. Good outcome.
3. Live-binary happy-path coverage — Resolved. send_user_message_returns_completed_assistant_text_with_recording_gateway drives build_reborn_runtime → new_conversation → send_user_message through a RecordingGateway and asserts TurnStatus::Completed plus the assistant text. The test-only with_model_gateway_override plumbing is #[cfg(test)]-gated and doesn't leak into the public input surface. Combined with the negative-path test that was already there, the runtime ingress is now pinned on both sides.
4. wait_for_terminal auto-cancelling RecoveryRequired — Resolved. 74c9636 makes the cancellation explicit with a comment that the standalone runtime has no recovery worker and must release the conversation lock; the test was updated to assert TurnStatus::Cancelled rather than RecoveryRequired. This is the right behavior for the standalone composition — once a real recovery worker exists, this branch is what will need to change, and the comment flags exactly that.
New observations
runtime.rsbuild_reborn_runtime now has nested#[cfg(test)] / #[cfg(not(test))]inside#[cfg(feature = "root-llm-provider")]arms for the gateway override. It works but it's getting hard to read; if a third axis shows up, extract a smallselect_model_gatewayhelper.- The pre-commit safety findings the author called out (UTF8/TMPDIR/PANIC/PROJECTION) are pre-existing on this branch and not introduced by this PR; not blocking here, but worth folding into a follow-up sweep.
Verdict
Approve. All three prior findings landed cleanly, the deferred split has an issue, and the runtime now has both happy-path and recovery-path coverage. Thanks for the thorough turnaround.
— Zaki
Summary
Promotes
ironclaw_reborn_compositionto the Reborn composition root the crate's own docstring promised, ships a liveironclaw-rebornbinary that boots a real agent through it, and narrowsironclaw_reborn's public surface to a directory of modules — all locked in with boundary tests.Three things land together because doing them apart is harder than doing them in one PR:
ironclaw_reborn_compositionabsorbs the runtime assembly slice (drivers + worker + LLM bridge), per the existing crate-level docstring that already promised "Product/AppBuilder integration belongs in later slices."ironclaw-reborn run -m "..."andironclaw-reborn run(REPL on stdin) boot the composed runtime, submit a turn, wait for completion, print the assistant reply.ironclaw_reborn's ~25-item wall ofpub usere-exports is gone.lib.rsis now a directory of modules. Boundary tests pin the new shape.Why one PR
The cleanup is the point. Shipping "the binary runs" first with the existing noisy surface (plus the new composition entry points layered on top) bakes in the speculative public API problem this PR is trying to solve. Trace audit before starting found:
CapabilityResolveError,LoopCapabilityInputResolver,ModelRoute,DurableLoopHostMilestoneSink, etc.).ironclaw_reborn/tests/) needed import paths updated.That's small enough to do confidently in one change.
Commits
feat(llm): make create_registry_provider public…ironclaw_llmarch(reborn-composition): extend composition root with assembled runtimeironclaw_reborn_compositionbuild_reborn_runtime, +RebornRuntimearch(reborn): narrow ironclaw_reborn public surface to a directory of modulesironclaw_rebornpub uselines; promotes 5 modules topub mod; updates 5 internal tests; fixes 1 baselinecargo test --no-runduplicate-impl bugfeat(reborn-cli): boot a live agent through the composition rootironclaw_reborn_cliironclaw_reborndeparch(boundaries): pin narrowed Reborn surface and composition-routed CLIironclaw_architectureroot-llm-providerdiscipline check, updates one existing assertionWhat the runtime looks like to callers
The CLI imports literally:
That is the entire surface a downstream consumer touches. No
TurnCoordinator,SessionThreadService,HostManagedModelGateway,LoopExitApplier,RebornLoopDriverHostFactory,ThreadScope,SubmitTurnRequest,AcceptedMessageRef,IdempotencyKey, etc. — all of those stay private to the composition root.RebornRuntimeexposes:Verified end-to-end
Without
OPENAI_API_KEY/ANTHROPIC_API_KEY/OLLAMA_BASE_URLexported the run surfacesRecoveryRequired(the stub gateway'sUnavailablepropagated through driver → worker → applier). With a real API key exported the same command prints the assistant reply.Profile coverage
Only
RebornCompositionProfile::LocalDevis wired end-to-end. Production and migration-dry-run continue to return their substrate-onlyRebornServicesvia the unchangedbuild_reborn_services(); callingbuild_reborn_runtimewith them yields a clear "not yet wired" error rather than partially starting an agent. Production wiring is the next slice — durable thread store, durable checkpoint stores, real evidence ports, and a real turn-run wake notifier all already exist in their own crates, but the composition there needs separate review.Tests
ironclaw_llmironclaw_reborn(withroot-llm-provider)ironclaw_reborn_composition(both feature variants)ironclaw_reborn_clirun --dry-run)ironclaw_architectureWorkspace-wide
cargo check --workspace --all-targetsis clean on both--no-default-featuresand default.Drive-by
crates/ironclaw_reborn/src/planned_driver.rshad a duplicateimpl LoopCancellationPort for ResumePayloadHostinsidemod teststhat brokecargo test --no-run -p ironclaw_rebornon baseline. The later (no-opNone-returning) impl was unreachable behind the conflict; removed it and kept the delegating one. Called out explicitly in the commit message.Boundary tests pinning the new shape
reborn_cli_binary_crate_stays_separate_from_v1_root: CLI's exact normal deps are{ironclaw_reborn_composition, ironclaw_reborn_config}and nothing else.ironclaw_reborn_cliBoundaryRule.forbidden: addsironclaw_reborn,ironclaw_llm,ironclaw_loop_support,ironclaw_threads,ironclaw_turns.reborn_loop_support_llm_wiring_stays_out_of_root_src: extended to also assertironclaw_reborn_compositiongatesironclaw_llmbehind the sameroot-llm-providerfeature withoptional = true, default-features = false.reborn_internal_crate_keeps_directory_of_modules_lib_rs(new): forbidspub use ironclaw_loop_support::,pub use loop_driver_host::,pub use milestone_events::,pub use model_gateway::,pub use model_routes::,pub use planned_driver::,pub use planned_driver_factory::,pub use text_loop_driver::,pub use app_loop_family::incrates/ironclaw_reborn/src/lib.rs. Also asserts the runtime assembly lives inironclaw_reborn_compositionand reachesironclaw_rebornonly via module paths.Follow-ups (intentionally not in this PR)
crates/ironclaw_reborn/tests/to#[cfg(test)] mod testsblocks colocated with the modules they exercise. That would let several currently-pub modmodules becomepub(crate) modand tighten the surface further. Out of scope here because the bound is already enforced by the boundary tests and the migration is mechanical.