Force forked agents to inherit parent model settings#16055
friel-openai wants to merge 11 commits into `main`.
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9b0bc4b161
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@jif-oai there are a few issues fixed in this PR:

I think your comment is primarily about the first, is that right?
jif-oai left a comment:
I agree it would be good to share more cache here, but I don't think this PR sits at the right level.
I would suggest 3 different small PRs:
- Fully re-use the parent config in case of a fork, and return a warning or error to the agent if it tries to set anything else.
- Shared cache key: do proper wiring that lets us "override" the cache key for a given thread. We fix it at thread creation, and it gets set directly into the API client.
- Make the MCP tools listing stable, either by definition or by using a snapshot.
```diff
-mcp_connection_manager: Arc::new(RwLock::new(McpConnectionManager::new_uninitialized(
-    &config.permissions.approval_policy,
-))),
+mcp_connection_manager: session_configuration
```
There was a problem hiding this comment.
We can't do this. The MCP manager gets replaced/modified if we have sandbox updates, elicitation, etc.
In general, we can't share the manager without re-designing it.
If I understand correctly, the only reason we do this is that we want stability over built_tools. First, this shouldn't drift most of the time. Second, we should use a snapshot around it to fix the issue instead.
```rust
let inherited_exec_policy = self
    .inherited_exec_policy_for_source(&state, Some(&session_source), &config)
    .await;
let inherited_prompt_cache_key = self
```
This resume path should not opportunistically inherit parent cache/MCP state from a live parent. SubAgentSource::ThreadSpawn does not persist whether this child was originally forked, so the same resumed agent will get parent-shared cache/MCP only when the parent is currently loaded, and isolated state otherwise... we can't have such non-determinism
```rust
inherited_shell_snapshot: None,
user_shell_override: None,
inherited_exec_policy: Some(Arc::clone(&parent_session.services.exec_policy)),
inherited_prompt_cache_key: Some(parent_session.prompt_cache_key()),
```
```rust
@@ -770,6 +773,8 @@ impl ThreadManagerState {
    metrics_service_name: Option<String>,
    inherited_shell_snapshot: Option<Arc<ShellSnapshot>>,
```
This is a lot of optional inherited things. We should have a small builder pattern for them.
```rust
@@ -225,7 +225,11 @@ fn build_agent_shared_config(turn: &TurnContext) -> Result<Config, FunctionCallE
let mut config = (*base_config).clone();
```
All of this should be cleared, IMO.
We should just re-use the same full config... otherwise this will become brittle over time.
```rust
config.model = Some(turn.model_info.slug.clone());
config.model_provider = turn.provider.clone();
config.model_reasoning_effort = turn.reasoning_effort;
// Forked children must preserve the spawning turn's effective model settings, including a
```
This code path is not only for forked agents.
```rust
pub(crate) inherited_shell_snapshot: Option<Arc<ShellSnapshot>>,
pub(crate) inherited_exec_policy: Option<Arc<ExecPolicyManager>>,
pub(crate) inherited_prompt_cache_key: Option<ThreadId>,
pub(crate) inherited_mcp_connection_manager: Option<Arc<RwLock<McpConnectionManager>>>,
```
An `Option` of an `Arc` of `RwLock` looks like a code smell in Rust... there are better alternatives for hot-swapping shared references.
```rust
#[tokio::test]
async fn spawn_agent_fork_context_ignores_child_model_overrides() {
```
This test is kind of pointless, IMO.
A better test would be an integration test that makes sure the full context is the same at the API level.
```rust
let mut config = (*turn.config).clone();
let mut role_provider =
    built_in_model_providers(/* openai_base_url */ None)["openai"].clone();
```
```rust
    args.reasoning_effort,
)
.await?;
if !args.fork_context {
```
We should probably return an error to the model if it specifies fork together with a model and reasoning effort, or at least emit a warning.
## Summary

When a `spawn_agent` call does a full-history fork, keep the parent's effective agent type and model configuration instead of applying child role/model overrides. This is the minimal config-inheritance slice of #16055. Prompt-cache key inheritance and MCP tool-surface stability are split into follow-up PRs.

## Design

- Reject `agent_type`, `model`, and `reasoning_effort` for v1 `fork_context` spawns.
- Reject `agent_type`, `model`, and `reasoning_effort` for v2 `fork_turns = "all"` spawns.
- Keep v2 partial-history forks (`fork_turns = "N"`) configurable; requested model/reasoning overrides and role config still apply there.
- Keep non-forked spawn behavior unchanged.

## Tests

- `cargo +1.93.1 test -p codex-core spawn_agent_fork_context --lib`
- `cargo +1.93.1 test -p codex-core multi_agent_v2_spawn_fork_turns --lib`
- `cargo +1.93.1 test -p codex-core multi_agent_v2_spawn_partial_fork_turns_allows_agent_type_override --lib`
## Summary

Forked spawns (`fork_context = true` / `fork_turns`) ignore child `model` and `reasoning_effort` overrides.

## Tests

- `cargo +1.93.1 test -p codex-core spawn_agent_fork_context_ignores_child_model_overrides --lib`
- `cargo +1.93.1 test -p codex-core multi_agent_v2_spawn_fork_turns_ignores_child_model_overrides --lib`
- `cargo +1.93.1 test -p codex-core spawn_agent_can_fork_parent_thread_history_with_sanitized_items --lib`