feat(sandboxes): Improve code evaluator editor ux by cephalization · Pull Request #12690 · Arize-ai/phoenix

cephalization · 2026-04-15T16:43:21Z

No description provided.

Align wasm_backend.py, e2b_backend.py, and test_wasm_backend.py with version-sandboxes branch. The changes on this branch introduced type errors by using wrong type ignore codes and adding unnecessary helper functions.

claude · 2026-04-17T19:20:37Z

+                    type_name, sandbox_config_db_id = from_global_id(
+                        inline_code_evaluator.sandbox_config_id
+                    )


The type_name is extracted here but never validated — it's silently discarded. A client could pass a GlobalID with any type prefix (e.g., GlobalID("SomeOtherType", "5")) and the code would still look up SandboxConfig row with id=5, bypassing the type safety that GlobalIDs are meant to provide.

This is inconsistent with the same PR's handling in evaluator_mutations.py, which correctly uses from_global_id_with_expected_type:

phoenix/src/phoenix/server/api/mutations/evaluator_mutations.py

Lines 384 to 387 in 95c79e1

output_configs=output_configs,

input_mapping=input.input_mapping.to_orm()

if input.input_mapping is not None

else InputMapping(literal_mapping={}, path_mapping={}),

The fix is to replace from_global_id with from_global_id_with_expected_type and validate against SandboxConfig.__name__ (you'll also need to ensure from_global_id_with_expected_type and SandboxConfig are imported in this file):

sandbox_config_db_id = from_global_id_with_expected_type( inline_code_evaluator.sandbox_config_id, SandboxConfig.__name__ )

anticorrelator

Error Handling & User Feedback

app/src/components/evaluators/CodeEvaluatorLanguageSandboxFields.tsx:118 — silent null save unlinks the sandbox in DB; next scheduled run fails with an error the user cannot explain
src/phoenix/server/api/mutations/chat_mutations.py:4318 — five distinct misconfigurations produce the same 'please select' message; users whose provider was disabled cannot self-diagnose
src/phoenix/server/api/mutations/chat_mutations.py:4286 — misrouted GlobalIDs produce less-specific errors and skip type validation used consistently in every other evaluator mutation

Schema & Validation Consistency

src/phoenix/server/api/evaluators.py:2422 — dataset examples with partial fields fail apply_input_mapping even though JS destructuring handles missing keys gracefully; schema and TS footer contradict each other
app/src/components/evaluators/codeEvaluatorUtils.ts:52 — code the editor's autocomplete promotes is rejected at runtime; editor and backend accept different function-parameter grammar

UI Regression & Accessibility

app/src/components/evaluators/EditCodeEvaluatorDialogContent.tsx:412 — existing descriptions survive Save but cannot be edited; field was silently dropped from the new header layout
app/src/components/evaluators/EditCodeEvaluatorDialogContent.tsx:720 — screen readers announce the remove button as just 'button'; it is the only unlabeled destructive control in the editor (same gap in CodeEvaluatorTestSection.tsx)
app/src/components/dataset/CreateCodeDatasetEvaluatorSlideover.tsx:268 — fresh-install users cannot draft any evaluator code; the empty-sandbox gate hides the entire editor form

Test Coverage

src/phoenix/server/api/mutations/chat_mutations.py:281 — five guard paths and the happy path have no resolver tests; regressions in inline_code_evaluator ship without any CI signal

Additional comments

src/phoenix/server/api/mutations/chat_mutations.py:4286 — The inline branch decodes the sandbox GlobalID with from_global_id(...) and discards type_name, diverging from every other branch of evaluator_previews and from the sister create_code_evaluator / update_code_evaluator mutations edited in this same PR — all of which use from_global_id_with_expected_type(..., SandboxConfig.__name__). No security boundary is crossed today (SandboxConfig has no owner column, mutation is auth-gated) but the inconsistency is a defense-in-depth gap and produces less-specific errors for misrouted IDs. Replace with the expected-type helper.
src/phoenix/server/api/mutations/chat_mutations.py:4318 — sandbox_backend is None funnels five distinct misconfigurations (none selected, config missing, config disabled, provider missing, provider disabled) into the same BadRequest("No sandbox backend configured for language 'X'..."). Users who did select a sandbox and whose provider was disabled out from under them see 'please select a sandbox configuration' even though they already have. The language-mismatch case gets a dedicated error mid-block — extend the same treatment to the other branches so users can self-diagnose.
app/src/components/evaluators/codeEvaluatorUtils.ts:52 — The editor-side extractTypeScriptVariables accepts non-destructured shapes like function evaluate(ctx) { return ctx.output... } and auto-populates mappings from property access. The backend _infer_typescript_evaluate_input_schema in src/phoenix/server/api/evaluators.py:~2445 rejects that same signature with 'Use a destructured object parameter'. The editor now guides users into writing code the runtime refuses. Either share a grammar between the two, or have the UI consume the server's schema/validation result.

anticorrelator · 2026-04-17T21:33:40Z

+  return (
+    <View>
+      <ComboBox
+        label="Sandbox"


When mapSandboxConfigOptions drops the provider/config for a saved evaluator (provider disabled, backend unavailable), the Edit dialog falls through to selectedSandboxConfigId = null with no user-visible signal. On Save, { sandboxConfigId: null } is sent and evaluator_mutations.py treats the null as an explicit clear — silently unlinking the sandbox in the DB. The next scheduled run will fail with 'Please select a sandbox configuration'. Surface the invalidated selection (inline warning + keep the ID in a disabled/informational state) or omit sandboxConfigId from the mutation when the user didn't touch it.

anticorrelator · 2026-04-17T21:33:40Z

+/**
+ * Compact header bar with inline name, language, and sandbox fields
+ */
+const CompactHeaderBar = ({


The new CompactHeaderBar drops EvaluatorDescriptionInput. The store and GraphQL mutations still carry description (edit hydrates it, create initializes to ""), so existing values round-trip on Save but cannot be edited from the UI — a regression relative to the prior layout, which rendered a description field next to the name. Restore the description input (inline in the header or inside a disclosure in the sidebar).

anticorrelator · 2026-04-17T21:33:40Z

-              updateValues(
-                values.filter((_, valueIndex) => valueIndex !== index)
-              );
+              if (values.length <= 2) return;


The categorical-choice remove button renders only <Icon svg={<Icons.TrashOutline />} /> with no aria-label or visually-hidden label. Screen readers announce 'button' with no action hint — and the destructive remove is now the only unlabeled control in the editor while + Add choice retains its label. Add aria-label="Remove choice". The result-card close IconButton in CodeEvaluatorTestSection.tsx has the same problem.

anticorrelator · 2026-04-17T21:33:40Z

+_TYPESCRIPT_ARROW_SIGNATURE_RE = re.compile(r"(?:const|let|var)\s+evaluate\s*=\s*\(([^)]*)\)\s*=>")
+
+
+def _extract_typescript_object_parameter_keys(params: str) -> tuple[list[str], list[str]]:


_extract_typescript_object_parameter_keys marks every destructured key required unless it has = or ?. The default template function evaluate({ output, reference, input, metadata }: EvaluatorParams) (shipped in DEFAULT_CODE_EVALUATOR_SOURCE) therefore produces an input schema where all four are required, while the generated type footer renders them as optional (output?: Output; reference?: Reference; …). A dataset example containing only output + reference can fail apply_input_mapping validation even though JS destructuring of a missing key just yields undefined. Treat bare destructured names as optional at the schema level, or honor TS ? / default annotations only.

anticorrelator · 2026-04-17T21:33:40Z

                for eval_result in eval_results:
                    all_results.append(_to_evaluation_result(eval_result, eval_result["name"]))

+            elif inline_code_evaluator := evaluator_input.inline_code_evaluator:


The ~80-LOC inline_code_evaluator branch has no resolver tests. Five BadRequest paths (sandbox_config_id None, config missing, disabled config, disabled provider, language mismatch) plus the happy path are unverified — test_code_evaluator_sandbox_mutations.py covers create/update/stored-preview flows, test_code_evaluator_runner.py covers the runner in isolation, and Playwright only hits one happy path indirectly. Add a resolver test per guard plus a success-path test; a regression that swallows any guard into a 500 or a silent pass currently ships without CI signal.

anticorrelator · 2026-04-17T21:33:40Z

    });
  };

+  // Show empty state if no sandbox configs are available


The sandboxConfigs.length === 0 short-circuit renders only a 'Configure Sandbox' CTA — the user cannot see the editor, write code, or configure outputs. The backend still accepts a null sandbox_config_id on create (evaluator_mutations.py:~1187), so this is a UI-only gate. First-time users on a fresh install cannot draft any work. Either render the form with a disabled sandbox selector and inline banner, or explicitly confirm this is an intentional guard-rail (then mirror the same behavior in the edit flow for consistency).

claude · 2026-04-20T15:41:27Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

…sync factory Base PR #12690 added _resolve_inline_code_evaluator_backend with a sync call to get_or_create_backend. Our branch made that factory async with required session/decrypt params for DB-backed credential resolution, so the call site needs the same conversion applied at the other two sites: await, session+decrypt, MissingSecretError/UnsupportedOperation/ ValidationError mapping to BadRequest, and admin-provider-wins merge order.

…12655) * feat(sandbox): inject user-defined env vars and secret refs into sandbox execution Extends per-adapter config schemas with env_vars (literal values or references to the encrypted Secret table), internet_access, and dependencies fields. Wires resolution through get_or_create_backend and forwards merged env to all seven adapters. Adds SandboxConfigDialog UI for managing entries with a secret picker, capability advertisement per adapter, and end-to-end tests. * fix(sandbox): align test adapter signatures and ruff formatting Add user_env parameter to test-only SandboxAdapter subclasses to match the updated base signature, and apply ruff formatting to env_var_end_to_end. * fix(sandbox): unpack packages list in daytona pip install command Generated subprocess.run args were embedding self._packages as a nested list literal, which raises TypeError at runtime. Use * unpacking so each package becomes its own argv element. Reported by Claude Code Review on PR #12655. * feat(sandbox): resolve provider credentials via DB secrets + env fallback with GQL mutations Adds 2-tier credential resolution (DB Secret → env var) for sandbox provider authentication, centralized in `get_or_create_backend()` so adapters stay credential-agnostic. Introduces `EnvVarSpec` as the declarative contract between the factory and each adapter, and exposes `setSandboxCredential` / `deleteSandboxCredential` GraphQL mutations (with `env_var_specs` validation and `_BACKEND_CACHE` invalidation) so admins can rotate sandbox credentials via the UI without a server restart. * feat(sandbox): unified config contract — fail-closed runtime + preserve-on-save UI Lock the unified env_vars/dependencies/internet_access contract across all sandbox adapters. AdapterMetadata is the canonical source of capability advertisement; build_backend MUST raise UnsupportedOperation for any capability-gated input the adapter cannot fulfill (no silent drops). Backend reconciliation: - MODAL: flip supports_env_vars=True (runtime already worked); make ModalSandboxBackend.execute() raise UnsupportedOperation for per-call env overrides (was a silent warning); session-level user_env preserved - E2B/Vercel-Py/Vercel-TS/Modal: reject non-empty `dependencies` in build_backend - E2B/Daytona/Modal/Vercel-Py/Vercel-TS/Deno: reject non-"none" `internet_access` in build_backend - VercelPython/VercelTypescript/Deno configs: add Optional internet_access field for shape uniformity Frontend (SandboxConfigDialog): - formValuesToConfigPatch (extracted to utils.tsx for testability): deep-merge form state into a clone of the raw stored config; preserves capability-gated keys when flag is false or activeBackend is undefined, closing the False→True→False data-loss path - Advanced Config JSON editor explicitly strips capability-gated keys on submit; structured editors are the only authors - Hidden capability sections render a single muted "Not supported by the selected backend." placeholder instead of being omitted - ALLOWLIST internet_access mode marked reserved-for-future Tests: - New parametrized test_unified_config_contract.py iterates over SANDBOX_ADAPTER_METADATA.keys() and enforces flag/config-model/runtime agreement for all three capabilities; new adapters must opt-in to pass - test_capability_advertisement.py and test_queries.py refactored to parametrize from the same registry source — eliminates duplicate hard-coded matrices - test_user_env_forwarding.py covers MODAL's new execute-time policy - New formValuesToConfigPatch.test.ts covers the three preserve-on-save data-loss scenarios Documents the contract on AdapterMetadata with cross-references from SandboxAdapter.build_backend and validate_config so future adapters inherit the rules from source. * fix(sandbox): enforce dependencies/internet_access guards for DENO and WASM The unified config contract tests expect adapters with dependencies_language=None or internet_access="none" in SANDBOX_ADAPTER_METADATA to raise UnsupportedOperation when build_backend is called with a conflicting config. DENO was missing the packages guard; WASM was missing both packages and internet_access guards. Mirrors the existing pattern in e2b/modal/vercel backends. * feat(sandbox): enforce capability gates and reserved credential names in unified config - Rename EnvVarSpec → ProviderCredentialSpec and AdapterMetadata.internet_access → internet_access_capability to disambiguate adapter capability flags from the runtime config block. - validate_config now enforces capability gates (env_vars, internet_access, dependencies) and unique env_var names at write time rather than relying on per-adapter build_backend guards alone. - Execute-code path: drop per-call env override; user env is carried by the adapter from build_backend() for the session lifetime. - Reserved-name enforcement is case-insensitive and now covers both config.env_vars entries and top-level config keys, preventing credential shadowing via config's extra="allow" schema. - setSecrets invalidates backend caches for any SandboxConfig whose env_vars reference a rotated or deleted secret_key. - Add tests for reserved credential names and sandbox cache invalidation on secret rotation. * fix(sandbox): convert rebased inline-evaluator preview call site to async factory Base PR #12690 added _resolve_inline_code_evaluator_backend with a sync call to get_or_create_backend. Our branch made that factory async with required session/decrypt params for DB-backed credential resolution, so the call site needs the same conversion applied at the other two sites: await, session+decrypt, MissingSecretError/UnsupportedOperation/ ValidationError mapping to BadRequest, and admin-provider-wins merge order. * fix(sandbox): map invalid sandbox_config_id to BadRequest in preview mutation from_global_id_with_expected_type raises ValueError on type mismatch; strawberry swallows that as "an unexpected error occurred", which fails test_rejects_wrong_global_id_type's message assertion. Wrap the call at the preview site to surface the real error to the client. * fix(sandbox): PR #12655 review followups — capability contract polish B1 (blocker): widen update_sandbox_config to catch ValidationError → BadRequest, matching create_sandbox_config; capability-gate violations no longer escape as 500s. I1: relax update-path capability gate to semantically-changed sections only via stored_config baseline; preserve-on-save no longer blocks unrelated edits on flag-false adapters. Runtime build_backend guard unchanged (fail-closed preserved). I2: two-layer reserved-credential guard — _check_env_var_collision rejects secret_ref.secret_key collisions at mutation layer; _resolve_user_env pre-check rejects at runtime before DB lookup, closing the pre-existing-row exploit path. Long-term namespace split deferred to associate-secret-binder-identity-sandbox-config-pe. I4: DaytonaSandboxBackend.execute now installs packages in the ephemeral branch before code_run, honoring the advertised dependencies_language='PYTHON' on every execution path. I5/I6/I7: drop vestigial credential_specs from ModalAdapter and DenoAdapter (Modal reads env directly per D6; Deno is local-only); refresh AdapterMetadata docstring to describe constructor-time user_env injection; extend _PHOENIX_SANDBOX_FALLBACK_CREDENTIAL_KEYS with MODAL_TOKEN_ID/SECRET to preserve reserved-name coverage. I8: rewrite getBackendDescription on actual registry keys (DAYTONA_PYTHON, VERCEL_PYTHON, VERCEL_TYPESCRIPT, MODAL); align ENV_PHOENIX_SANDBOX_PROVIDER docstring. Note Vercel internet_access_capability as scaffolding pending runtime wiring. * fix(sandbox): align test expectations with no-per-call-env design - drop test_execute_raises_unsupported_operation_when_env_passed: execute() signature takes no env kwarg; env is passed at build_backend() time per types.SandboxBackend.execute docstring - fake E2BAdapter now raises ValueError (caught by sandboxBackends resolver) instead of NotImplementedError, so the backend-availability probe downgrades to UNAVAILABLE rather than bubbling an unhandled 500 * fix(sandbox): PR #12655 simplification — correctness fixes + test consolidation Phase 1 (correctness): replace hand-rolled equality helpers with pydantic-canonicalized `_normalize_section` (via `model_dump`); future field additions to `InternetAccessConfig` / `PythonDependenciesConfig` now track automatically. `_env_vars_equal` switches from `frozenset` to `Counter` so duplicate multiplicity is preserved. Extend runtime `_resolve_user_env` to reject reserved `EnvVarLiteral.name` symmetrically with the existing `EnvVarSecretRef.secret_key` check, closing the pre-mutation-guard-row exposure D7 was meant to close. Phase 2 (test consolidation): drop three integration-layer duplicate files (`test_env_var_end_to_end.py`, `test_credential_resolution_integration.py`, `test_reserved_credential_names.py`) after folding unique scenarios into their unit-level counterparts; prune pydantic-framework-rechecking tests (`test_unknown_keys_preserved`, `test_empty_config_returns_defaults`, trivial round-trips) from `test_sandbox_config_validation.py`; drop `backend._user_env` internal-attribute assertions from `test_user_env_forwarding.py`; replace `_BACKEND_CACHE` dict-shape peeks in `test_sandbox_cache_invalidation.py` with identity-based observable checks; parametrize per-adapter shape assertions in `test_capability_advertisement.py` over `SANDBOX_ADAPTER_METADATA.keys()`; add module-level docstrings to `test_daytona_backend.py` / `test_vercel_backend.py` / `test_modal_backend.py` stating the per-adapter-files-only-for-unique-lifecycle principle; move `test_sandbox_config_validation.py` from `tests/unit/server/api/` to `tests/unit/server/sandbox/` (tests sandbox pydantic models, not API). Phase 3 (clarity): rename `_PHOENIX_SANDBOX_FALLBACK_CREDENTIAL_KEYS` → `_PHOENIX_RESERVED_CREDENTIAL_ONLY_KEYS` with an inline docstring distinguishing it from `SandboxAdapter.credential_specs` and `RESERVED_CREDENTIAL_NAMES`. Net: +599 / -718. All 429 sandbox-relevant tests pass. * test(sandbox): re-scope test surface to durable interfaces — −2,013 lines PR #12655 Phase 2 re-scope. Prior Phase 2 (commit 68d130c) netted only 124 lines of reduction vs a target of 1,100–1,900 because "fold unique scenarios" was interpreted too conservatively. This pass recalibrates on a sharper thesis: keep the **mutation boundary** (gql_client tests) and the **cross-adapter conformance** parametrized over `SANDBOX_ADAPTER_METADATA.keys()`. Delete per-adapter SDK mocks, internal-factory scenario walks, and pydantic framework round-trips — they re-verify library behavior or couple to implementation details a reasonable refactor would change. Phase 1 — delete per-adapter SDK test files (−1,181 lines): - tests/unit/server/sandbox/test_daytona_backend.py (111 → 0) - tests/unit/server/sandbox/test_vercel_backend.py (425 → 0) - tests/unit/server/sandbox/test_modal_backend.py (246 → 0) - tests/unit/server/sandbox/test_user_env_forwarding.py (399 → 0) Cross-adapter conformance (test_unified_config_contract.py + test_capability_advertisement.py) already parametrizes over SANDBOX_ADAPTER_METADATA.keys(); per-adapter SDK forwarding mocks add no observable contract coverage. Phase 2 — shrink test_env_var_resolution.py (598 → 155, −443): Keep the reserved-name defense (TestResolveUserEnvReservedSecretKey), is_reserved_credential_name helper contract (TestReservedCredentialNameHelper), _enforce_unique_env_var_names validator (TestValidateConfigRejectsDuplicateEnvVarNames, collapsed), and one _resolve_user_env shape happy path. Delete TestEndToEndResolution (5 tests absorbed from prior integration fold — scenario walks; covered by mutation-layer cache-invalidation suite) and all per-kind/per-scenario resolution walks. Phase 3 — shrink test_credential_resolution.py (286 → 66, −220): Keep two direct _resolve_sandbox_credentials invariants — DB wins over env, and missing-key is omitted (not None). Delete TestCredentialResolutionIntegration (3 tests absorbed from prior integration fold — factory-chain walks covered by mutation-layer cache-invalidation tests) and the `_CapturingAdapter` scaffolding used only by them. Phase 4 — shrink test_sandbox_config_validation.py (495 → 353, −142): Keep Phase 1 equality-helper classes byte-identical (TestEnvVarsEqual, TestInternetAccessEqual, TestPackagesEqual), TestValidateConfigErrorPath (ValueError/ValidationError wrapping), TestEnvVarEntryDiscriminatedUnion (extra="forbid" + secret_ref shape), TestConfigFieldSpecsFromModel (reduced to one representative adapter plus the distinct schema-branch cases), and one forbid-extra assertion per nested leaf model. Delete the seven per-adapter pydantic-schema round-trip classes — pydantic schema re-verification is framework behavior, not authored invariant. Phase 5 — light prune mutation-layer sandbox tests (−30 lines, conservative per plan D4): - test_sandbox_config_mutations.py (509 → 479): drop test_reserved_modal_token_id_rejected. The MODAL_TOKEN_ID reserved-name case is covered parametrically in tests/unit/server/api/mutations/test_reserved_credential_names.py for the env_var[].name position; the secret_ref.secret_key mechanism is already tested via VERCEL_TOKEN in the same class. No observable coverage lost. No other mutation-layer files modified — plan D4 mandates light prune only; these are the durable GQL boundary tests. Housekeeping — test_unified_config_contract.py docstring updated: prior text pointed positive-path coverage at the now-deleted test_user_env_forwarding.py and per-adapter SDK files. New docstring states the contract honestly: structural registry + unsupported-capability rejection are covered here; positive SDK-forwarding mock coverage is intentionally absent. Verification: - Full sandbox + mutation-layer suite: 126 passed, 53 skipped (postgres --db sqlite), 0 failed. - Cross-adapter conformance (test_sandbox_backends_capability_flags_per_adapter, test_env_vars_false_raises_for_non_empty_user_env, etc.) still collects one case per adapter key in SANDBOX_ADAPTER_METADATA.keys() — all 7 adapters (WASM, E2B, DAYTONA_PYTHON, VERCEL_PYTHON, VERCEL_TYPESCRIPT, DENO, MODAL) present. - pytest --collect-only -q delta (target file list): 297 → 179 tests collected (−118 tests). - Line count delta (target file list): 4,498 → 2,485 lines (−2,013 lines, 105 insertions / 2,118 deletions). Note on outcome vs target: plan target was ~2,000–2,100 total; final is 2,485. The gap is in Phase 4 (ended at 353 vs ~200) because the plan also mandates "Kept classes are byte-identical to the originals (no accidental edits to custom-validator logic)" for the equality-helper classes — those alone are ~75 lines. Phase 4's keep-set was fully satisfied; the ~200 target was aspirational. Phase 5's −30 is under the −50 floor but within the spirit of the plan's "conservative — these are the boundary tests we are deliberately keeping." Commands used: - uv run pytest tests/unit/server/sandbox/ \ tests/unit/server/api/mutations/test_sandbox_*.py \ tests/unit/server/api/mutations/test_reserved_credential_names.py \ tests/unit/server/api/test_capability_advertisement.py - uv run pytest --collect-only -q <same paths> * Fix auth guard, improve config dialog styles * Filter sandboxes settings tab when permissions are not satisfied * fix(sandbox): align rebased branch with required EvaluatorInputMappingInput fields PR #12727 on main made literal_mapping / path_mapping required on EvaluatorInputMappingInput to fix GraphQL introspection. After rebase, our handlers that relied on the removed default-empty fallback no longer type-check, and tests that omitted inputMapping failed schema validation. Apply option (b) consistently: raise BadRequest when input_mapping is None across all six evaluator create/update handlers rather than silently defaulting to empty mappings. Update tests to supply explicit inputMapping values. Also fix two surfaced typing issues in sandbox config: dict(JSON) casts and JSON return-type casts. * fix(integration-tests): supply inputMapping to dataset evaluator fixture Commit d725531 made inputMapping a required field on createDatasetLlmEvaluator after the EvaluatorInputMappingInput schema tightened in PR #12727. Unit tests were updated there; the integration fixture was missed, so test_experiment_with_evaluators fails with "input_mapping is required". Pass empty literal/path mappings — the per-evaluator inputMapping used at subscription time is supplied separately in the subscription variables. --------- Co-authored-by: Tony Powell <apowell@arize.com>

cephalization added 5 commits April 15, 2026 11:40

feat(evaluators): support inline code previews

444b7c2

feat(evaluators): add code evaluator IDE workspace

f302f01

feat(evaluators): wire IDE flow into dataset editors

0b0725a

refactor(evaluators): promote IDE editor as default

6d15177

refactor(evaluators): simplify dirty state effect handling

eb1b383

github-project-automation Bot added this to phoenix Apr 15, 2026

github-project-automation Bot moved this to 📘 Todo in phoenix Apr 15, 2026

cephalization changed the title ~~Cephalization/improve code evaluator editor ux~~ feat(sandboxes): Improve code evaluator editor ux Apr 15, 2026

cephalization commented Apr 15, 2026

View reviewed changes

Comment thread app/package.json Outdated

cephalization added 9 commits April 15, 2026 13:57

fix(evaluators): keep sandbox IDs opaque and validate code inputs

b8e38b8

cleanup paddings, add field descriptions

28c1825

Add tests and reduce duplication

073d3e4

fix(evaluators): submit validated sandbox selection

478f5a0

playwright tests

99d4616

fix(python): repair code evaluator CI tests

a1df857

remove unused dep

1758505

fix types

279e4e9

fix: revert unnecessary sandbox backend changes to fix type errors

95c79e1

Align wasm_backend.py, e2b_backend.py, and test_wasm_backend.py with version-sandboxes branch. The changes on this branch introduced type errors by using wrong type ignore codes and adding unnecessary helper functions.

cephalization marked this pull request as ready for review April 17, 2026 19:12

cephalization requested review from a team as code owners April 17, 2026 19:12

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 17, 2026

claude Bot reviewed Apr 17, 2026

View reviewed changes

anticorrelator approved these changes Apr 17, 2026

View reviewed changes

github-project-automation Bot moved this from 📘 Todo to 👍 Approved in phoenix Apr 17, 2026

Style improvements

a5ce55e

cephalization added 2 commits April 20, 2026 13:48

address review comments

917c1da

Add additional playwright tests

68ba783

cephalization merged commit 23e4726 into version-sandboxes Apr 20, 2026
39 checks passed

cephalization deleted the cephalization/improve-code-evaluator-editor-ux branch April 20, 2026 19:00

github-project-automation Bot moved this from 👍 Approved to ✅ Done in phoenix Apr 20, 2026

cephalization added a commit that referenced this pull request Apr 22, 2026

feat(sandboxes): Improve code evaluator editor ux (#12690)

a3f17a2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sandboxes): Improve code evaluator editor ux#12690

feat(sandboxes): Improve code evaluator editor ux#12690
cephalization merged 17 commits intoversion-sandboxesfrom
cephalization/improve-code-evaluator-editor-ux

cephalization commented Apr 15, 2026

Uh oh!

Uh oh!

claude Bot Apr 17, 2026

Uh oh!

anticorrelator left a comment

Uh oh!

anticorrelator Apr 17, 2026

Uh oh!

anticorrelator Apr 17, 2026

Uh oh!

anticorrelator Apr 17, 2026

Uh oh!

anticorrelator Apr 17, 2026

Uh oh!

anticorrelator Apr 17, 2026

Uh oh!

anticorrelator Apr 17, 2026

Uh oh!

claude Bot commented Apr 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	output_configs=output_configs,
	input_mapping=input.input_mapping.to_orm()
	if input.input_mapping is not None
	else InputMapping(literal_mapping={}, path_mapping={}),

		_TYPESCRIPT_ARROW_SIGNATURE_RE = re.compile(r"(?:const\|let\|var)\s+evaluate\s=\s\(([^)])\)\s=>")


		def _extract_typescript_object_parameter_keys(params: str) -> tuple[list[str], list[str]]:

Conversation

cephalization commented Apr 15, 2026

Uh oh!

Uh oh!

claude Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

anticorrelator left a comment

Choose a reason for hiding this comment

Error Handling & User Feedback

Schema & Validation Consistency

UI Regression & Accessibility

Test Coverage

Additional comments

Uh oh!

anticorrelator Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

anticorrelator Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

anticorrelator Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

anticorrelator Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

anticorrelator Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

anticorrelator Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented Apr 20, 2026

Code review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants