Skip to content

fix: sanitize OneCLI agent identifiers#2179

Closed
CopyPasteFail wants to merge 1 commit intoqwibitai:mainfrom
CopyPasteFail:fix/onecli-agent-identifier-2046
Closed

fix: sanitize OneCLI agent identifiers#2179
CopyPasteFail wants to merge 1 commit intoqwibitai:mainfrom
CopyPasteFail:fix/onecli-agent-identifier-2046

Conversation

@CopyPasteFail
Copy link
Copy Markdown

Fixes #2046.

NanoClaw agent group IDs can contain underscores, for example ag_f249a3521081, while OneCLI agent identifiers only accept lowercase letters, numbers, and hyphens.

This normalizes the identifier before calling OneCLI by replacing underscores with hyphens. The same normalized identifier is used for ensureAgent and applyContainerConfig.

Because OneCLI approval requests carry the normalized identifier back as the agent external ID, this resolves that identifier back to the real agent_groups.id before scoped approver selection. If multiple database IDs normalize to the same OneCLI identifier, the resolver treats it as ambiguous, logs a warning, and does not select a scoped group.

Possible follow-up: rename getAgentGroupByOneCliIdentifier to resolveAgentGroupByOneCliIdentifier, since the function can return no result when the identifier is missing or ambiguous.

Validation:

  • pnpm vitest run src/onecli-agent-identifier.test.ts
  • pnpm typecheck
  • pnpm test

@CopyPasteFail CopyPasteFail force-pushed the fix/onecli-agent-identifier-2046 branch from 25fafa7 to 057ceba Compare May 1, 2026 21:23
@CopyPasteFail
Copy link
Copy Markdown
Author

Implementation notes

This PR fixes the OneCLI agent identifier mismatch without changing the database identity of agent groups.

Problem

NanoClaw agent group IDs can contain underscores, for example:

ag_f249a3521081

OneCLI agent identifiers only accept lowercase letters, numbers, and hyphens. Passing the raw NanoClaw agent group ID into OneCLI can therefore fail validation.

The obvious fix would be to replace underscores with hyphens before calling OneCLI:

ag_f249a3521081 -> ag-f249a3521081

However, that creates a second issue: OneCLI approval requests later return the normalized identifier as the agent external ID. If approval handling treats that normalized value as the original database agent_groups.id, scoped approval routing can break.

Fix approach

This PR adds a small helper module for OneCLI agent identifiers:

toOneCliAgentIdentifier(agentGroupId)

This converts NanoClaw agent group IDs into OneCLI-safe identifiers by replacing _ with -.

container-runner.ts now uses this normalized identifier when calling OneCLI:

  • ensureAgent(...)
  • applyContainerConfig(...)

Approval handling also resolves the normalized OneCLI identifier back to the real database agent group before selecting approvers:

getAgentGroupByOneCliIdentifier(request.agent.externalId)

That preserves the existing scoped approver behavior:

  1. group-scoped admin
  2. global admin
  3. owner

Edge case considered

There is a theoretical ambiguity:

ag_foo -> ag-foo
ag-foo -> ag-foo

If both database IDs existed, they would normalize to the same OneCLI identifier.

The resolver handles this conservatively:

  • exactly one match: use that agent group
  • zero matches: do not select a scoped group
  • multiple matches: treat as ambiguous and do not select a scoped group

This avoids silently routing an approval to the wrong group-scoped admin.

When the normalized OneCLI identifier cannot be resolved uniquely, the code logs a warning and falls back to the existing non-scoped approver path instead of guessing.

Risk considerations

The main behavior change is that OneCLI approval routing now depends on normalized identifier resolution.

Expected behavior:

  • Normal case: the normalized identifier maps back to exactly one DB agent group, and scoped approvals continue to work.
  • Missing/ambiguous case: scoped routing is skipped, a warning is logged, and approval selection falls back to broader approvers.

This is safer than choosing the wrong scoped group, but it does mean that in an ambiguous case a group-scoped admin may not receive the approval. That seems like the right failure mode for this edge case.

The resolver currently scans all agent groups. This should be acceptable because it runs during OneCLI approval handling, not in a hot path. If this ever becomes a concern, it could be replaced with a persisted normalized identifier or an indexed lookup.

Possible follow-up: rename getAgentGroupByOneCliIdentifier to resolveAgentGroupByOneCliIdentifier, since the function may return no result when the identifier is missing or ambiguous.

Testing

Added focused tests for:

  • converting underscore IDs to OneCLI-safe hyphen IDs
  • converting all underscores, not just the first one
  • ensuring the normalized identifier matches OneCLI’s expected format
  • resolving a normalized OneCLI identifier back to the original DB agent group ID
  • refusing to resolve ambiguous normalized identifiers

Validation run locally:

pnpm vitest run src/onecli-agent-identifier.test.ts
pnpm typecheck
pnpm test

Results:

src/onecli-agent-identifier.test.ts: 4 passed
pnpm typecheck: passed
pnpm test: 26 test files passed, 242 tests passed

@CopyPasteFail CopyPasteFail force-pushed the fix/onecli-agent-identifier-2046 branch 2 times, most recently from 5da033c to 655529d Compare May 1, 2026 21:38
@CopyPasteFail
Copy link
Copy Markdown
Author

Updated after review:

  • Renamed getAgentGroupByOneCliIdentifier to resolveAgentGroupByOneCliIdentifier, since the resolver may return no result when the identifier is missing or ambiguous.
  • Added validation in toOneCliAgentIdentifier() so invalid normalized identifiers fail at the conversion boundary instead of surfacing later as a less obvious OneCLI API error.
  • Added test coverage for invalid identifiers such as uppercase chars, slash chars, and the empty string.

Validation after update:

pnpm vitest run src/onecli-agent-identifier.test.ts
→ 5 tests passed

pnpm typecheck
→ passed

pnpm test
→ 26 test files passed, 243 tests passed

On the migration concern: this PR intentionally does not add migration logic for existing OneCLI registrations. The reported failing IDs contain underscores, which OneCLI rejects, so those affected IDs likely could not have been successfully registered under the old identifier in the first place. If there are older valid identifiers that normalize differently, OneCLI may see a new identifier, but that seems outside the narrow fix for #2046 and would need confirmation of OneCLI’s ensureAgent behavior.

@CopyPasteFail CopyPasteFail force-pushed the fix/onecli-agent-identifier-2046 branch from 655529d to 4782899 Compare May 1, 2026 21:49
@CopyPasteFail
Copy link
Copy Markdown
Author

CopyPasteFail commented May 1, 2026

Follow-up on review comments:

I added the cheap missing coverage for already-valid OneCLI identifiers, since current generated IDs can already be hyphenated and should pass through unchanged.

A few suggestions from review are intentionally left as future considerations rather than part of this bugfix:

  • Caching / persisted normalized IDs: the resolver currently scans agent groups during OneCLI approval handling. That is acceptable for this path and avoids cache invalidation/schema migration complexity. If approval volume or agent group count grows, a normalized identifier column or indexed lookup may make sense.
  • Full approval-flow integration test: useful, but heavier than this targeted fix. The current tests cover normalization, validation, unique DB resolution, ambiguity handling, and passthrough behavior.
  • Warning-log assertion: not added to avoid brittle log-focused tests. The warning exists in the production path for unresolved or ambiguous identifiers.
  • Existing OneCLI registrations: no migration is included. The reported failing underscore IDs were rejected by OneCLI, so they likely were not successfully registered under the old raw identifier. Already-valid hyphenated IDs are unchanged.
  • Additional OneCLI validation rules: the local validator currently mirrors the known OneCLI character rule: lowercase letters, numbers, and hyphens. I did not add leading/trailing hyphen restrictions or max-length checks because I don’t have a confirmed OneCLI rule for those constraints. If OneCLI documents stricter validation, we can mirror it here.

Current validation after the latest update:

pnpm vitest run src/onecli-agent-identifier.test.ts
→ 6 tests passed

pnpm typecheck
→ passed

pnpm test
→ 26 test files passed, 244 tests passed

@gavrielc
Copy link
Copy Markdown
Collaborator

gavrielc commented May 2, 2026

I traced every code path that creates an agent_groups row at the time #2046 was filed (commit f8c3d02, April 26). All four generators produce hyphens, and git log --all -G "'ag_'" is empty on every branch. Nothing in this repo has ever generated ag_xxx.

@dr-pabs, did you actually hit this, and if so where did the underscore id come from in your DB? Without a reproducible source we'd rather not add permanent translation logic at the OneCLI boundary for a state the codebase doesn't produce.

@CopyPasteFail
Copy link
Copy Markdown
Author

Small clarification on reproduction:

I did not run a full end-to-end NanoClaw + OneCLI container-spawn reproduction that produced the live 400 response.

What I verified locally is the code path that would cause it:

  1. The issue reports that OneCLI rejects agent identifiers containing underscores.
  2. container-runner.ts was passing the raw agentGroup.id into OneCLI as the agent identifier.
  3. The reported failing ID shape is something like ag_f249a3521081, which contains _.
  4. This PR changes that boundary so NanoClaw sends ag-f249a3521081 to OneCLI instead.
  5. The tests cover the conversion, validation, reverse lookup back to the DB agent group, ambiguity handling, and already-valid hyphenated IDs.

So this PR is based on code-path verification plus focused tests, not a full live end-to-end reproduction. I think that is enough for this fix, but I want to be explicit about what was and was not reproduced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OneCLI agent identifier rejected (400 error)

2 participants