TOOLS-2592 Tooling for shipping Triton service images from monitor-reef#21
Merged
Outlines the images/ directory approach for building multiple Triton zone images from a single Rust monorepo, including per-service Makefiles, SAPI integration, and a jenkins-joylib enhancement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document which services don't ship as images (bugview, jira-stub), list the reference repos needed to understand the design, and add a prerequisites checklist for the jenkins-joylib change and SmartOS testing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce triton-api, a Dropshot API service that will eventually replace cloudapi. For now it has a single /ping endpoint. This also establishes the images/ directory structure for building zone images from the monorepo.
- apis/triton-api: API trait with /ping endpoint
- services/triton-api-server: service implementation
- images/triton-api: zone image Makefile, SMF manifests, SAPI manifests, and boot script
- images/image.defs.mk: shared image build definitions, sets ENGBLD_REPO_ROOT for eng Makefile compatibility
- deps/eng: updated to include ENGBLD_REPO_ROOT monorepo support
- .gitignore: add image build artifact patterns
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
setup.sh was committed without the execute bit, which would cause SMF postboot to fail to start. Also move smf_include.sh source before the first-boot marker check so $SMF_EXIT_OK is available for the early exit path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
$(shell) swallows exit codes, so git rev-parse and git submodule update failures would leave ENGBLD_REPO_ROOT empty and eng includes broken with confusing errors. Add explicit guards with clear messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update all REPO_ROOT references in code blocks to ENGBLD_REPO_ROOT to match actual implementation. Renumber open questions (was 1,3,4,5 now 1,2,3,4). Reframe eng Makefile compatibility question to reflect that ENGBLD_REPO_ROOT already addresses the root issue. Remove local filesystem path from TODO. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Files were untracked artifacts, not committed to the branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add status and healthy fields to PingResponse matching VMAPI pattern. Move types to types/ module for consistency with other API crates. Add Clone derive and crate-level doc comment. Update server to return populated response. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add triton-api dependency and ManagedApiConfig entry so make openapi-generate and openapi-check cover the new API. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document that the bind address should come from the SAPI-generated config file once this service is ready for production deployment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
*.tar.gz, bits/, proto/, make_stamps/ were repo-wide but only needed for image builds. Scope to images/*/ to avoid accidentally hiding legitimate files elsewhere. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rust successor to sdcadm for Triton datacenter administration. All 16 top-level commands and 47 subcommands scaffolded as stubs returning "not yet implemented". Shell completion works. Design doc covers architecture, API client strategy, and first target (post-setup portal). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Internal Triton APIs get the full trait-based pipeline (API trait → OpenAPI spec → Progenitor client), not hand-written minimal clients. Builds toward correct specs from day one and means the trait is ready when we rewrite the Node.js services. jira-client is the sole exception as a large external API. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 5 API clients needed for post-setup portal also unlock services, instances, avail, check-config, and check-health as low-hanging fruit. Reordered priority list to reflect this. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Grafana has a known-working sdcadm implementation to validate against. Same APIs needed, but we can compare results on a real DC before applying the pattern to a brand-new service (portal). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three patches applied to sapi-api.json:
- GET /mode: returns plain string, not ModeResponse JSON object
- POST /mode: returns 204 no content, not 200 with JSON body
- POST /loglevel: returns empty 200, not JSON body
Updated client-generator to use patched spec, regenerated client, and fixed CLI to handle the new response types.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trait changes (canonical type fixes):
- Create endpoints return 200 (matching Node.js Restify default), not 201
- LogLevelResponse.level is serde_json::Value (Bunyan returns integer)
- SetLogLevelBody.level is serde_json::Value (accepts string or integer)
- Add uuid and master fields to all create body types
Patch additions:
- GET /ping 500: documented as known limitation (Node.js returns PingResponse on 500, Progenitor can't handle multiple response types)
- Create status code safety net patch (no-op since trait already fixed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused ModeResponse type (GET /mode is patched to return string)
- Add StorageType enum for PingResponse.stor_type field
- Change PingResponse.mode from String to SapiMode enum
- Change get_mode trait to return SapiMode (patched to string in spec)
- Change set_mode trait to HttpResponseUpdatedNoContent (native 204)
- Remove dead UpdateAttributesBody re-export from sapi-client
- Simplify post_mode patch to no-op (trait now generates 204 natively)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1: Add sections for enum identification, Restify response pattern cataloging, patch requirements, and hidden request fields.
Phase 2: Add guidance on using Phase 1 enums, matching Restify response patterns to Dropshot types (200 not 201 for creates), and avoiding dead wrapper types.
Phase 5: Add enum wire-value verification, status code checking, dead schema detection, and remaining String→enum scan.
Reference: Add Restify response pattern table, Progenitor limitations section (multiple body types, text/plain, empty bodies).
Orchestrator: Add Step 2b for applying OpenAPI spec patches between API generation and client generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add sapi-client and vmapi-client dependencies to tritonadm. Convert main to async with tokio. Implement `services` (alias `svcs`) and `instances` (alias `insts`) as the first real commands, replacing their stubs. Services output matches sdcadm columns (type, uuid, name, image, insts). Instances enriches SAPI data with VM alias, state, and image from VMAPI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Four crates (cloudapi-client, vmapi-client, triton-gateway-client, bugview-service) were overriding the workspace's `rand = "0.8"` to `rand = "0.9"` individually, pulling both major versions into the graph. Bump the workspace to 0.9 and drop the per-crate overrides.
triton-auth's crypto stack (p256, p384, rsa, ed25519-dalek) is pinned to rand_core 0.6 by their `*::random(&mut _)` APIs; rand 0.9 bundles rand_core 0.9, whose `OsRng` cannot satisfy the 0.6 trait bounds. Rather than split the workspace's rand version, depend directly on `rand_core = "0.6"` for the OsRng value, which stays in the rand_core 0.6 ecosystem the crypto crates require. Migrates the three call sites in triton-auth and one in triton-auth-session.
Enable rand's `os_rng` feature in the workspace so future callers that do want rand's own OsRng (the 0.9 one, unrelated to the crypto stack) can still reach it without a per-crate override.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add async-trait, base64, bytes, clap_complete, dirs, futures-util,
rcgen, url, and urlencoding to [workspace.dependencies] and migrate
all consumer crates to `{ workspace = true }`. Each of these deps was
declared directly in two or more crates with either matching or
trivially-compatible versions; hoisting them makes version bumps a
single-file edit.
url gains `features = ["serde"]` in the workspace definition,
matching the shape triton-auth-session and triton-api-server were
already using. bugview-service's plain `url = "2.5"` picks up serde
as a no-op (the crate wasn't using it).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three CLI crates carried identical `assert_cmd = "2.0"` and `predicates = "3.0"` dev-deps. Hoist both to the workspace so new CLIs pick them up by name and version drift stays impossible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hoist every external dep that was still declared directly in a single
crate. The bulk is triton-auth's 18-crate crypto stack (ssh-key,
ssh-encoding, md-5, rsa, p384, ed25519-dalek, dsa, sha1, sha2, pkcs8,
sec1, pem-rfc7468, signature, aes, cbc, des, der, time, plus the
rand_core 0.6 pin and serial_test dev-dep), triton-tls's three TLS
helpers (rustls-native-certs, rustls-pemfile, webpki-roots),
triton-cli's TUI/test stack (serde_yaml, comfy-table, dialoguer,
indicatif, getrandom, rpassword, test-case, pretty_assertions,
hostname, regex), and the genuinely one-off deps indexmap
(bugview-service), libc (tritonadm), syn (client-generator).
Also mops up two stray `http = "1"` direct declarations in
cloudapi-client and triton-gateway-client that the D2 sweep missed
(pattern only matched "1.1") and triton-cli's `http = "1.0"`.
After this commit, every workspace member's [dependencies],
[dev-dependencies], and [build-dependencies] declares every external
dep via `{ workspace = true }`. New crates pick up versions and
features for free; future bumps touch one file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumped versions of every external dep where an upgrade was available (per `cargo upgrades`) and the compiler accepts it. Held-back versions are documented in the workspace Cargo.toml comments.
Bumped:
- build-data 0.2 → 0.3 (set_GIT_COMMIT_SHORT / no_debug_rebuilds now return Result; handled in triton-cli/build.rs)
- getrandom 0.3 → 0.4
- jsonwebtoken 9.3 → 10.3 (now requires explicit CryptoProvider; enable `rust_crypto` feature)
- ldap3 0.11 → 0.12 (feature rename: "tls-rustls" → "tls-rustls-ring")
- progenitor 0.12 → 0.13
- progenitor-client 0.12 → 0.13
- progenitor-impl 0.12 → 0.13
- strum 0.27 → 0.28
- tokio-tungstenite 0.28 → 0.29
Held back:
- dropshot (0.16) and dropshot-api-manager (0.3): dropshot 0.17 emits richer WebSocket response schemas (101 / 4XX / 5XX instead of `default`) which triggers a progenitor 0.13 panic (`assertion failed: response_types.len() <= 1` in method.rs). Progenitor 0.13 runs fine against dropshot 0.16's older spec shape, so keep dropshot at 0.16 until both upstreams cut a coordinated release.
- schemars (0.8): still pinned by dropshot 0.16.
- crypto stack (aes 0.8, cbc 0.1, der 0.7, des 0.8, md-5 0.10, sec1 0.7, sha1 0.10, sha2 0.10): all locked by rsa 0.9 / ssh-key 0.6, which still use digest 0.10 traits. Bumping the stack without bumping rsa/ssh-key produced trait-bound failures (BlockSizeUser, FixedOutput, HashMarker, etc.).
- rand (0.9), rand_core (0.6): rand 0.10 / rand_core 0.10 renamed `os_rng` → `sys_rng` on rand and removed OsRng from rand_core without a feature; the crypto stack above also holds rand_core 0.6.
make check (excluding openapi-check's stale-fix guidance, which was fixed by regenerating) passes 1344 tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move every `rustls::crypto::ring::default_provider().install_default()` call in the workspace behind `triton_tls::install_default_crypto_provider` so a future backend switch is a single-file edit. Introduce a `selected_crypto_provider()` helper that returns the active `CryptoProvider`, and use it both for the install and for `NoCertVerifier::supported_verify_schemes`. Services that had copy-pasted the install helper (bugview-service main + tests, triton-api-server) now depend on triton-tls and drop their local helpers; triton-gateway's inline install collapses into one line too. Drop the now-unused direct `rustls` deps from triton-api-server and bugview-service. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `tritonadm post-setup tritonadmin` subcommand symmetric with `portal` and `tritonapi`: TRITONADMIN_CONFIG ServiceConfig + a small build_tritonadmin_metadata helper that seeds TRITON_ADMIN_JWT_SECRET. The generic cmd_add_service path handles image fetch / SAPI service creation / instance provisioning unchanged. Default image source is "current" (local IMGAPI) since triton-admin builds aren't on the updates server yet. Also flips PORTAL_CONFIG.delegate_dataset from false to true. The mariana-trench user-portal image now generates a self-signed haproxy cert at /data/tls on first boot and expects the dataset to persist across reprovision; matches the rationale already in TRITONAPI_CONFIG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flip PORTAL_CONFIG and TRITONADMIN_CONFIG to include_external_primary: false, matching the adminui/imgapi/triton-api convention. The zone now provisions on the admin network only; the operator must run `post-setup common-external-nics` to attach the external NIC. Avoids a foot-gun where running `tritonadm post-setup portal` (or tritonadmin) on a real cluster would put a freshly-provisioned web UI on the public network in a single step, before an operator has had a chance to inspect it. The zone's haproxy still terminates TLS, but testing/staging deployments shouldn't auto-expose anyway. Also extend cmd_common_external_nics's svc_names to include the two new services and update the surrounding doc comments and final "nothing to do" message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without the SAPI v2 accept-version header, services responses omit the `type` field and exclude agent-typed rows entirely, so `tritonadm services` rendered an empty TYPE column and silently dropped every agent service. Match sdcadm by setting the header on the SAPI client. Adds `triton_tls::build_http_client_with_headers` and a new `sapi_client::build_client` helper that wires the header in once; tritonadm's call sites switch to the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in main's "newtype schema naming + HEAD endpoint fixes" (a0c9430) and the Go CloudAPI client subtree (5c9940b). Reconciles several auto-merged conflicts and refactors triton-gateway-client to mirror cloudapi-client's improved shape (newtype schemas + action body parameters).
Conflict resolutions:
- .gitignore / Makefile: combined our image+tritonadm targets with main's Go-toolchain and coverage rules.
- Cargo.lock: regenerated.
- cli/triton-cli/src/main.rs: kept HEAD's triton_tls::build_http_client.
- cli/triton-cli/src/commands/instance/{create,list,migration}.rs + rbac/{role,role_tags}.rs: kept the gateway-client direction.
- cloudapi-client/src/generated.rs and openapi-manager/src/transforms.rs: regenerated post-merge; dropped dead patch_cloudapi_error_schema in favour of HEAD's patch_node_triton_error_schema.
Refactor to align gateway-client with cloudapi-client's improvements:
- client-generator: added with_replacement patches for triton-gateway (Tags, MetadataObject->Metadata, RoleTags, ProvisioningLimits, Resolvers, PolicyRules, ImageAcl, AffinityRules, NetworkIds) plus a VmBrand value_enum patch.
- triton-gateway-client/src/lib.rs: split re-exports: action body types via `pub use types::*`, everything else via `pub use cloudapi_api::*`. Surface AffinityRules / ImageAcl / NetworkIds / Resolvers / RoleTags / PolicyRef / PolicyRules / ProvisioningLimit.
- TypedClient action methods (start/stop/reboot/resize/rename_machine, enable/disable_firewall, enable/disable_deletion_protection, export_image) take `&Request` body types instead of `Option<String> origin`.
- ListMachinesFilter.brand: Brand -> VmBrand to match the API's list_machines builder.
- Added chrono dep; fake_response_body's fake_ts is DateTime<Utc>.
- CLI: global cloudapi_client:: -> triton_gateway_client:: rename, Brand2 -> Brand, list.rs imports VmBrand for state comparisons, migration.rs wraps affinity in AffinityRules::from, snapshot.rs test client wraps AuthConfig in GatewayAuthConfig::ssh_key.
Plus: bump libs/triton-auth/src/signature.rs copyright to 2026.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The custom bits-upload recipe was passing `-d $(ENGBLD_DEST_OUT_PATH)` where eng's standard target uses `-d $(ENGBLD_DEST_OUT_PATH)/$(NAME)`, so tritonadm builds were landing in /public/builds/<timestamp>/ rather than /public/builds/tritonadm/<timestamp>/. Match the convention so the artifacts are grouped with the other tritonadm builds and the tritonadm-latest symlink lands where consumers expect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Was commented out while the pipeline was being validated. make check now passes (rust check + tests + clippy + openapi-check + clients-check + go-vet + go-test), so gate builds on it before image upload. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wait_for_snapshot_deleted tests build a TypedClient (and thus a reqwest/rustls client) but ran without a process-global CryptoProvider. Production binaries install one in main(); under nextest each test runs in its own process, so the install must happen in the test helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Several tritonadm code paths need to distinguish "the resource does not exist" (404) from "the API call itself failed" (5xx, transport, auth). The latter must not be silently downgraded to a default value, which is how an admin tool ends up reporting "Done." while production state is half-wired. Add a small `commands::errors` helper plus the `progenitor-client` dep so the next few commits can apply the distinction at each suspect call site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
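The distinction described above can be sketched in a few lines. This is only an illustration: `ApiError`, `is_not_found`, and `optional_on_404` are hypothetical names, and the real `commands::errors` helper works against `progenitor-client` error types rather than this stand-in enum.

```rust
use std::fmt;

/// Hypothetical stand-in for a generated client's error type.
#[derive(Debug, PartialEq)]
pub enum ApiError {
    /// The server answered with a non-success HTTP status.
    Status(u16),
    /// No usable response arrived (DNS, TLS, timeout, and so on).
    Transport(String),
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::Status(code) => write!(f, "unexpected HTTP status {code}"),
            ApiError::Transport(msg) => write!(f, "transport failure: {msg}"),
        }
    }
}

impl std::error::Error for ApiError {}

/// True only when the API said "this resource does not exist".
pub fn is_not_found(err: &ApiError) -> bool {
    matches!(err, ApiError::Status(404))
}

/// Map a 404 to `Ok(None)`; every other failure is passed through so the
/// caller cannot mistake an outage for an absent resource.
pub fn optional_on_404<T>(result: Result<T, ApiError>) -> Result<Option<T>, ApiError> {
    match result {
        Ok(value) => Ok(Some(value)),
        Err(err) if is_not_found(&err) => Ok(None),
        Err(err) => Err(err),
    }
}
```

A call site that used to write `Err(_) => None` would instead wrap the lookup in `optional_on_404(...)` and apply `?`, so only the 404 case becomes a default.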
Previously `cmd_dc_maint_status` matched `Err(_) => false` on the SAPI service lookups for cloudapi and docker, so a transient SAPI outage, auth failure, or 5xx caused `tritonadm dc-maint status` to confidently report "DC maintenance: off" even when maintenance was actually on and we just couldn't read the state. An operator running this command to confirm whether traffic is being shed could reach the wrong conclusion. Propagate SAPI errors with `.context()` instead. The `metadata` sub-lookup still falls back to `false` because an empty/missing field genuinely means "not in maintenance"; only the API call itself failing is a real failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
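A minimal sketch of the two-level rule above, assuming a simplified service record. `ServiceRecord`, its `maintenance` field, and `maintenance_status` are hypothetical names chosen for illustration; the actual SAPI metadata key is not shown in this commit message.

```rust
/// Hypothetical, simplified view of a SAPI service: only an optional
/// maintenance flag taken from its metadata is modelled.
pub struct ServiceRecord {
    pub maintenance: Option<bool>,
}

/// A failed lookup is an error; a missing flag means "not in maintenance".
pub fn maintenance_status(lookup: Result<ServiceRecord, String>) -> Result<bool, String> {
    // Propagate the API failure with context instead of reporting "off".
    let service = lookup.map_err(|e| format!("failed to read service from SAPI: {e}"))?;
    // An absent field is a legitimate "off", so the fallback stays.
    Ok(service.maintenance.unwrap_or(false))
}
```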
Both `cmd_avail` and `cmd_instances` previously matched `Err(_)` on their per-service / per-instance API lookups, silently dropping rows or substituting "-". For an operator-facing inventory tool this is exactly backwards: a partial outage masquerades as a fleet that's healthy and up-to-date, or a fleet that's entirely unknown. cmd_avail now distinguishes 404 (image truly absent in IMGAPI — keep the silent skip) from any other error (collected and printed as a warning summary so the operator knows the table is incomplete and why). A non-UUID `image_uuid` in SAPI is also treated as a real signal rather than a routine miss, since it indicates data corruption. cmd_instances now shows "missing" on a 404 (the SAPI instance points at a VM VMAPI doesn't know about — legitimate stale state) and "?ERR" on any other VMAPI error, with a per-instance error summary printed after the table. A wholesale VMAPI outage is now distinguishable from a fleet that genuinely contains many unknown instances. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
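The `cmd_instances` behaviour can be illustrated with a small renderer. This is a sketch under assumptions: `VmLookupError`, `state_cell`, and `render` are hypothetical, and the real command prints more columns than the two shown here.

```rust
/// Hypothetical outcome of a per-instance VMAPI lookup.
#[derive(Debug)]
pub enum VmLookupError {
    /// VMAPI returned 404: SAPI points at a VM that VMAPI does not know.
    NotFound,
    /// Anything else: 5xx, transport, auth.
    Other(String),
}

/// Returns the table cell and, for real failures, a message for the summary.
pub fn state_cell(lookup: Result<String, VmLookupError>) -> (String, Option<String>) {
    match lookup {
        Ok(state) => (state, None),
        Err(VmLookupError::NotFound) => ("missing".to_string(), None),
        Err(VmLookupError::Other(msg)) => ("?ERR".to_string(), Some(msg)),
    }
}

/// Builds the table lines plus the per-instance error summary printed after it.
pub fn render(rows: Vec<(String, Result<String, VmLookupError>)>) -> (Vec<String>, Vec<String>) {
    let mut lines = Vec::new();
    let mut warnings = Vec::new();
    for (uuid, lookup) in rows {
        let (cell, warning) = state_cell(lookup);
        lines.push(format!("{uuid}  {cell}"));
        if let Some(msg) = warning {
            warnings.push(format!("{uuid}: {msg}"));
        }
    }
    (lines, warnings)
}
```

Keeping the warnings in a separate list is what lets a wholesale outage show up as many `?ERR` rows plus a summary, rather than as an empty or dash-filled table.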
Three call sites in this module previously collapsed every IMGAPI error into "the resource is absent, retry":
* `ensure_origin_imported` matched `Err(_)` on the local manifest lookup, so a 5xx from local IMGAPI would launch an import action against a sick IMGAPI and produce a confusing chained error instead of pointing at the real problem.
* `import_remote_with_channel_fallback` retried channel-less on any error from the channel-scoped call. Network blips, auth, and TLS failures all got misread as "origin not on this channel," and the user-facing remediation hint suggested a workaround that wouldn't fix the actual cause.
* `wait_for_image_active` swallowed every `Err` for 4 minutes and then bailed with "timed out", masking a broken local IMGAPI as a slow import.
All three now match on 404 explicitly and propagate other errors with `.context()`. `wait_for_image_active` tolerates up to 3 consecutive non-404 errors before bailing so a single transient blip during a long import doesn't fail the whole operation.
The `Err(e) if action_is_404(&e)` arm in `import_remote_with_channel_fallback` deliberately falls through to the default-channel retry below, so it's tagged `arch-lint: allow(no-error-swallowing)` with a reason; non-404s are propagated by the next match arm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four hazards in `cmd_add_service` and its helpers previously turned real production-state failures into operator-perceived success:
* `existing_vm` lookup matched `Err(_) => None`, so a transient VMAPI failure made the reprovision-decision logic conclude "instance is up-to-date, nothing to do" without ever reading the actual VM state. Now 404 still maps to None (legitimate stale SAPI -> missing-VM state), but other errors propagate.
* `ensure_manta_nic` failures at both the post-create and post-update call sites were downgraded to a `Warning:` log and the command exited 0 with "Done." printed. The service was created/updated but not fully wired, and the operator had no signal to investigate. Both call sites now propagate via `.with_context()`.
* `find_image` used `is_err()` to decide "needs download" for the `latest` flow, collapsing 404 (correct download trigger) and 503 (local IMGAPI is down; should bail, not start an import). The explicit-UUID flow had the same `Err(_) => try updates server` pattern. Both now match on 404 explicitly and propagate other errors with `.context()`.
* `wait_for_image_active` (a copy of the helper in `imgapi_util.rs`) printed dots for 4 minutes on any IMGAPI error and then claimed a timeout. Now it matches 404 explicitly, tolerates up to 3 consecutive non-404 errors as transient, and propagates the real error after that, so an operator no longer chases imaginary slow imports when local IMGAPI is broken.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
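The polling rule for `wait_for_image_active` might look roughly like the sketch below. It assumes that "tolerates up to 3 consecutive non-404 errors" means the fourth consecutive one aborts; `PollError`, `WaitError`, `wait_until_active`, and the attempt-count parameter are hypothetical, and sleeping between attempts is left out.

```rust
/// Hypothetical result of one IMGAPI poll.
#[derive(Debug, PartialEq)]
pub enum PollError {
    /// 404: the image is not registered yet, keep waiting.
    NotFound,
    /// Any other failure (5xx, transport, auth).
    Other(String),
}

#[derive(Debug, PartialEq)]
pub enum WaitError {
    TimedOut,
    /// The last real API error, surfaced instead of a fake timeout.
    Api(String),
}

/// Assumed tolerance: three consecutive non-404 errors are treated as transient.
pub const MAX_CONSECUTIVE_ERRORS: u32 = 3;

/// `poll` returns Ok(true) once the image is active, Ok(false) while it
/// exists but is not active yet.
pub fn wait_until_active<F>(mut poll: F, max_attempts: u32) -> Result<(), WaitError>
where
    F: FnMut() -> Result<bool, PollError>,
{
    let mut consecutive_errors = 0;
    for _ in 0..max_attempts {
        match poll() {
            Ok(true) => return Ok(()),
            // A clean answer or a 404 resets the error streak.
            Ok(false) | Err(PollError::NotFound) => consecutive_errors = 0,
            Err(PollError::Other(msg)) => {
                consecutive_errors += 1;
                if consecutive_errors > MAX_CONSECUTIVE_ERRORS {
                    return Err(WaitError::Api(msg));
                }
            }
        }
        // A real loop would sleep here between attempts.
    }
    Err(WaitError::TimedOut)
}
```

Resetting the counter on any clean response is what separates a single blip during a long import from an IMGAPI that is down.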
Adds with_patch entries in configure_sapi for ServiceType, UpdateAction, and SapiMode so the Progenitor-generated copies of these enums get clap::ValueEnum, matching the pattern already used by configure_imgapi / configure_papi / configure_napi. Pulls clap into sapi-client's [dependencies] (it was the only generated client without it) so the new derives compile. Mechanical follow-up: regenerated clients/internal/sapi-client/src/ generated.rs via 'make clients-generate'. The next commit replaces hand-rolled parse_service_type / parse_action / set-mode parser in tritonadm with these typed enums and deletes the helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous Subcommand definition declared service_type, instance_type, action, and the set-mode positional as Option<String> / String, with hand-rolled parse_service_type / parse_action helpers and an inline match in SetMode that re-implemented what clap::ValueEnum would generate. This violated CLAUDE.md type-safety rules #2 (ValueEnum on the canonical type) and #4 (no duplicate enum definitions): a typo in the Rust match arms could disagree with the wire format with no compile-time signal, and the parser strings would drift from the API contract any time the OpenAPI spec changed. With clap::ValueEnum now on the Progenitor copies (previous commit), the args declare types::ServiceType / types::UpdateAction / types::SapiMode directly with #[arg(value_enum)]. clap handles parsing and `--help` autopopulates valid values. Deletes parse_service_type, parse_action, and the inline mode match. Body construction loses the .as_deref().map(parse_*).transpose()? boilerplate too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verify-only module behind `verify_totp(secret_base32, code)`. Parameters fixed at SHA-1 / 30s step / 6 digits / +/-1 step skew to match piranha's defaults so existing UFDS enrollments verify unchanged. Tests cover the RFC 6238 SHA-1 vectors, the skew-window edges, and the malformed-secret error path. Enrollment is intentionally left to piranha for v1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
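Two pieces of the verification can be sketched without any crypto dependency: the RFC 4226 dynamic truncation that turns an HMAC-SHA-1 digest into a 6-digit code, and the +/-1 step window. This is an illustration, not the module's code: `truncate` and `matching_counter` are hypothetical names, and the HMAC computation itself is abstracted behind the `code_for` closure.

```rust
/// RFC 4226 dynamic truncation of a 20-byte HMAC-SHA-1 digest.
pub fn truncate(digest: &[u8; 20], digits: u32) -> u32 {
    // The low nibble of the last byte selects a 4-byte window (offset 0..=15).
    let offset = (digest[19] & 0x0f) as usize;
    // Mask the top bit so the value is a 31-bit unsigned integer.
    let bin = (u32::from(digest[offset] & 0x7f) << 24)
        | (u32::from(digest[offset + 1]) << 16)
        | (u32::from(digest[offset + 2]) << 8)
        | u32::from(digest[offset + 3]);
    bin % 10u32.pow(digits)
}

/// Parameters named in the commit message: 30 s step, +/-1 step of skew.
pub const STEP_SECS: u64 = 30;
pub const SKEW_STEPS: u64 = 1;

/// Returns the time-step counter whose code matches `submitted`, if any.
/// `code_for` stands in for "HMAC-SHA-1 over the counter, then truncate".
pub fn matching_counter<F>(now_secs: u64, submitted: u32, code_for: F) -> Option<u64>
where
    F: Fn(u64) -> u32,
{
    let current = now_secs / STEP_SECS;
    let first = current.saturating_sub(SKEW_STEPS);
    (first..=current + SKEW_STEPS).find(|&counter| code_for(counter) == submitted)
}
```

With the example digest from RFC 4226 section 5.4, `truncate(&digest, 6)` yields 872921. A production comparison would normally be constant-time, which this sketch does not attempt.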
Adds `LdapService::read_user_metadata_value` (general capimetadata
reader, base-scoped search at `metadata=<ns>, uuid=<uuid>, <base>`)
and `LdapService::read_totp_secret`, which targets the piranha
schema (`portal` / `usemoresecurity`, JSON `{"secretkey": "..."}`).
The shared namespace lets existing piranha enrollments verify
unchanged. `noSuchObject`, missing `secretkey`, and empty
`secretkey` (the piranha-disable state) all collapse to `Ok(None)`
so callers can treat them uniformly as "not enrolled."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
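The "collapse to `Ok(None)`" rule for `read_totp_secret` can be sketched as a pure function over the lookup result. The names here are hypothetical (`DirectoryError`, `normalize_totp_secret`), and the input models the state after the LDAP read and JSON parse: `Ok(Some(key))` when `secretkey` is present, `Ok(None)` when the field is absent.

```rust
/// Hypothetical directory error: only the case the text singles out is
/// modelled separately.
#[derive(Debug, PartialEq)]
pub enum DirectoryError {
    NoSuchObject,
    Other(String),
}

/// Collapse every "not enrolled" shape into Ok(None); keep real failures.
pub fn normalize_totp_secret(
    lookup: Result<Option<String>, DirectoryError>,
) -> Result<Option<String>, DirectoryError> {
    match lookup {
        // No metadata entry at all.
        Err(DirectoryError::NoSuchObject) => Ok(None),
        // Anything else is a real failure and must not look like "not enrolled".
        Err(other) => Err(other),
        // A non-empty secret means the user is enrolled.
        Ok(Some(key)) if !key.is_empty() => Ok(Some(key)),
        // Missing field, or the empty string piranha writes when disabling.
        Ok(_) => Ok(None),
    }
}
```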
Adds `JwtService::create_challenge_token` / `verify_challenge_token` (with parallel methods on `JwtVerifier`). Challenge tokens reuse the access-token signing key but carry a distinct claim shape: `sub` + `username` only, plus a literal `purpose: "2fa-pending"` and a 5-minute TTL. No `roles` / `is_admin` is carried — those come from mahi only after TOTP succeeds, so a leaked challenge can never elevate. Cross-decoding fails by construction: an access token is missing `purpose` (required by `ChallengeClaims`); a challenge is missing `roles` and `is_admin` (required by `Claims`). Tests cover the round trip, both cross-decoding paths, expired and wrong-purpose challenges, and tokens signed by a different issuer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
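Why cross-decoding fails by construction can be shown with required fields alone. The sketch below leaves out signatures and expiry entirely; `RawClaims`, `AccessClaims`, `decode_access`, and `decode_challenge` are hypothetical, and the access-token shape is reduced to the fields the text mentions.

```rust
/// Hypothetical decoded-but-unvalidated payload: every claim is optional.
#[derive(Debug, Clone, Default)]
pub struct RawClaims {
    pub sub: Option<String>,
    pub username: Option<String>,
    pub roles: Option<Vec<String>>,
    pub is_admin: Option<bool>,
    pub purpose: Option<String>,
}

/// Access tokens must carry roles and is_admin.
pub struct AccessClaims {
    pub sub: String,
    pub roles: Vec<String>,
    pub is_admin: bool,
}

/// Challenge tokens carry identity only, plus the fixed purpose.
pub struct ChallengeClaims {
    pub sub: String,
    pub username: String,
}

pub fn decode_access(raw: &RawClaims) -> Result<AccessClaims, &'static str> {
    Ok(AccessClaims {
        sub: raw.sub.clone().ok_or("missing sub")?,
        roles: raw.roles.clone().ok_or("missing roles")?,
        is_admin: raw.is_admin.ok_or("missing is_admin")?,
    })
}

pub fn decode_challenge(raw: &RawClaims) -> Result<ChallengeClaims, &'static str> {
    match raw.purpose.as_deref() {
        Some("2fa-pending") => {}
        Some(_) => return Err("wrong purpose"),
        None => return Err("missing purpose"),
    }
    Ok(ChallengeClaims {
        sub: raw.sub.clone().ok_or("missing sub")?,
        username: raw.username.clone().ok_or("missing username")?,
    })
}
```

Because each decoder requires a field the other token type never carries, neither token can stand in for the other even though both are signed with the same key.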
`POST /v1/auth/login` now returns a tagged `LoginOutcome`: `complete` (the historical `LoginResponse` shape) when the user has no second factor, or `challenge_required` carrying a 5-minute challenge token plus the offered methods (currently `[totp]`) when the user has a TOTP secret in UFDS `metadata=portal, usemoresecurity`. The client posts the challenge token plus a code to the new `POST /v1/auth/login/verify`, which re-reads the secret server-side, runs `verify_totp`, and finishes the session with the same `LoginResponse` + `Set-Cookie` the non-2FA path produces. The challenge token never carries the secret. If 2FA is disabled between login and verify the secret read returns `None` and verify fails closed. SSH-key login (`/v1/auth/login-ssh`) is unchanged — key possession already covers the second-factor role and matches piranha parity. Regenerated `openapi-specs/generated/triton-api.json` and the merged `openapi-specs/patched/triton-gateway-api.json`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
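The two-step flow above can be sketched as a pair of pure functions. The types mirror names from the commit message (`LoginOutcome`, `LoginResponse`, `ChallengeMethod`), but their fields and the functions `begin_login` and `verify_step` are assumptions made for illustration; token creation and the TOTP check are passed in rather than implemented.

```rust
#[derive(Debug, PartialEq)]
pub struct LoginResponse {
    /// Hypothetical field; the real response shape is not shown here.
    pub token: String,
}

#[derive(Debug, PartialEq)]
pub enum ChallengeMethod {
    Totp,
}

#[derive(Debug, PartialEq)]
pub enum LoginOutcome {
    Complete(LoginResponse),
    ChallengeRequired {
        challenge_token: String,
        methods: Vec<ChallengeMethod>,
    },
}

/// First step: a stored TOTP secret turns the login into a challenge.
/// Only the challenge token goes to the client, never the secret.
pub fn begin_login(
    totp_secret: Option<&str>,
    session_token: String,
    challenge_token: String,
) -> LoginOutcome {
    match totp_secret {
        None => LoginOutcome::Complete(LoginResponse { token: session_token }),
        Some(_) => LoginOutcome::ChallengeRequired {
            challenge_token,
            methods: vec![ChallengeMethod::Totp],
        },
    }
}

/// Second step: the secret is re-read server-side. If it has disappeared
/// since the first step, verification fails closed.
pub fn verify_step<F>(current_secret: Option<&str>, code: &str, check: F) -> bool
where
    F: Fn(&str, &str) -> bool,
{
    match current_secret {
        Some(secret) => check(secret, code),
        None => false,
    }
}
```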
Regenerates `triton-gateway-client/src/generated.rs` against the new tritonapi spec (the LoginOutcome enum, LoginVerifyRequest, ChallengeMethod, and the auth_login_verify operation), and teaches `triton login --user <name>` to handle the `LoginOutcome::ChallengeRequired` branch by prompting for an authenticator code (or reading `TRITON_TOTP_CODE` for non-tty flows) and exchanging it via `/v1/auth/login/verify` for the `LoginResponse` the rest of the login pipeline expects. If the server offers only second-factor methods this CLI does not recognise (i.e. all entries reduce to `ChallengeMethod::Unknown`), we refuse before prompting rather than collecting a code we cannot use. SSH-key login is unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
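The "refuse before prompting" check might be sketched as follows. `choose_method` is a hypothetical helper and the enum is a local stand-in for the generated `ChallengeMethod`; the error text is illustrative only.

```rust
/// Local stand-in for the generated enum; `Unknown` covers any method
/// this client version does not recognise.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ChallengeMethod {
    Totp,
    Unknown,
}

/// Pick the first offered method the client can handle, or refuse before
/// asking the user for a code it could not use.
pub fn choose_method(offered: &[ChallengeMethod]) -> Result<ChallengeMethod, String> {
    offered
        .iter()
        .copied()
        .find(|method| *method != ChallengeMethod::Unknown)
        .ok_or_else(|| "server offered no second-factor method this client supports".to_string())
}
```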
The lost-authenticator recovery story belongs near the verify handler -- that's where someone debugging the path will look. Plain `//` rather than `///` keeps it off the OpenAPI spec and out of the generated client docs, since "ssh into a headnode and run sdc-ufds" is an ops concern, not part of the API contract clients consume. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
travispaul approved these changes on May 4, 2026
Add tritonadm CLI, SAPI/IMGAPI/NAPI/PAPI API conversions, and zone image build infrastructure
Summary
- tritonadm CLI: New operator administration tool with subcommands for post-setup (grafana, portal, common-external-nics), image management (list, import, import-remote, delete), dc-maint status, and dev teardown helpers
- images/), and a triton-api service with SMF manifests and SAPI metadata
- triton-tls crate: Portable TLS cert loading that works on both illumos and other platforms
Test plan
- make package-build PACKAGE=tritonadm builds successfully
- make package-test PACKAGE=sapi-cli/imgapi-cli/napi-cli/papi-cli pass
- make openapi-check confirms generated specs are up-to-date
- make clients-check confirms generated client code is up-to-date
- make audit passes (with known pre-existing exceptions)