
Integrate Microsoft Work IQ MCP servers via UI-tier MSAL auth handoff #441

@rockfordlhotka

Implements the design at design/workiq-integration.md.

Why

Microsoft's Work IQ (preview, May 2026) exposes M365 data (Mail, Calendar, Teams, SharePoint, OneDrive, Word, Copilot Search, User, Dataverse) as HTTP MCP servers grounded in the Copilot intelligence layer. The differentiator vs raw Graph is semantic ranking and Copilot-grounded search.

Three constraints make integration awkward:

  1. Per-user delegated auth only — every call requires a token for a specific user holding an M365 Copilot license. No app-only / client-credentials.
  2. Interactive OAuth assumed — standard clients use OAuth 2.1 + PKCE with a loopback redirect. The agent pod has no browser.
  3. Expiring bearer over HTTP — tokens expire in ~1h; the current bridge sets headers once on HttpClient.DefaultRequestHeaders and would silently 401 after the first hour.

The design splits responsibility. The UI tier (Blazor/CLI) runs the MSAL interactive or device-code flow and ships the serialized token cache to the agent over the message bus. The agent persists that cache to its own PVC and calls AcquireTokenSilent on every request via a DelegatingHandler. The UI tier stays credential-free (no k8s Secret RBAC, no PVC mount).
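To make constraint 3 and "AcquireTokenSilent per request" concrete, here is a conceptual Python model of silent acquisition. It returns the cached access token unless the token is within a clock-skew window of expiry; otherwise it redeems the refresh token, and it raises a UI-required error when that fails. MSAL.NET does this internally. `SilentTokenSource`, `redeem_refresh_token`, the 5-minute skew, and `UiRequired` are illustrative assumptions, not MSAL APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class AccessToken:
    value: str
    expires_at: datetime


class UiRequired(Exception):
    """Stand-in for MSAL's MsalUiRequiredException: only the user can fix this."""


class SilentTokenSource:
    """Conceptual model of per-request silent acquisition (MSAL does this itself)."""

    def __init__(
        self,
        redeem_refresh_token: Callable[[], Optional[AccessToken]],
        clock: Callable[[], datetime],
        skew: timedelta = timedelta(minutes=5),
    ):
        self._redeem = redeem_refresh_token  # returns None if the refresh token is unusable
        self._clock = clock
        self._skew = skew
        self._cached: Optional[AccessToken] = None

    def get_token(self, force_refresh: bool = False) -> str:
        now = self._clock()
        needs_refresh = (
            force_refresh
            or self._cached is None
            or self._cached.expires_at - self._skew <= now  # expiring soon
        )
        if needs_refresh:
            fresh = self._redeem()
            if fresh is None:
                raise UiRequired("refresh token missing, expired, or revoked")
            self._cached = fresh
        return self._cached.value
```

A header pinned once on `DefaultRequestHeaders` never goes back through a lookup like `get_token`, which is why the current bridge breaks after the first expiry.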

Prerequisite decision

Is Copilot licensing worth it for what Work IQ adds over Graph? If the agent's M365 needs are primarily CRUD, the existing ms-365 MCP server hits Graph directly and avoids the per-user Copilot license. This should be answered before starting Phase 1.

Phased plan

Phase 1 — Bridge auth infrastructure

This phase can land and be unit-tested without any Work IQ involvement, and it is reusable for any future bearer-auth MCP server.

  • McpServerAuthConfig record; add optional Auth field to McpBridgeServerConfig
  • ITokenProvider + ITokenProviderRegistry interfaces in RockBot.Tools.Mcp
  • BearerInjectionHandler : DelegatingHandler — calls provider per request, retries once on 401 with forceRefresh: true
  • McpBridgeService.ConnectServerAsync builds the auth-bearing HttpClient via the handler when config.Auth is set; otherwise unchanged
  • Tests: handler injects the bearer token, refreshes on 401, and surfaces the failure when the retry also returns 401; static Headers and Auth coexist on the same server
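The handler's intended control flow can be sketched in Python for brevity; this is not the .NET `DelegatingHandler` itself. `TokenProvider.get_token(force_refresh=...)`, `BearerInjector`, and `AuthFailedError` are hypothetical stand-ins for `ITokenProvider` and the real handler, and the `send` callable plays the role of the inner handler.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


class TokenProvider(Protocol):
    """Hypothetical stand-in for ITokenProvider."""

    def get_token(self, force_refresh: bool = False) -> str: ...


@dataclass
class Request:
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


class AuthFailedError(Exception):
    """Raised when the retry with a forced refresh also returns 401."""


class BearerInjector:
    """Mirrors the intended BearerInjectionHandler control flow (sketch only)."""

    def __init__(self, provider: TokenProvider, send: Callable[[Request], Response]):
        self._provider = provider
        self._inner = send  # plays the role of the inner handler

    def send(self, request: Request) -> Response:
        # Ask the provider on every request instead of pinning a header at startup.
        request.headers["Authorization"] = f"Bearer {self._provider.get_token()}"
        response = self._inner(request)
        if response.status != 401:
            return response
        # Exactly one retry, bypassing any cached access token.
        token = self._provider.get_token(force_refresh=True)
        request.headers["Authorization"] = f"Bearer {token}"
        response = self._inner(request)
        if response.status == 401:
            raise AuthFailedError(f"401 from {request.url} after forced refresh")
        return response
```

Static headers already on the request are left untouched, so they coexist with the injected `Authorization` header. In the .NET version it is worth confirming that the request body can be sent a second time on the retry.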

Phase 2 — MSAL token plumbing (no Work IQ yet)

  • WorkIqAuthCacheUpdated and WorkIqAuthExpired message types under RockBot.Messaging.Abstractions (or wherever auth messages belong)
  • TokenCacheStore in the agent — subscribes to auth.workiq.cache, persists to /data/agent/secrets/workiq-cache.bin (mode 0600), loads into MSAL on startup
  • MsalTokenProvider implementing ITokenProvider; registered in DI with tenant/client/scopes from configmap
  • Init container creates /data/agent/secrets/ with correct ownership/permissions
  • Configmap surfaces WorkIQ__TenantId and WorkIQ__ClientId; the values land in appsettings-style config
  • Tests: cache round-trips through serialize/deserialize; silent refresh rotates the on-disk file; MsalUiRequiredException triggers WorkIqAuthExpired publish
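Here is a minimal Python sketch of the agent-side store, assuming the cache is an opaque byte blob. Writes go to a temporary file in the same directory, which `tempfile.mkstemp` creates with owner-only (0600) permissions, and are then renamed over the target, so a crash mid-write cannot leave a truncated cache. `TokenCacheStore` and its method names are illustrative. In MSAL.NET, `load` and `save` would most likely be wired to the token cache's `SetBeforeAccess` / `SetAfterAccess` callbacks.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


class TokenCacheStore:
    """Sketch: persists an opaque serialized token cache with owner-only permissions."""

    def __init__(self, path: Path):
        self._path = path

    def load(self) -> Optional[bytes]:
        # On startup these bytes would be handed to the token library's deserializer.
        try:
            return self._path.read_bytes()
        except FileNotFoundError:
            return None

    def save(self, blob: bytes) -> None:
        # mkstemp creates the file readable/writable only by the current user (0600).
        fd, tmp_name = tempfile.mkstemp(dir=str(self._path.parent), prefix=".cache-")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(blob)
                f.flush()
                os.fsync(f.fileno())
            # Atomic on POSIX when source and target share a filesystem.
            os.replace(tmp_name, self._path)
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise

    def on_cache_updated(self, payload: bytes) -> None:
        # Handler for messages arriving on auth.workiq.cache.
        self.save(payload)
```

In the agent this would be `TokenCacheStore(Path("/data/agent/secrets/workiq-cache.bin"))`, subscribed to `auth.workiq.cache`. A silent refresh that changes the cache calls `save` again, which is the file rotation the Phase 2 tests check.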

Phase 3 — UI tier MSAL flow

  • Blazor: "Connect M365" button → MSAL.NET interactive flow against http://localhost:8080/callback (or whatever fits the Blazor host model)
  • CLI: rockbot auth workiq command → MSAL device-code flow
  • Both paths serialize the MSAL cache and publish WorkIqAuthCacheUpdated
  • Blazor subscribes to WorkIqAuthExpired; surfaces a "Reconnect M365" banner
  • Tests: device-code flow happy path against a stub MSAL endpoint
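The UI-to-agent handoff can be sketched with a toy in-memory bus. Only the `auth.workiq.cache` topic and the two message type names come from this issue. The message fields, the `auth.workiq.expired` topic, `complete_sign_in`, and the banner hiding itself when a fresh cache arrives are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List


@dataclass(frozen=True)
class WorkIqAuthCacheUpdated:
    cache_blob: bytes  # serialized MSAL cache produced by the UI-tier flow


@dataclass(frozen=True)
class WorkIqAuthExpired:
    reason: str


CACHE_TOPIC = "auth.workiq.cache"      # topic named in this design
EXPIRED_TOPIC = "auth.workiq.expired"  # hypothetical topic name


class InMemoryBus:
    """Toy stand-in for the RockBot message bus."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        for handler in self._subs[topic]:
            handler(message)


def complete_sign_in(bus: InMemoryBus, serialized_cache: bytes) -> None:
    # Both the Blazor and CLI paths end here; the UI tier does not persist the bytes.
    bus.publish(CACHE_TOPIC, WorkIqAuthCacheUpdated(serialized_cache))


class ReconnectBanner:
    """Blazor-side state: shown on expiry, hidden once a new cache is published."""

    def __init__(self, bus: InMemoryBus) -> None:
        self.visible = False
        bus.subscribe(EXPIRED_TOPIC, self._on_expired)
        bus.subscribe(CACHE_TOPIC, self._on_updated)

    def _on_expired(self, message: WorkIqAuthExpired) -> None:
        self.visible = True

    def _on_updated(self, message: WorkIqAuthCacheUpdated) -> None:
        self.visible = False
```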

Phase 4 — Entra app registration + wire-up

  • Register a single public-client RockBot app in Entra (manual step; document it in deploy/)
  • Grant delegated WorkIQ-* permissions for the servers we want enabled (Mail and Calendar minimum; expand as needed)
  • Add Work IQ servers to default mcp.json (or document the registration UX) using auth.profile: "workiq"
  • End-to-end smoke: Blazor consent → patrol-style task that calls workiq-mail → returns results
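The Python sketch below shows one way the bridge could turn an entry's `auth.profile` into a token provider via `ITokenProviderRegistry`. Apart from `auth.profile`, the entry fields (`name`, `url`, `headers`) and all class and function names are hypothetical. Entries without a profile fall through to the existing static-header path, matching the Phase 1 rule that `ConnectServerAsync` is unchanged when `config.Auth` is unset.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ServerEntry:
    """Hypothetical parsed mcp.json entry; only auth.profile comes from the design."""
    name: str
    url: str
    headers: Dict[str, str]
    auth_profile: Optional[str] = None


class UnknownAuthProfileError(KeyError):
    pass


class TokenProviderRegistry:
    """Sketch of ITokenProviderRegistry: profile name -> token provider."""

    def __init__(self) -> None:
        self._providers: Dict[str, Any] = {}

    def register(self, profile: str, provider: Any) -> None:
        self._providers[profile] = provider

    def resolve(self, profile: str) -> Any:
        try:
            return self._providers[profile]
        except KeyError:
            raise UnknownAuthProfileError(profile) from None


def parse_entry(raw: Dict[str, Any]) -> ServerEntry:
    auth = raw.get("auth") or {}
    return ServerEntry(
        name=raw["name"],
        url=raw["url"],
        headers=dict(raw.get("headers", {})),
        auth_profile=auth.get("profile"),
    )


def provider_for(entry: ServerEntry, registry: TokenProviderRegistry) -> Any:
    # None means "no auth-bearing client": keep using the static-header path.
    if entry.auth_profile is None:
        return None
    return registry.resolve(entry.auth_profile)
```

Raising on an unknown profile makes a misconfigured entry fail at connect time instead of sending unauthenticated requests.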

Phase 5 — Re-consent UX and failure-mode polish

  • When a Work IQ tool call fails because MSAL raised MsalUiRequiredException upstream, the resulting ToolError carries an actionable message ("M365 connection expired — open Blazor and click Reconnect")
  • Decide behavior for scheduled tasks (patrols) when re-consent is pending — fail-fast vs skip-task-class
  • Documentation: deploy/ runbook for the Entra registration and license requirements
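Here is a Python sketch of the two candidate policies for scheduled tasks. It assumes the Work IQ call surfaces MSAL's `MsalUiRequiredException` (modelled as `MsalUiRequired`) and that "skip-task-class" means pausing every task of that class until re-consent. `PatrolRunner`, `TaskOutcome`, and `on_reconsent` are hypothetical, and the issue has not yet chosen between the policies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Iterable, Set, Tuple


class MsalUiRequired(Exception):
    """Stand-in for MSAL.NET's MsalUiRequiredException (refresh token unusable)."""


class PendingConsentPolicy(Enum):
    FAIL_FAST = "fail-fast"
    SKIP_TASK_CLASS = "skip-task-class"


RECONNECT_MESSAGE = "M365 connection expired - open Blazor and click Reconnect"


@dataclass
class ToolError:
    tool: str
    message: str


@dataclass
class TaskOutcome:
    status: str  # "ok", "failed", or "skipped"
    detail: str = ""


def call_workiq_tool(tool: str, invoke: Callable[[], Any]) -> Any:
    # Translate the auth failure into an actionable tool error.
    try:
        return invoke()
    except MsalUiRequired:
        return ToolError(tool, RECONNECT_MESSAGE)


class PatrolRunner:
    def __init__(self, policy: PendingConsentPolicy):
        self.policy = policy
        self.paused_classes: Set[str] = set()

    def run(self, task_class: str, steps: Iterable[Tuple[str, Callable[[], Any]]]) -> TaskOutcome:
        if task_class in self.paused_classes:
            return TaskOutcome("skipped", f"{task_class} paused until re-consent")
        for tool, invoke in steps:
            result = call_workiq_tool(tool, invoke)
            if isinstance(result, ToolError):
                if self.policy is PendingConsentPolicy.FAIL_FAST:
                    return TaskOutcome("failed", result.message)
                # Skip-task-class: stop this run and pause the whole class.
                self.paused_classes.add(task_class)
                return TaskOutcome("skipped", result.message)
        return TaskOutcome("ok")

    def on_reconsent(self) -> None:
        # Called when a fresh cache arrives after the user reconnects.
        self.paused_classes.clear()
```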

Out of scope (v1)

  • Multi-user Work IQ access via UserProxy plumbing — v2.
  • Token-broker microservice — not needed while everything that calls Work IQ runs in the agent process.
  • App-only / client-credentials path — re-check at each Work IQ preview milestone.

Open questions

  • Whether to ship Work IQ servers in the default seeded mcp.json or require the operator to register them manually after consent (probably the latter — keeps the seed list working without a license).
  • Whether the Blazor MSAL redirect URI should be the loopback http://localhost:8080/callback or a route in the Blazor app itself (depends on how Blazor is hosted in practice).
  • Cache file format: raw MSAL bytes vs an envelope that also tracks AccountId, expiry, scopes (for diagnostics without parsing MSAL internals).
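If the envelope option wins, it could be as small as a versioned JSON document wrapping the untouched MSAL bytes. In this Python sketch, the field names, the version number, and recording access-token expiry (rather than refresh-token lifetime) are all assumptions.

```python
import base64
import json
from dataclasses import dataclass
from datetime import datetime, timezone

ENVELOPE_VERSION = 1


@dataclass(frozen=True)
class CacheEnvelope:
    account_id: str
    expires_at: datetime  # access-token expiry as known at publish time (timezone-aware)
    scopes: tuple
    msal_cache: bytes     # opaque bytes exactly as MSAL serialized them

    def to_bytes(self) -> bytes:
        doc = {
            "version": ENVELOPE_VERSION,
            "accountId": self.account_id,
            "expiresAt": self.expires_at.astimezone(timezone.utc).isoformat(),
            "scopes": list(self.scopes),
            "msalCache": base64.b64encode(self.msal_cache).decode("ascii"),
        }
        return json.dumps(doc).encode("utf-8")

    @staticmethod
    def from_bytes(data: bytes) -> "CacheEnvelope":
        doc = json.loads(data)
        if doc.get("version") != ENVELOPE_VERSION:
            raise ValueError(f"unsupported envelope version: {doc.get('version')!r}")
        return CacheEnvelope(
            account_id=doc["accountId"],
            expires_at=datetime.fromisoformat(doc["expiresAt"]),
            scopes=tuple(doc["scopes"]),
            msal_cache=base64.b64decode(doc["msalCache"]),
        )
```

Raw bytes keep the file a drop-in input for MSAL's deserializer. The envelope lets diagnostics read account, expiry, and scopes without touching MSAL internals, at the cost of one unwrap step on load.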

Success criteria

  • Agent successfully calls at least workiq-mail and workiq-calendar tools end-to-end after a Blazor consent
  • Tokens silently refresh through at least one access-token expiry cycle (~1h) without intervention
  • No credentials appear in mcp.json at any point
  • UI containers gain no new RBAC or PVC permissions
  • Re-consent ceremony works after manually revoking the refresh token
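A heuristic scan in CI or at startup could guard the "no credentials in mcp.json" criterion. This Python sketch flags `Authorization`-style header names and any key containing `token`, `secret`, or `password`. The key list and the entry shape are assumptions, and a substring heuristic like this will also flag harmless keys such as `max_tokens`.

```python
from typing import Any, Dict, List

FORBIDDEN_HEADER_NAMES = {"authorization", "proxy-authorization"}
FORBIDDEN_KEY_FRAGMENTS = ("token", "secret", "password")


def find_credentials(node: Dict[str, Any], path: str = "") -> List[str]:
    """Return dotted paths of keys that look like they carry credentials (heuristic)."""
    problems: List[str] = []
    for key, value in node.items():
        here = f"{path}.{key}" if path else key
        lowered = key.lower()
        if lowered in FORBIDDEN_HEADER_NAMES or any(f in lowered for f in FORBIDDEN_KEY_FRAGMENTS):
            problems.append(here)
        if isinstance(value, dict):
            problems.extend(find_credentials(value, here))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    problems.extend(find_credentials(item, f"{here}[{i}]"))
    return problems


# Hypothetical entry shape; only auth.profile = "workiq" comes from the design.
example_entry = {
    "name": "workiq-mail",
    "url": "https://example.invalid/workiq-mail",  # placeholder, not a real endpoint
    "auth": {"profile": "workiq"},
}
```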

