[Feature]: Desktop — capability handlers for macOS UI control (screenshot, click, keys, AX, AppleScript) #6499

@theonlyhennygod

Summary

Implement the actual macOS operations the agent can dispatch over /ws/nodes once the persistent NodeClient is in place. Each handler reads its required permission via the existing primitives in apps/tauri/src/macos/permissions.rs, executes the operation, and returns a structured result.

Problem

The persistent WebSocket and capability advertisement are useless without handlers that actually do the work. We need first-class implementations for the high-value verbs.

Proposal

  • New module tree apps/tauri/src/capabilities/:
    • screenshot.rs — uses CGDisplayCreateImage / ScreenCaptureKit; requires Screen Recording.
    • click.rs — synthetic mouse click via CGEventCreateMouseEvent; requires Accessibility.
    • type_keys.rs — synthetic keyboard via CGEventCreateKeyboardEvent; requires Accessibility.
    • read_ax.rs — read UI tree via AXUIElement; requires Accessibility.
    • applescript.rs — osascript -e <script>; requires Automation per-app.
    • notify.rs — UNUserNotificationCenter.add; requires Notifications.
  • Each handler:
    1. Calls macos::permissions::check_* to verify the prerequisite.
    2. If denied, returns a structured permission_denied(name) error so the agent can surface "I lost <name> — re-grant?".
    3. On success returns serializable output (base64 PNG for screenshot, {} for clicks, JSON tree for AX read).
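
The three handler steps above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `Permission`, `CapabilityError`, and `run_handler` are hypothetical names, the `is_granted` closure stands in for macos::permissions::check_* (whose real signatures are not shown in this issue), and every permission string other than "screen_recording" is an assumption.

```rust
use std::fmt;

/// Permissions named in the proposal. Only "screen_recording" appears
/// verbatim in the acceptance criteria; the other strings are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Permission {
    ScreenRecording,
    Accessibility,
    Automation,
    Notifications,
}

impl Permission {
    pub fn name(self) -> &'static str {
        match self {
            Permission::ScreenRecording => "screen_recording",
            Permission::Accessibility => "accessibility",
            Permission::Automation => "automation",
            Permission::Notifications => "notifications",
        }
    }
}

/// Structured error returned to the agent (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
pub enum CapabilityError {
    PermissionDenied(&'static str),
    Failed(String),
}

impl fmt::Display for CapabilityError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CapabilityError::PermissionDenied(name) => {
                write!(f, "permission_denied(\"{}\")", name)
            }
            CapabilityError::Failed(msg) => write!(f, "failed: {}", msg),
        }
    }
}

/// Step 1: check the prerequisite. Step 2: return permission_denied early.
/// Step 3: run the operation and pass its output through.
/// `is_granted` is a stand-in for macos::permissions::check_*.
pub fn run_handler<T>(
    required: Permission,
    is_granted: impl Fn(Permission) -> bool,
    operation: impl FnOnce() -> Result<T, String>,
) -> Result<T, CapabilityError> {
    if !is_granted(required) {
        return Err(CapabilityError::PermissionDenied(required.name()));
    }
    operation().map_err(CapabilityError::Failed)
}
```

Injecting the permission check as a closure keeps the flow testable without touching real macOS permission state; a real handler would presumably call the permissions module directly.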

Approval gating (per #6321)

  • Read-only ops (screenshot, read_ax) auto-approve.
  • Risky ops (click, type_keys, applescript) trigger a system dialog the first time they are used per app; the user's choice is remembered in a small allowlist persisted via tauri-plugin-store.
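
A minimal sketch of the gating rule above, under several assumptions: `ApprovalGate` and its `prompt` callback are hypothetical, the allowlist is keyed by (capability, app), only approvals are remembered, and an in-memory `HashSet` stands in for the tauri-plugin-store persistence, which is not modeled here.

```rust
use std::collections::HashSet;

/// Capabilities the issue classifies as read-only.
fn is_read_only(capability: &str) -> bool {
    matches!(capability, "screenshot" | "read_ax")
}

/// In-memory stand-in for the allowlist that the issue persists
/// via tauri-plugin-store.
#[derive(Default)]
pub struct ApprovalGate {
    /// (capability, target app) pairs the user has already approved.
    allowed: HashSet<(String, String)>,
}

impl ApprovalGate {
    /// Returns true if the operation may run.
    /// `prompt` is a hypothetical callback standing in for the system
    /// dialog; it is invoked only for a risky op that has not yet been
    /// approved for this app.
    pub fn approve(
        &mut self,
        capability: &str,
        app: &str,
        prompt: impl FnOnce(&str, &str) -> bool,
    ) -> bool {
        if is_read_only(capability) {
            return true; // read-only ops auto-approve
        }
        let key = (capability.to_string(), app.to_string());
        if self.allowed.contains(&key) {
            return true; // approved on an earlier call
        }
        let granted = prompt(capability, app);
        if granted {
            self.allowed.insert(key);
        }
        granted
    }
}
```

Whether a denial should also be remembered, and whether approval is scoped per app or per capability and app, is not specified above; this sketch re-prompts after a denial.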

Files

  • apps/tauri/src/capabilities/{mod,screenshot,click,type_keys,read_ax,applescript,notify}.rs (new)
  • apps/tauri/src/lib.rs — register handlers in the NodeClient dispatch table
  • apps/tauri/Cargo.toml — add core-graphics and core-foundation if needed
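
For the dispatch-table registration in lib.rs, one possible shape is a map from capability name to handler function. The NodeClient API is not shown in this issue, so the `Handler` signature, `dispatch_table`, `dispatch`, and the placeholder handlers below are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical handler signature: raw JSON args in, serialized
/// result out, or an error message.
type Handler = fn(&str) -> Result<String, String>;

// Placeholder handlers; the real ones would live in capabilities/*.rs.
fn screenshot(_args: &str) -> Result<String, String> {
    Ok("<base64 PNG>".to_string())
}

fn click(_args: &str) -> Result<String, String> {
    Ok("{}".to_string())
}

/// Builds the capability-name to handler table.
pub fn dispatch_table() -> HashMap<&'static str, Handler> {
    let mut table: HashMap<&'static str, Handler> = HashMap::new();
    table.insert("screenshot", screenshot);
    table.insert("click", click);
    table
}

/// Looks up the handler for `capability` and invokes it with `args`.
pub fn dispatch(
    table: &HashMap<&'static str, Handler>,
    capability: &str,
    args: &str,
) -> Result<String, String> {
    match table.get(capability) {
        Some(handler) => handler(args),
        None => Err(format!("unknown capability: {}", capability)),
    }
}
```

Keeping the table keyed by the same strings that are advertised to the agent would make an unknown capability an explicit error rather than a silent no-op.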

Acceptance

  • From a wscat session: dispatching {type:"invoke", capability:"screenshot"} returns a base64 PNG of the primary display.
  • Dispatching {capability:"click", args:{x:100, y:200}} clicks at that coordinate.
  • Revoking screen recording in System Settings → next screenshot dispatch returns permission_denied("screen_recording").
  • Risky ops trigger the user-approval dialog the first time and remember the choice.
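
The first acceptance criterion could be checked mechanically by decoding the returned payload and comparing its leading bytes against the PNG file signature. The helper names below are hypothetical, and the decoder is a small standard-library-only sketch for standard base64 with `=` padding, not a replacement for a maintained base64 crate.

```rust
/// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Decodes standard base64 (RFC 4648 alphabet, '=' padding).
/// Returns None if the input contains a character outside the alphabet.
pub fn decode_base64(input: &str) -> Option<Vec<u8>> {
    fn value(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a') as u32 + 26),
            b'0'..=b'9' => Some((c - b'0') as u32 + 52),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let trimmed = input.trim_end_matches('=');
    let mut out = Vec::with_capacity(trimmed.len() * 3 / 4);
    let mut buffer: u32 = 0; // accumulates 6 bits per input character
    let mut bits: u32 = 0; // number of not-yet-emitted bits in `buffer`
    for &c in trimmed.as_bytes() {
        buffer = (buffer << 6) | value(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8); // emit the oldest full byte
        }
    }
    Some(out)
}

/// True if the base64 payload decodes to bytes starting with the PNG signature.
pub fn looks_like_png(payload_b64: &str) -> bool {
    decode_base64(payload_b64).map_or(false, |bytes| bytes.starts_with(&PNG_SIGNATURE))
}
```

A signature check only confirms the payload is PNG-shaped; confirming that it shows the primary display would still need a manual or image-level check.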

Metadata

Assignees: No one assigned

Labels: dependencies, desktop, enhancement, priority:p2, risk: high, runtime, security, status:in-progress, status:no-stale, tauri

Status: Backlog