[Feature Request] Support Idle Lifecycle Policy for Sandboxes #849

## Summary

I would like `agent-sandbox` to support an idle lifecycle policy for long-lived interactive workspaces.

The goal is to let active workspaces remain available while in use, automatically stop idle workspaces while retaining their recoverable state, and eventually delete abandoned retained workspaces.

This is useful for browser IDEs, notebooks, and agent workspaces where rebuilding the environment is expensive, but keeping pods running indefinitely is wasteful.

In this proposal, `Suspend` and `Retain` describe effectively the same user-facing outcome: stop active compute, keep enough state to resume the workspace later, and avoid deleting the workspace immediately.

## Desired Lifecycle

A workspace should be able to follow this lifecycle:

1. A `Sandbox` is created in `operationMode: Running` with an active TTL, for example 24 hours.
2. When the active TTL expires, the `Sandbox` transitions to a retained non-running state. In `v1beta1`, this may be represented as `operationMode: Suspended`, but the important behavior is retention.
3. When the `Sandbox` enters the retained state, a retention TTL is stamped, for example 30 days, with expiration action `Delete`.
4. If the user resumes the workspace before the retention TTL expires, the `Sandbox` returns to `operationMode: Running`.
5. On resume, the active TTL is reset to 24 hours and the active expiration action is `Retain`/`Suspend` again.
6. If the workspace is never resumed and the retention TTL expires, the `Sandbox` is deleted.
7. Ideally, active TTL renewal should happen when a client connection is created or renewed, such as a browser IDE WebSocket connection.

```text
Created
  -> operationMode: Running
  -> active TTL: 24h
  -> active expiration action: Retain/Suspend

Active TTL expires
  -> retained non-running state
  -> operationMode: Suspended, if that is the canonical v1beta1 representation
  -> retention TTL: 30d
  -> retained expiration action: Delete

User resumes before 30d
  -> operationMode: Running
  -> active TTL reset to 24h
  -> active expiration action: Retain/Suspend

No resume before 30d
  -> Deleted
```
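The lifecycle above can be read as a small state machine. The following is a minimal sketch of it in Python, not the controller's implementation: the `Workspace` type, the function names, and the `"Deleted"` mode are hypothetical and exist only for illustration. It also assumes the retention TTL starts when the expiration is observed, which is one of the open questions below.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Example durations taken from the lifecycle described above.
ACTIVE_TTL = timedelta(hours=24)
RETENTION_TTL = timedelta(days=30)


@dataclass
class Workspace:
    """Hypothetical stand-in for a Sandbox; not the real API type."""
    mode: str                        # "Running", "Suspended", or "Deleted"
    expires_at: Optional[datetime]   # deadline of the current phase, None once deleted


def create(now: datetime) -> Workspace:
    # A new workspace starts running with a fresh active TTL.
    return Workspace("Running", now + ACTIVE_TTL)


def reconcile(ws: Workspace, now: datetime) -> Workspace:
    """Apply at most one expiration-driven transition."""
    if ws.mode == "Deleted" or now < ws.expires_at:
        return ws
    if ws.mode == "Running":
        # Active TTL expired: stop compute, keep state, stamp the retention TTL.
        # Assumption: the retention clock starts at the time of observation.
        return Workspace("Suspended", now + RETENTION_TTL)
    # Retention TTL expired without a resume: delete.
    return Workspace("Deleted", None)


def resume(ws: Workspace, now: datetime) -> Workspace:
    # Only a retained workspace can be resumed; resume resets the active TTL.
    if ws.mode != "Suspended":
        raise ValueError(f"cannot resume a workspace in mode {ws.mode}")
    return Workspace("Running", now + ACTIVE_TTL)


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
    ws = reconcile(create(t0), t0 + timedelta(hours=25))
    print(ws.mode)
    print(resume(ws, t0 + timedelta(days=3)).mode)
```

Keeping a single `expires_at` per phase means the reconcile step never needs to know how the workspace reached its current mode, only which deadline applies to it.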

## API Scope and Ownership

This lifecycle policy should be supported directly on the core `Sandbox` API and enforced by the core `Sandbox` controller.

`SandboxClaim` should be able to accept the same lifecycle policy for template-driven workflows, but it should mirror/pass that policy through to the generated `Sandbox` rather than independently enforcing a separate lifecycle state machine.

Desired ownership model:

- `Sandbox.spec.lifecycle` defines the actual runtime lifecycle policy.
- The `Sandbox` controller enforces active TTL, retained non-running transitions, resume behavior, retention TTL, and deletion.
- `SandboxClaim.spec.lifecycle` may expose the same fields for convenience.
- The `SandboxClaim` controller passes those lifecycle fields through when creating or reconciling the backing `Sandbox`.
- Direct `Sandbox` users get the same lifecycle behavior without needing `SandboxClaim`.
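As a rough illustration of the pass-through in this ownership model, the sketch below copies `spec.lifecycle` from a claim manifest into the generated `Sandbox` manifest without interpreting it. The `build_sandbox` function is hypothetical, the manifests are plain dictionaries, and template resolution (how the claim obtains its pod spec) is deliberately left out.

```python
import copy


def build_sandbox(claim: dict) -> dict:
    """Build a Sandbox manifest from a SandboxClaim manifest.

    Hypothetical helper: the lifecycle policy is copied verbatim, so the
    claim side never evaluates TTLs or drives state transitions itself.
    """
    spec = {"operationMode": "Running"}
    lifecycle = claim.get("spec", {}).get("lifecycle")
    if lifecycle is not None:
        # Deep copy so later edits to the claim do not alias the Sandbox spec.
        spec["lifecycle"] = copy.deepcopy(lifecycle)
    return {
        "apiVersion": "agents.x-k8s.io/v1beta1",
        "kind": "Sandbox",
        "metadata": {"name": claim["metadata"]["name"]},
        "spec": spec,
    }


if __name__ == "__main__":
    claim = {
        "apiVersion": "extensions.agents.x-k8s.io/v1beta1",
        "kind": "SandboxClaim",
        "metadata": {"name": "workspace-1"},
        "spec": {
            "lifecycle": {
                "activeTTLSeconds": 86400,
                "activeExpirationPolicy": "Retain",
            }
        },
    }
    print(build_sandbox(claim)["spec"])
```

Because the claim side only copies fields, there is a single place (the `Sandbox` controller) where lifecycle decisions are made.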

## Why Existing Lifecycle Support Is Not Enough

Current lifecycle support appears centered on absolute `shutdownTime` and `shutdownPolicy`. `Retain` is close to the desired first-stage idle behavior, and `operationMode: Suspended` may be the right v1beta1 representation for the non-running retained state. However, the lifecycle policy cannot yet express the following requirements:

- Active TTL expiration should retain the workspace state and stop active compute rather than delete immediately.
- Retained non-running resources should have a separate retention TTL.
- Resume should reset the active TTL.
- Connection/activity renewal should extend the active TTL.
- The controller should own the state transitions among running, suspended, resumed, and deleted.

## Possible API Shape

This is one possible shape, not a fixed proposal:

```yaml
apiVersion: agents.x-k8s.io/v1beta1
kind: Sandbox
spec:
  operationMode: Running
  lifecycle:
    activeTTLSeconds: 86400
    activeExpirationPolicy: Retain
    retainedTTLSeconds: 2592000
    retainedExpirationPolicy: Delete
```

For `SandboxClaim`, the same policy could be accepted and mirrored to the generated `Sandbox`:

```yaml
apiVersion: extensions.agents.x-k8s.io/v1beta1
kind: SandboxClaim
spec:
  lifecycle:
    activeTTLSeconds: 86400
    activeExpirationPolicy: Retain
    retainedTTLSeconds: 2592000
    retainedExpirationPolicy: Delete
```

The generated `Sandbox` would receive the lifecycle policy and the `Sandbox` controller would enforce it.
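To make the example values concrete, the sketch below reads the proposed fields and converts the second counts to durations: 86400 seconds is 24 hours and 2592000 seconds is 30 days. The `parse_lifecycle` function and its validation rules (positive TTLs, policy values limited to `Retain` and `Delete`) are assumptions for illustration; the issue does not define them.

```python
from datetime import timedelta

# Assumed set of policy values; only these two appear in the example manifests.
KNOWN_POLICIES = {"Retain", "Delete"}


def parse_lifecycle(lifecycle: dict) -> dict:
    """Convert the proposed lifecycle fields into durations and policies."""
    active = timedelta(seconds=lifecycle["activeTTLSeconds"])
    retained = timedelta(seconds=lifecycle["retainedTTLSeconds"])
    if active <= timedelta(0) or retained <= timedelta(0):
        raise ValueError("TTLs must be positive")
    for key in ("activeExpirationPolicy", "retainedExpirationPolicy"):
        if lifecycle[key] not in KNOWN_POLICIES:
            raise ValueError(f"unknown {key}: {lifecycle[key]}")
    return {
        "active_ttl": active,
        "active_policy": lifecycle["activeExpirationPolicy"],
        "retained_ttl": retained,
        "retained_policy": lifecycle["retainedExpirationPolicy"],
    }


if __name__ == "__main__":
    policy = parse_lifecycle({
        "activeTTLSeconds": 86400,
        "activeExpirationPolicy": "Retain",
        "retainedTTLSeconds": 2592000,
        "retainedExpirationPolicy": "Delete",
    })
    print(policy["active_ttl"], policy["retained_ttl"])
```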

## Activity Renewal

For browser workspaces, activity could come from the gateway/router/client layer.

A Kubernetes-native option might be a `Lease` associated with the `Sandbox`. The router or client could renew the Lease periodically while a WebSocket/session is active. The controller could use the Lease renewal time as the source of activity without requiring frequent writes to the `Sandbox` object.

This would allow active workspaces to remain running while a user is connected, without causing high write volume on the `Sandbox` object.
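One way the Lease idea could work is for the controller to treat the later of "when the workspace last started running" and "when the Lease was last renewed" as the last activity, and measure the active TTL from there. The sketch below assumes that rule; `active_deadline` and `is_idle` are hypothetical names, and `lease_renew_time` stands for the value a controller would read from the Lease's `spec.renewTime`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def active_deadline(
    running_since: datetime,
    lease_renew_time: Optional[datetime],
    active_ttl: timedelta,
) -> datetime:
    """Return when the active TTL expires, given the latest known activity."""
    last_activity = running_since
    # A renewal older than the current run (for example, left over from before
    # a resume) is ignored, so it cannot shorten the freshly reset TTL.
    if lease_renew_time is not None and lease_renew_time > running_since:
        last_activity = lease_renew_time
    return last_activity + active_ttl


def is_idle(
    now: datetime,
    running_since: datetime,
    lease_renew_time: Optional[datetime],
    active_ttl: timedelta,
) -> bool:
    # The workspace is idle once the deadline has been reached.
    return now >= active_deadline(running_since, lease_renew_time, active_ttl)


if __name__ == "__main__":
    started = datetime(2025, 1, 1, tzinfo=timezone.utc)
    renewed = started + timedelta(hours=20)
    ttl = timedelta(hours=24)
    # 30 hours after start, but only 10 hours after the last renewal.
    print(is_idle(started + timedelta(hours=30), started, renewed, ttl))
```

With this rule, renewals touch only the Lease, and the `Sandbox` object is written only when a transition actually happens.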

## Open Questions

- Should this extend the existing lifecycle fields or introduce a new lifecycle policy struct?
- Should this build on existing `Retain` semantics, or introduce separate active/retained expiration policies?
- If `operationMode: Suspended` is the representation for the retained non-running state, should the retention TTL start when `operationMode` is set to `Suspended`, or when the `Suspended=True` condition is observed?
- Should activity be represented by a Lease, annotation, status field, or subresource?
- Should activity renewal be optional, gated by a field such as `renewOnActivity: true`?
- How should this interact with future suspend implementations such as freeze or hibernate?

## Desired Outcome

Users can define a lifecycle policy where active workspaces remain running while in use, idle workspaces automatically stop active compute while retaining recoverable state, and abandoned retained workspaces are eventually deleted without manual cleanup.

