Summary
Building on #694 (Suspend API + pluggable SnapshotProvider), we'd like a SnapshotProvider whose artifact is a portable OCI image pushed to a configurable registry, so a sandbox's state can be restored as a new Sandbox — including on a different runtime or cluster — not only resumed in place.
Use case
We're building an internal developer-sandbox platform (~800 engineers). Each developer gets a cloud sandbox (k8s), and we also want a local runtime (Docker on the laptop) with one-click "move" between local and cloud in both directions:
- local → cloud: already works. We run docker commit locally, push to a shared registry, and create a Sandbox from that image. agent-sandbox just schedules a Pod from it.
- cloud → local: blocked. To run the cloud sandbox locally we need a portable OCI image of its current rootfs in the registry, but there is no in-cluster mechanism to commit the running container and push it.
Why existing options don't cover it
Proposed feature
A SnapshotProvider (plugging into #694) that:
- runs an in-cluster commit of the sandbox container's rootfs (e.g. a Job on the source node using the containerd socket + a committer such as nerdctl),
- pushes the resulting image to a configurable OCI registry,
- exposes the resulting image reference in the snapshot status,
so restore = create a Sandbox from that image (already supported), and the image can also be pulled by any other OCI runtime (Docker, another cluster).
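To make the three steps above concrete, here is a minimal sketch of what the commit Job could compute and run. It is only an illustration under stated assumptions: SnapshotRequest, image_reference, commit_job_commands, and the registry/name:tag layout are hypothetical names invented for this example, and it does not assume anything about the actual SnapshotProvider interface from #694. The only real CLI pieces used are nerdctl commit, nerdctl push, and the global --namespace flag (k8s.io is the containerd namespace used for Kubernetes-managed containers).

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotRequest:
    """Hypothetical inputs for one snapshot; not an existing agent-sandbox type."""
    registry: str      # configurable target repository prefix (assumed layout)
    sandbox_name: str  # name of the source Sandbox
    snapshot_id: str   # unique identifier, used here as the image tag
    container_id: str  # containerd ID of the running sandbox container


def image_reference(req: SnapshotRequest) -> str:
    """Build the image reference that would be exposed in the snapshot status."""
    return f"{req.registry.rstrip('/')}/{req.sandbox_name}:{req.snapshot_id}"


def commit_job_commands(req: SnapshotRequest) -> list[list[str]]:
    """Return the argv lists the commit Job would run on the source node.

    Assumes the Job has the node's containerd socket mounted and that
    registry credentials are already available to nerdctl.
    """
    ref = image_reference(req)
    nerdctl = ["nerdctl", "--namespace", "k8s.io"]
    return [
        nerdctl + ["commit", req.container_id, ref],  # snapshot rootfs to an image
        nerdctl + ["push", ref],                      # publish it to the registry
    ]


if __name__ == "__main__":
    # Example values are placeholders, not real infrastructure.
    request = SnapshotRequest(
        registry="registry.example.com/sandboxes",
        sandbox_name="dev-alice",
        snapshot_id="snap-001",
        container_id="abc123",
    )
    for argv in commit_job_commands(request):
        print(shlex.join(argv))
```

Restore would then only need the string returned by image_reference: a new Sandbox (or a local docker run) is created from that image, which is why exposing the reference in the snapshot status is the key output of the provider.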
Prior art / reference design
The OpenSandbox project implements this for its batchsandbox controller: a commit Job mounts the node containerd socket, commits the rootfs, and pushes to a configured registry. Useful reference for the mechanism and its security considerations (the commit Job has node-level runtime access → pin the committer image by digest, trusted registry / admission policy).
Notes / open questions
- Inherently privileged (node containerd access) and registry-credential-sensitive → would likely be an optional provider, off by default.
- Could the snapshot artifact create a different/new Sandbox (true portability), not only suspend/resume of the originating one?
Happy to discuss design and help with a contribution.
Refs: #694 (umbrella), #773, #538, #585, #36.