OpenClaw Pool: multi-tenant, isolated deployment of OpenClaw AI agents on AWS using Firecracker microVMs. Each tenant runs in its own microVM with an independent kernel, filesystem, and network. Managed via API, with auto-scaling hosts and idle reclamation.
This project uses AWS EC2 nested virtualization to run KVM + Firecracker inside EC2 instances. Currently supports Intel instance families (c8i/m8i/r8i, etc.).
⚠️ This sample is for demonstration purposes only and is not intended for production use. Deploy at your own risk.
- Tenant Management – Create/delete/query tenants via API. Each tenant is an OpenClaw instance running in an isolated Firecracker microVM with its own rootfs, data volume, and network
- Security Isolation – Firecracker microVM-based isolation: independent kernel, network, and filesystem per tenant
- Auto Scheduling – Automatically selects a host with available resources; scales out when capacity is insufficient
- Auto Scale-in – Idle hosts are reclaimed after a timeout (two-round confirmation to prevent false kills)
- Health Checks – Real-time VM health monitoring with automatic status updates
- Web Console – Online management console with Cognito authentication and real-time host/tenant status
- Rootfs Pre-build – Rootfs + data template distributed via S3, downloaded on host init
- Dashboard Access – One-click HTTPS access to each tenant's OpenClaw Dashboard, no custom domain required
- Auto Backup & Restore – EventBridge-scheduled backup of all tenant data volumes to S3, with manual trigger, cross-tenant backup query, and one-click restore into a new tenant (orphan-safe: the source tenant need not exist)
- AgentCore Integration – Optional toggle; when enabled, all VMs auto-connect to AgentCore Gateway (MCP tool hub), Memory, Code Interpreter, and Browser
- Shared Skills – All tenants share a unified skill set (S3-managed, auto-synced to all VMs), with independent memory
- Config Templates – Custom OpenClaw configuration templates for different LLM providers/models, selectable when creating tenants
- Default Toolchain – Each VM comes with Python3/uv/git/gh/Node.js/htop/tmux/tree pre-installed
Prerequisites:
- AWS account + CLI configured
- CDK CLI + Python 3.12+
- uv (Python package manager)
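A quick way to sanity-check the prerequisites before deploying (these version flags are standard for each CLI):

```bash
aws sts get-caller-identity   # confirms the AWS CLI has working credentials
aws --version
cdk --version                 # CDK CLI
python3 --version             # should report 3.12 or newer
uv --version
```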
# 1. Configure
cp config.yml.example config.yml # Edit infrastructure config
cp templates/openclaw.json.example templates/openclaw.json # Set your API key, model provider, etc.
# 2. Deploy infrastructure
./setup.sh ap-northeast-1 lab
# Environment variables saved to .env.deploy
# 3. Build rootfs (auto-uploads to S3 + pushes to hosts)
source .env.deploy
./build-rootfs.sh v1.0
# 4. Create a tenant (OpenClaw instance)
source .env.deploy
curl -s -X POST "${API_URL}tenants" -H "x-api-key: ${API_KEY}" \
-d '{"name":"my-agent","vcpu":2,"mem_mb":4096}' | jq .
# 5. Open Console – manage tenants, templates, and settings
# Console URL is printed after deploy

Web-based console hosted on CloudFront (/console/), with Cognito authentication.
Features:
- Tenants – Host resource overview, create/delete tenants, one-click Dashboard access
- Application – Shared skills list, config template management (create/edit/delete)
- Backups – Cross-tenant backup explorer with per-tenant grouping, orphan filter, and one-click restore into a new tenant
- Settings – API connection, AgentCore status, system info
Each tenant's OpenClaw Dashboard is accessible via CloudFront + ALB + Nginx reverse proxy:
https://{cloudfront-domain}/vm/{tenant-id}/ → Tenant Dashboard (WebSocket)
HTTPS is provided by CloudFront out of the box – no custom domain or ACM certificate required. The Console's "Open Dashboard" button includes the gateway token for one-click access.
Traffic flow: Browser → CloudFront:443 → ALB:80 → Host Nginx:80 → VM Gateway:18789
Nginx config is automatically managed by launch-vm.sh / stop-vm.sh.
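For reference, a per-tenant route of the kind launch-vm.sh manages might look like the sketch below. The conf path, include layout, upstream IP, and header set are illustrative assumptions; the authoritative logic lives in deploy/userdata/launch-vm.sh.

```bash
# Hypothetical sketch: render an Nginx route for one tenant.
# TENANT_ID, VM_IP, and the include directory are assumptions; the snippet is
# expected to be include'd from inside the server {} block of the host Nginx.
TENANT_ID="my-agent-ab12"
VM_IP="172.16.1.2"
cat > /etc/nginx/vm-routes/${TENANT_ID}.conf <<EOF
location /vm/${TENANT_ID}/ {
    proxy_pass http://${VM_IP}:18789/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade \$http_upgrade;      # WebSocket upgrade for the Dashboard
    proxy_set_header Connection "upgrade";
    proxy_set_header Host \$host;
}
EOF
nginx -s reload
```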
Bind a custom domain + HTTPS to CloudFront. Configuration lives in config.yml under the cloudfront: section; you can edit the file directly or pass flags to setup.sh:
# Prerequisites:
# 1. Request an ACM certificate in us-east-1 (required by CloudFront) and complete DNS validation
# 2. CNAME your domain to the CloudFront domain (see DashboardUrl output)
# One-liner: sets config.yml + deploys in a single run
./setup.sh ap-northeast-1 lab \
--domain claw.example.com \
--cert arn:aws:acm:us-east-1:xxx:certificate/xxx
# Or edit config.yml manually then run setup.sh with no flags.
# To unbind the custom domain: pass --domain "" and re-run setup.sh.

The custom domain and certificate flow through CDK (not out-of-band), so subsequent setup.sh runs preserve the binding.
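For prerequisite 1 above, the certificate can also be requested from the CLI (the domain is a placeholder; the DNS validation record still has to be created at your DNS provider):

```bash
# CloudFront only accepts ACM certificates issued in us-east-1
aws acm request-certificate \
  --domain-name claw.example.com \
  --validation-method DNS \
  --region us-east-1 --profile $PROFILE
# Fetch the CNAME validation record to add at your DNS provider
aws acm describe-certificate --certificate-arn <certificate-arn> \
  --region us-east-1 --profile $PROFILE
```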
EventBridge schedules daily backups of all running tenant data volumes to S3. Manual trigger also supported.
Backup flow: pause VM → pigz compress data.ext4 → resume VM → upload to S3. VM auto-resumes even on failure (trap cleanup).
source .env.deploy
# Manual backup (async, returns 202)
curl -s -X POST "${API_URL}tenants/{id}/backup" -H "x-api-key: ${API_KEY}" | jq .
# List backups for one tenant
curl -s "${API_URL}tenants/{id}/backups" -H "x-api-key: ${API_KEY}" | jq .
# List all backups across all tenants (marks orphan vs active)
curl -s "${API_URL}backups" -H "x-api-key: ${API_KEY}" | jq .
# Config (config.yml):
# backup_cron: "cron(0 19 * * ? *)" # UTC 19:00 = Beijing 03:00
# backup_retention_days: 7 # S3 lifecycle auto-cleanup

Backups are stored at s3://{bucket}/backups/{tenant-id}/{timestamp}.gz.
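Under the hood, the pause/compress/resume/upload flow corresponds roughly to the sketch below; the socket path, image path, and destination bucket variable are illustrative assumptions (the real implementation lives in the host scripts).

```bash
# Hypothetical sketch of one tenant backup on the host.
TENANT_ID="my-agent-ab12"                          # placeholder
API_SOCK=/srv/vms/${TENANT_ID}/firecracker.sock    # Firecracker API socket (path assumed)
DATA_IMG=/srv/vms/${TENANT_ID}/data.ext4           # tenant data volume (path assumed)
TS=$(date +%Y%m%d-%H%M%S)

vm_state() {   # Firecracker API: PATCH /vm with {"state": "Paused"|"Resumed"}
  curl -s --unix-socket "$API_SOCK" -X PATCH http://localhost/vm \
       -H 'Content-Type: application/json' -d "{\"state\":\"$1\"}"
}

trap 'vm_state Resumed' EXIT            # resume even if compression fails
vm_state Paused                         # freeze vCPUs for a consistent snapshot
pigz -c "$DATA_IMG" > "/tmp/${TS}.gz"   # compress the data volume
trap - EXIT; vm_state Resumed           # resume before the (slower) upload
aws s3 cp "/tmp/${TS}.gz" "s3://${BACKUP_BUCKET}/backups/${TENANT_ID}/${TS}.gz"   # bucket variable assumed
```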
Restore creates a new tenant using a backup's data volume. The source tenant does not need to exist – orphan backups from deleted tenants are fully restorable.
# Restore from the latest backup of a (possibly deleted) tenant
curl -s -X POST "${API_URL}tenants" -H "x-api-key: ${API_KEY}" -d '{
"name": "restored-agent",
"vcpu": 2, "mem_mb": 4096,
"restore_from": {"tenant_id": "my-agent-ab12"}
}' | jq .
# Restore from a specific backup timestamp
curl -s -X POST "${API_URL}tenants" -H "x-api-key: ${API_KEY}" -d '{
"name": "restored-agent",
"restore_from": {"tenant_id": "my-agent-ab12", "timestamp": "20260428-125402"}
}' | jq .

- restore_from is decoupled from vcpu/mem_mb/config_template – those follow the new tenant's spec
- Data volume size equals the backup's actual size (no resize)
- The new tenant gets a fresh ID; the source's identity is not inherited
All tenants share a unified skill set (SKILL.md files), with independent memory per tenant.
# Upload skills to S3 (auto-synced to all VMs)
aws s3 sync ./my-skills/ s3://${ASSETS_BUCKET}/skills/ --profile $PROFILE
# Sync chain:
# S3 → Host /data/shared-skills/ (cron, every 5 min) → All running VMs
# New VMs get skills injected into the data volume at launch

Scale-out – No available host when creating a tenant → the tenant enters pending → the ASG launches a new instance → pending tenants are auto-assigned after init
Scale-in – Scaler Lambda checks every 5 minutes:
- Host with vm_count=0 exceeding idle_timeout_minutes → marked idle
- Next round confirms still idle and ASG instances > min → terminate
- If a tenant is assigned during this window → auto-recover to active
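While this plays out, host state can be watched via the API; the exact field names in the host records (for example vm_count and status) are an assumption based on the behaviour described above.

```bash
source .env.deploy
# Inspect hosts; records are expected to expose vm_count and an idle/active status
curl -s "${API_URL}hosts" -H "x-api-key: ${API_KEY}" | jq .
```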
| File | Purpose |
|---|---|
| config.yml | Infrastructure config – copy from config.yml.example and customize |
| templates/openclaw.json | OpenClaw app config (model, API key, provider) – copy from .example |
| .env.deploy | Deploy environment (region, API URL/Key, bucket) – auto-generated by setup.sh |
| Section | Key | Default | Description |
|---|---|---|---|
| host | instance_type | m8i.2xlarge | Must support NestedVirtualization (c8i/m8i/r8i) |
| host | data_volume_gb | 200 | Data volume for rootfs templates + VM disks |
| host | cpu_overcommit_ratio | 2.0 | CPU overcommit (1.0=none, 2.0=allocate 2x vCPU) |
| host | mem_overcommit_ratio | 1.0 | Memory overcommit (requires balloon enabled) |
| host | keep_data_volume | true | Keep EBS data volume after instance termination |
| vm | default_vcpu | 2 | Default vCPU per tenant |
| vm | default_mem_mb | 4096 | Default memory (MB) per tenant |
| vm | rootfs_overlay_mb | 8192 | Per-VM writable rootfs layer cap (sparse, doesn't pre-allocate) |
| vm | data_disk_mb | 8192 | Per-VM data volume /home/agent cap (sparse) |
| balloon | enabled | false | Firecracker balloon device for memory overcommit |
| balloon | max_inflate_ratio | 0.4 | Max reclaimable ratio of VM declared memory |
| balloon | min_guest_available_mb | 512 | Min available memory kept in guest |
| asg | min_capacity | 1 | Minimum host instances |
| asg | max_capacity | 5 | Maximum host instances |
| asg | use_spot | false | Spot instances (save ~60-70%, may be reclaimed) |
| scaler | idle_timeout_minutes | 10 | Idle host reclaim timeout |
| health_check | interval_minutes | 5 | Lambda watchdog interval |
| agentcore | enabled | false | AgentCore Gateway/Memory/CodeInterpreter/Browser |
| console_auth | enabled | false | Cognito authentication for Console |
| console_auth | self_sign_up | false | Allow user self-registration |
See config.yml.example for all options. Redeploy after changes: ./setup.sh <region> <profile>
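If the yq v4 CLI happens to be installed, a single value can be tweaked from the shell before redeploying; the key path assumes the YAML nesting follows the Section column above, and editing config.yml by hand works just as well.

```bash
yq -i '.asg.max_capacity = 8' config.yml   # key path assumed from the table above
./setup.sh ap-northeast-1 lab              # redeploy to apply the change
```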
./scripts/destroy.sh # Destroy stack, keep S3 bucket and DynamoDB tables
./scripts/destroy.sh --purge # Full cleanup including S3 data and DynamoDB tables

Admin / User
  │
  ├── API Gateway (HTTPS, x-api-key) ── Lambda ── DynamoDB
  │                                               ├── tenants
  │                                               └── hosts
  │
  └── ALB (HTTPS) ── Host Nginx:80 ── VM Gateway:18789
        ├── /vm/{tenant-a}/ → 172.16.1.2
        └── /vm/{tenant-b}/ → 172.16.2.2

Lambda ── SSM Run Command ── EC2 Host
                              ├── microVM 01 (172.16.1.2)
                              ├── microVM 02 (172.16.2.2)
                              └── ...
S3: rootfs distribution + data backup + shared skills
ASG: auto-scaling hosts
EventBridge: health checks + idle reclamation + scheduled backup
sample-multi-tenant-openclaw-on-firecracker/
├── deploy/                             # CDK project
│   ├── app.py                          # CDK app entry
│   ├── stack.py                        # Infrastructure definition
│   ├── lambda/
│   │   ├── api/handler.py              # Tenant CRUD + host management
│   │   ├── templates/handler.py        # Config template CRUD
│   │   ├── skills/handler.py           # Shared skills list
│   │   ├── health_check/handler.py     # Scheduled health checks
│   │   ├── agentcore_tools/handler.py  # AgentCore Gateway Lambda tools
│   │   └── scaler/handler.py           # Idle host reclamation
│   └── userdata/
│       ├── init-host.sh                # Host initialization
│       ├── host-agent.py               # VM health polling + DDB writes + balloon
│       ├── launch-vm.sh                # microVM launch
│       └── stop-vm.sh                  # microVM stop
├── console/                            # Web management console
│   ├── index.html                      # Alpine.js SPA (4 tabs)
│   └── style.css
├── tests/                              # Test suite (unit + e2e)
├── templates/                          # OpenClaw config templates
│   └── openclaw.json.example           # Example config
├── pyproject.toml                      # Python project config + dependencies
├── cdk.json                            # CDK app config + feature flags
├── config.yml                          # Infrastructure config (single source of truth)
├── setup.sh                            # One-click deploy + export .env.deploy
├── build-rootfs.sh                     # Build rootfs + data template, upload to S3
├── scripts/
│   ├── destroy.sh                      # Tear down stack
│   ├── oc-connect.sh                   # SSH-style helper to reach a tenant VM
│   └── oc-dashboard.sh                 # Open a tenant's Dashboard URL
└── docs/
All requests require the x-api-key header.
| Method | Path | Description |
|---|---|---|
| GET | /tenants | List all tenants |
| POST | /tenants | Create tenant {"name":"xx","vcpu":2,"mem_mb":4096}; add "restore_from":{"tenant_id":"..."} to restore from a backup |
| GET | /tenants/{id} | Get tenant details |
| DELETE | /tenants/{id} | Delete tenant (?keep_data=true to preserve data volume) |
| POST | /tenants/{id}/restart | Restart VM (reuse disks, fast) |
| POST | /tenants/{id}/stop | Stop VM (disks preserved) |
| POST | /tenants/{id}/start | Start a stopped VM |
| POST | /tenants/{id}/pause | Freeze vCPU (Firecracker native, instant) |
| POST | /tenants/{id}/resume | Resume a paused VM |
| POST | /tenants/{id}/reset | Reinstall rootfs (data volume preserved) |
| POST | /tenants/{id}/backup | Manual data backup (async, returns 202) |
| GET | /tenants/{id}/backups | List backups for one tenant |
| GET | /backups | List all backups across tenants (includes orphan flag) |
| GET | /hosts | List all hosts |
| POST | /hosts | Register host (called by UserData) |
| POST | /hosts/refresh-rootfs | Push latest rootfs to all hosts |
| GET | /hosts/rootfs-version | Query current rootfs version (manifest.json) |
| DELETE | /hosts/{id} | Deregister host |
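A few of the lifecycle endpoints in practice (the tenant id is a placeholder):

```bash
source .env.deploy
TENANT_ID="my-agent-ab12"   # placeholder

# Stop and later start a tenant VM (disks preserved)
curl -s -X POST "${API_URL}tenants/${TENANT_ID}/stop"  -H "x-api-key: ${API_KEY}" | jq .
curl -s -X POST "${API_URL}tenants/${TENANT_ID}/start" -H "x-api-key: ${API_KEY}" | jq .

# Pause / resume (Firecracker-native vCPU freeze)
curl -s -X POST "${API_URL}tenants/${TENANT_ID}/pause"  -H "x-api-key: ${API_KEY}" | jq .
curl -s -X POST "${API_URL}tenants/${TENANT_ID}/resume" -H "x-api-key: ${API_KEY}" | jq .

# Reinstall the rootfs (e.g. after a rootfs update), keeping the data volume
curl -s -X POST "${API_URL}tenants/${TENANT_ID}/reset" -H "x-api-key: ${API_KEY}" | jq .

# Delete the tenant but keep its data volume
curl -s -X DELETE "${API_URL}tenants/${TENANT_ID}?keep_data=true" -H "x-api-key: ${API_KEY}" | jq .
```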
Each VM uses an independent /24 subnet, communicating with the host via TAP device:
VM1: tap-vm1 host=172.16.1.1/24 guest=172.16.1.2/24
VM2: tap-vm2 host=172.16.2.1/24 guest=172.16.2.2/24
VMn: tap-vmN host=172.16.N.1/24 guest=172.16.N.2/24
- Outbound: iptables MASQUERADE → internet
- Inbound: ALB → Nginx reverse proxy → VM:18789
- Inter-VM: fully isolated, no routing between subnets
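A simplified sketch of the host-side plumbing for one VM, following the scheme above; the exact commands, forwarding rules, and the uplink interface name (eth0) are assumptions, and the authoritative logic is in deploy/userdata/launch-vm.sh.

```bash
# Hypothetical per-VM network setup (N is the VM index).
N=1
ip tuntap add dev tap-vm${N} mode tap
ip addr add 172.16.${N}.1/24 dev tap-vm${N}
ip link set tap-vm${N} up
sysctl -w net.ipv4.ip_forward=1

# Outbound: NAT the VM subnet out through the host uplink
iptables -t nat -A POSTROUTING -s 172.16.${N}.0/24 -o eth0 -j MASQUERADE

# Inter-VM isolation: no forwarding between tenant subnets, only to the outside
iptables -A FORWARD -s 172.16.${N}.0/24 -d 172.16.0.0/16 -j DROP
iptables -A FORWARD -s 172.16.${N}.0/24 -j ACCEPT
```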
The build script produces two images: a rootfs (OS + software) and a data template (pre-configured /home/agent content).
Versions managed via S3 manifest.json. Hosts and tenants track their rootfs_version.
# Build and upload (updates manifest.json + refreshes hosts)
./build-rootfs.sh v1.8
# Manually refresh host images
source .env.deploy
curl -s -X POST "${API_URL}hosts/refresh-rootfs" -H "x-api-key: ${API_KEY}" | jq .
# Query current version
curl -s "${API_URL}hosts/rootfs-version" -H "x-api-key: ${API_KEY}" | jq .
# New VMs use the latest version; existing VMs need reset to update

See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.



