
Multi-Tenant OpenClaw on Firecracker


Multi-tenant, isolated deployment of OpenClaw AI agents on AWS using Firecracker microVMs, also known as OpenClaw Pool. Each tenant runs in its own microVM with an independent kernel, filesystem, and network. Everything is managed via an API, with auto-scaling hosts and idle-host reclamation.

This project uses AWS EC2 nested virtualization to run KVM + Firecracker inside EC2 instances. Currently supports Intel instance families (c8i/m8i/r8i, etc.).

⚠️ This sample is for demonstration purposes only and is not intended for production use. Deploy at your own risk.

Features

  • Tenant Management – Create/delete/query tenants via API. Each tenant is an OpenClaw instance running in an isolated Firecracker microVM with its own rootfs, data volume, and network
  • Security Isolation – Firecracker microVM-based isolation: independent kernel, network, and filesystem per tenant
  • Auto Scheduling – Automatically selects a host with available resources; scales out when capacity is insufficient
  • Auto Scale-in – Idle hosts are reclaimed after timeout (two-round confirmation to prevent false kills)
  • Health Checks – Real-time VM health monitoring with automatic status updates
  • Web Console – Online management console with Cognito authentication, real-time host/tenant status
  • Rootfs Pre-build – Rootfs + data template distributed via S3, downloaded on host init
  • Dashboard Access – One-click HTTPS access to each tenant's OpenClaw Dashboard, no custom domain required
  • Auto Backup & Restore – EventBridge scheduled backup of all tenant data volumes to S3, with manual trigger, cross-tenant backup query, and one-click restore into a new tenant (orphan-safe – the source tenant need not exist)
  • AgentCore Integration – Optional toggle; when enabled, all VMs auto-connect to AgentCore Gateway (MCP tool hub), Memory, Code Interpreter, and Browser
  • Shared Skills – All tenants share a unified skill set (S3-managed, auto-synced to all VMs), with independent memory
  • Config Templates – Custom OpenClaw configuration templates for different LLM providers/models, selectable when creating tenants
  • Default Toolchain – Each VM comes with Python3/uv/git/gh/Node.js/htop/tmux/tree pre-installed

Quick Start

Prerequisites:

  • AWS account + CLI configured
  • CDK CLI + Python 3.12+
  • uv (Python package manager)
# 1. Configure
cp config.yml.example config.yml          # Edit infrastructure config
cp templates/openclaw.json.example templates/openclaw.json  # Set your API key, model provider, etc.

# 2. Deploy infrastructure
./setup.sh ap-northeast-1 lab
# Environment variables saved to .env.deploy

# 3. Build rootfs (auto-uploads to S3 + pushes to hosts)
source .env.deploy
./build-rootfs.sh v1.0

# 4. Create a tenant (OpenClaw instance)
source .env.deploy
curl -s -X POST "${API_URL}tenants" -H "x-api-key: ${API_KEY}" \
  -d '{"name":"my-agent","vcpu":2,"mem_mb":4096}' | jq .

# 5. Open Console - manage tenants, templates, and settings
# Console URL is printed after deploy
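
To confirm the tenant came up, query it back via the documented GET endpoint (this assumes the create response in step 4 includes the tenant id):

# 6. Verify the tenant (replace {id} with the id returned in step 4)
curl -s "${API_URL}tenants/{id}" -H "x-api-key: ${API_KEY}" | jq .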

Management Console

Web-based console hosted on CloudFront (/console/), with Cognito authentication.

Features:

  • Tenants – Host resource overview, create/delete tenants, one-click Dashboard access
  • Application – Shared skills list, config template management (create/edit/delete)
  • Backups – Cross-tenant backup explorer with per-tenant grouping, orphan filter, and one-click restore into a new tenant
  • Settings – API connection, AgentCore status, system info

Screenshots

Tenants tab

Backups tab

Dashboard Access

Each tenant's OpenClaw Dashboard is accessible via CloudFront + ALB + Nginx reverse proxy:

https://{cloudfront-domain}/vm/{tenant-id}/    → Tenant Dashboard (WebSocket)

HTTPS is provided by CloudFront out of the box – no custom domain or ACM certificate required. The Console's "Open Dashboard" button includes the gateway token for one-click access.

Traffic flow: Browser → CloudFront:443 → ALB:80 → Host Nginx:80 → VM Gateway:18789

Nginx config is automatically managed by launch-vm.sh / stop-vm.sh.
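
For illustration, the per-tenant proxy rule might be written like the sketch below; the file path, variable names, and header set are assumptions, not the actual launch-vm.sh:

# Hypothetical fragment of launch-vm.sh; TENANT_ID and GUEST_IP assumed set
cat > "/etc/nginx/vm.d/${TENANT_ID}.conf" <<EOF
location /vm/${TENANT_ID}/ {
    proxy_pass http://${GUEST_IP}:18789/;
    proxy_http_version 1.1;                  # WebSocket requires HTTP/1.1
    proxy_set_header Upgrade \$http_upgrade; # pass the WebSocket handshake
    proxy_set_header Connection "upgrade";
    proxy_set_header Host \$host;
}
EOF
nginx -s reload   # stop-vm.sh would remove the file and reload again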

Custom Domain (Optional)

Bind a custom domain + HTTPS to CloudFront. Configuration lives in config.yml under cloudfront:; you can edit the file directly or pass flags to setup.sh:

# Prerequisites:
# 1. Request an ACM certificate in us-east-1 (required by CloudFront) and complete DNS validation
# 2. CNAME your domain to the CloudFront domain (see DashboardUrl output)

# One-liner: sets config.yml + deploys in a single run
./setup.sh ap-northeast-1 lab \
  --domain claw.example.com \
  --cert   arn:aws:acm:us-east-1:xxx:certificate/xxx

# Or edit config.yml manually then run setup.sh with no flags.
# To unbind the custom domain: --domain "" and re-run setup.sh.

The custom domain and certificate flow through CDK (not out-of-band), so subsequent setup.sh runs preserve the binding.
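
The corresponding config.yml section might look like this (key names are illustrative, inferred from the flags above; config.yml.example is authoritative):

# config.yml:
# cloudfront:
#   domain: claw.example.com
#   certificate_arn: arn:aws:acm:us-east-1:xxx:certificate/xxx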

Auto Backup & Restore

EventBridge schedules daily backups of all running tenant data volumes to S3; a manual trigger is also supported.

Backup flow: pause VM → pigz-compress data.ext4 → resume VM → upload to S3. The VM auto-resumes even on failure (trap cleanup).
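
Conceptually, the host-side sequence looks like the sketch below; pause_vm/resume_vm and all paths are illustrative assumptions, not the actual host scripts:

# Sketch of one backup run; TENANT_ID and BUCKET assumed set
set -euo pipefail
STAMP=$(date -u +%Y%m%d-%H%M%S)
pause_vm "$TENANT_ID"                 # freeze vCPUs
trap 'resume_vm "$TENANT_ID"' EXIT    # guarantees resume even if a step fails
pigz -c "/data/vms/${TENANT_ID}/data.ext4" > "/tmp/${TENANT_ID}-${STAMP}.gz"
resume_vm "$TENANT_ID"                # resume before the (slow) upload
trap - EXIT
aws s3 cp "/tmp/${TENANT_ID}-${STAMP}.gz" \
  "s3://${BUCKET}/backups/${TENANT_ID}/${STAMP}.gz"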

source .env.deploy

# Manual backup (async, returns 202)
curl -s -X POST "${API_URL}tenants/{id}/backup" -H "x-api-key: ${API_KEY}" | jq .

# List backups for one tenant
curl -s "${API_URL}tenants/{id}/backups" -H "x-api-key: ${API_KEY}" | jq .

# List all backups across all tenants (marks orphan vs active)
curl -s "${API_URL}backups" -H "x-api-key: ${API_KEY}" | jq .

# Config (config.yml):
# backup_cron: "cron(0 19 * * ? *)"  # UTC 19:00 = Beijing 03:00
# backup_retention_days: 7            # S3 lifecycle auto-cleanup

Backups are stored at s3://{bucket}/backups/{tenant-id}/{timestamp}.gz.

Restore from Backup

Restore creates a new tenant using a backup's data volume. The source tenant does not need to exist – orphan backups from deleted tenants are fully restorable.

# Restore from the latest backup of a (possibly deleted) tenant
curl -s -X POST "${API_URL}tenants" -H "x-api-key: ${API_KEY}" -d '{
  "name": "restored-agent",
  "vcpu": 2, "mem_mb": 4096,
  "restore_from": {"tenant_id": "my-agent-ab12"}
}' | jq .

# Restore from a specific backup timestamp
curl -s -X POST "${API_URL}tenants" -H "x-api-key: ${API_KEY}" -d '{
  "name": "restored-agent",
  "restore_from": {"tenant_id": "my-agent-ab12", "timestamp": "20260428-125402"}
}' | jq .
  • restore_from is decoupled from vcpu/mem_mb/config_template – those follow the new tenant's spec
  • Data volume size equals the backup's actual size (no resize)
  • The new tenant gets a fresh ID; the source's identity is not inherited

Shared Skills

All tenants share a unified skill set (SKILL.md files), with independent memory per tenant.

# Upload skills to S3 (auto-synced to all VMs)
aws s3 sync ./my-skills/ s3://${ASSETS_BUCKET}/skills/ --profile $PROFILE

# Sync chain:
# S3 → Host /data/shared-skills/ (cron 5min) → All running VMs
# New VMs get skills injected into data volume at launch
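
The S3-to-host leg could be as simple as a cron job; this registration snippet is an assumption for illustration (the real job is presumably installed by init-host.sh):

# Hypothetical: register a 5-minute skills sync on the host
( crontab -l 2>/dev/null; \
  echo '*/5 * * * * aws s3 sync s3://<assets-bucket>/skills/ /data/shared-skills/' ) | crontab -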

Auto Scaling

Scale-out – No available host when creating a tenant → tenant enters pending → ASG launches new instance → pending tenants auto-assigned after init

Scale-in – Scaler Lambda checks every 5 minutes (decision logic sketched after the list):

  1. Host with vm_count=0 exceeding idle_timeout_minutes → marked idle
  2. Next round confirms still idle and ASG instances > min → terminate
  3. If a tenant is assigned during this window → auto-recover to active
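
A minimal sketch of the per-host decision (the real scaler is a Python Lambda; the state names and inputs here are illustrative assumptions):

# Decision for one host per scaler pass - conceptual sketch only
IDLE_TIMEOUT_MINUTES=10   # config.yml: scaler.idle_timeout_minutes
decide() {                # args: vm_count state idle_minutes asg_size asg_min
  local vm_count=$1 state=$2 idle_minutes=$3 asg_size=$4 asg_min=$5
  if (( vm_count > 0 )); then
    # A tenant was assigned during the idle window
    [[ $state == idle ]] && echo recover-to-active || echo noop
  elif [[ $state == idle ]]; then
    # Second consecutive idle round: reclaim only while the ASG stays above min
    (( asg_size > asg_min )) && echo terminate || echo noop
  elif (( idle_minutes >= IDLE_TIMEOUT_MINUTES )); then
    echo mark-idle        # first round: flag only, never terminate immediately
  else
    echo noop
  fi
}
decide 0 active 12 3 1    # -> mark-idle
decide 0 idle   17 3 1    # -> terminate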

Configuration

Config Files

File                     Purpose
config.yml               Infrastructure config – copy from config.yml.example and customize
templates/openclaw.json  OpenClaw app config (model, API key, provider) – copy from .example
.env.deploy              Deploy environment (region, API URL/Key, bucket) – auto-generated by setup.sh

config.yml

Section       Key                     Default      Description
host          instance_type           m8i.2xlarge  Must support NestedVirtualization (c8i/m8i/r8i)
host          data_volume_gb          200          Data volume for rootfs templates + VM disks
host          cpu_overcommit_ratio    2.0          CPU overcommit (1.0 = none, 2.0 = allocate 2x vCPU)
host          mem_overcommit_ratio    1.0          Memory overcommit (requires balloon enabled)
host          keep_data_volume        true         Keep EBS data volume after instance termination
vm            default_vcpu            2            Default vCPU per tenant
vm            default_mem_mb          4096         Default memory (MB) per tenant
vm            rootfs_overlay_mb       8192         Per-VM writable rootfs layer cap (sparse, doesn't pre-allocate)
vm            data_disk_mb            8192         Per-VM data volume /home/agent cap (sparse)
balloon       enabled                 false        Firecracker balloon device for memory overcommit
balloon       max_inflate_ratio       0.4          Max reclaimable ratio of VM declared memory
balloon       min_guest_available_mb  512          Min available memory kept in guest
asg           min_capacity            1            Minimum host instances
asg           max_capacity            5            Maximum host instances
asg           use_spot                false        Spot instances (save ~60-70%, may be reclaimed)
scaler        idle_timeout_minutes    10           Idle host reclaim timeout
health_check  interval_minutes        5            Lambda watchdog interval
agentcore     enabled                 false        AgentCore Gateway/Memory/CodeInterpreter/Browser
console_auth  enabled                 false        Cognito authentication for Console
console_auth  self_sign_up            false        Allow user self-registration
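
For example, with the default cpu_overcommit_ratio of 2.0 on an 8-vCPU host (m8i.2xlarge), the scheduler treats the host as having 16 allocatable vCPUs – enough for eight tenants at the default 2 vCPUs each, assuming memory does not become the binding constraint first.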

See config.yml.example for all options. Redeploy after changes: ./setup.sh <region> <profile>

Tear Down

./scripts/destroy.sh           # Destroy stack, keep S3 bucket and DynamoDB tables
./scripts/destroy.sh --purge   # Full cleanup including S3 data and DynamoDB tables

Architecture

Admin / User
    │
    ├── API Gateway (HTTPS, x-api-key) ──→ Lambda ──→ DynamoDB
    │                                                  ├── tenants
    │                                                  └── hosts
    │
    └── ALB (HTTPS) ──→ Host Nginx:80 ──→ VM Gateway:18789
                        ├── /vm/{tenant-a}/ → 172.16.1.2
                        └── /vm/{tenant-b}/ → 172.16.2.2

Lambda ── SSM Run Command ──→ EC2 Host
                               ├── microVM 01 (172.16.1.2)
                               ├── microVM 02 (172.16.2.2)
                               └── ...

S3: rootfs distribution + data backup + shared skills
ASG: auto-scaling hosts
EventBridge: health checks + idle reclamation + scheduled backup

System Architecture

(system architecture diagram)

Deployment Architecture

(deployment architecture diagram)

Project Structure

sample-multi-tenant-openclaw-on-firecracker/
├── deploy/                    # CDK project
│   ├── app.py                 # CDK app entry
│   ├── stack.py               # Infrastructure definition
│   ├── lambda/
│   │   ├── api/handler.py     # Tenant CRUD + host management
│   │   ├── templates/handler.py  # Config template CRUD
│   │   ├── skills/handler.py  # Shared skills list
│   │   ├── health_check/handler.py  # Scheduled health checks
│   │   ├── agentcore_tools/handler.py  # AgentCore Gateway Lambda tools
│   │   └── scaler/handler.py  # Idle host reclamation
│   └── userdata/
│       ├── init-host.sh       # Host initialization
│       ├── host-agent.py      # VM health polling + DDB writes + balloon
│       ├── launch-vm.sh       # microVM launch
│       └── stop-vm.sh         # microVM stop
├── console/                   # Web management console
│   ├── index.html             # Alpine.js SPA (4 tabs)
│   └── style.css
├── tests/                     # Test suite (unit + e2e)
├── templates/                 # OpenClaw config templates
│   └── openclaw.json.example  # Example config
├── pyproject.toml             # Python project config + dependencies
├── cdk.json                   # CDK app config + feature flags
├── config.yml                 # Infrastructure config (single source of truth)
├── setup.sh                   # One-click deploy + export .env.deploy
├── build-rootfs.sh            # Build rootfs + data template, upload to S3
├── scripts/
│   ├── destroy.sh             # Tear down stack
│   ├── oc-connect.sh          # SSH-style helper to reach a tenant VM
│   └── oc-dashboard.sh        # Open a tenant's Dashboard URL
└── docs/

API Reference

All requests require the x-api-key header.

Method  Path                     Description
GET     /tenants                 List all tenants
POST    /tenants                 Create tenant {"name":"xx","vcpu":2,"mem_mb":4096} – add "restore_from":{"tenant_id":"..."} to restore from a backup
GET     /tenants/{id}            Get tenant details
DELETE  /tenants/{id}            Delete tenant (?keep_data=true to preserve data volume)
POST    /tenants/{id}/restart    Restart VM (reuse disks, fast)
POST    /tenants/{id}/stop       Stop VM (disks preserved)
POST    /tenants/{id}/start      Start a stopped VM
POST    /tenants/{id}/pause      Freeze vCPU (Firecracker native, instant)
POST    /tenants/{id}/resume     Resume a paused VM
POST    /tenants/{id}/reset      Reinstall rootfs (data volume preserved)
POST    /tenants/{id}/backup     Manual data backup (async, returns 202)
GET     /tenants/{id}/backups    List backups for one tenant
GET     /backups                 List all backups across tenants (includes orphan flag)
GET     /hosts                   List all hosts
POST    /hosts                   Register host (called by UserData)
POST    /hosts/refresh-rootfs    Push latest rootfs to all hosts
GET     /hosts/rootfs-version    Query current rootfs version (manifest.json)
DELETE  /hosts/{id}              Deregister host

Network Model

Each VM uses an independent /24 subnet, communicating with the host via a TAP device (host-side setup sketched after the list):

VM1: tap-vm1  host=172.16.1.1/24  guest=172.16.1.2/24
VM2: tap-vm2  host=172.16.2.1/24  guest=172.16.2.2/24
VMn: tap-vmN  host=172.16.N.1/24  guest=172.16.N.2/24
  • Outbound: iptables MASQUERADE → internet
  • Inbound: ALB → Nginx reverse proxy → VM:18789
  • Inter-VM: fully isolated, no routing between subnets
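
Host-side plumbing for one VM might look like the following (an illustrative sketch; the actual launch-vm.sh and the host's NIC name may differ):

# Illustrative setup for VM N (here N=1); ens5 is a typical EC2 NIC name
N=1
ip tuntap add dev "tap-vm${N}" mode tap
ip addr add "172.16.${N}.1/24" dev "tap-vm${N}"
ip link set "tap-vm${N}" up
iptables -t nat -A POSTROUTING -s "172.16.${N}.0/24" -o ens5 -j MASQUERADE
# Inter-VM isolation: drop forwarded traffic between VM subnets
iptables -I FORWARD -s 172.16.0.0/16 -d 172.16.0.0/16 -j DROP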

Rootfs Management

The build script produces two images: a rootfs (OS + software) and a data template (pre-configured /home/agent content).

Versions are managed via a manifest.json in S3; hosts and tenants each track their rootfs_version.

# Build and upload (updates manifest.json + refreshes hosts)
./build-rootfs.sh v1.8

# Manually refresh host images
source .env.deploy
curl -s -X POST "${API_URL}hosts/refresh-rootfs" -H "x-api-key: ${API_KEY}" | jq .

# Query current version
curl -s "${API_URL}hosts/rootfs-version" -H "x-api-key: ${API_KEY}" | jq .

# New VMs use the latest version; existing VMs need reset to update

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
