A complete Rook Ceph deployment on Amazon EKS demonstrating unified block, file, and object storage from a single distributed system.
- Block Storage (RBD) - Persistent volumes for databases and applications
- File Storage (CephFS) - Shared storage across multiple pods
- Object Storage (RGW) - S3-compatible API for backups and archives
- GitOps Deployment - Infrastructure as Code with ArgoCD
- Kubernetes Native - Rook operator for lifecycle management
- Scalable Design - Ready for multi-tenant applications
- Production Ready - Proper dependency management and health checks
```
┌───────────────────────────────────────────────────────────────┐
│                     EKS Cluster (3 Nodes)                     │
├───────────────────────────────────────────────────────────────┤
│                      Rook Ceph Platform                       │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐         │
│   │     MON     │   │     MGR     │   │     OSD     │         │
│   │  (Monitor)  │   │  (Manager)  │   │  (Storage)  │         │
│   └─────────────┘   └─────────────┘   └─────────────┘         │
│                                                               │
│   ┌─────────────┐   ┌─────────────┐                           │
│   │     MDS     │   │     RGW     │                           │
│   │ (Metadata)  │   │ (S3 Gateway)│                           │
│   └─────────────┘   └─────────────┘                           │
└───────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
 ┌───────▼───────┐    ┌────────▼───────┐    ┌────────▼───────┐
 │ Block Storage │    │ Object Storage │    │  File Storage  │
 │     (RBD)     │    │      (S3)      │    │    (CephFS)    │
 │               │    │                │    │                │
 │ StorageClass: │    │ StorageClass:  │    │ StorageClass:  │
 │ rook-ceph-    │    │ rook-ceph-     │    │ rook-cephfs    │
 │ block         │    │ bucket         │    │                │
 └───────────────┘    └────────────────┘    └────────────────┘
```
- AWS CLI configured
- kubectl installed
- Terraform installed
- GitHub CLI (gh) installed
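A quick way to sanity-check these prerequisites before starting (a minimal sketch; it only verifies the tools are on `PATH`, not their versions):

```shell
# Report which prerequisite CLIs are installed; prints one line per tool
for tool in aws kubectl terraform gh; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```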
```bash
git clone https://github.com/chiju/rook-ceph-lab.git
cd rook-ceph-lab

# Create S3 backend for Terraform state
./scripts/bootstrap-backend.sh

# Setup GitHub OIDC authentication
./scripts/setup-oidc-access.sh
```

Create a dedicated GitHub App for this Rook Ceph lab:
Go to: https://github.com/settings/apps/new
Required Settings:
- Name: `ArgoCD-Rook-Ceph-Lab` (dedicated to this repo)
- Homepage: `https://github.com/chiju/rook-ceph-lab`
- Webhook: uncheck "Active" (we don't need webhooks)
- Repository permissions:
  - Contents: `Read-only` (ArgoCD needs to read your repo)
  - Metadata: `Read-only` (automatically required)
- Where can this app be installed: `Only on this account`
After creation:
- Generate private key → downloads a `.pem` file
- Note App ID → shown on the app page (e.g., `2336285`)
- Install app → click "Install App" → select ONLY the `rook-ceph-lab` repository
- Note Installation ID → from the URL `github.com/settings/installations/XXXXXXXX` (e.g., `96060885`)
Store GitHub App secrets:

```bash
cd ~/Downloads
gh secret set ARGOCD_APP_PRIVATE_KEY < argocd-rook-ceph-lab.*.private-key.pem
gh secret set ARGOCD_APP_ID -b "2336285"
gh secret set ARGOCD_APP_INSTALLATION_ID -b "96060885"
```

✅ Dedicated GitHub App configured! This keeps the Rook Ceph lab isolated.
```bash
git add .
git commit -m "Initial Rook Ceph deployment"
git push origin main
```

That's it! GitHub Actions will deploy the complete Rook Ceph platform.
```bash
# Update kubeconfig
aws eks update-kubeconfig --region eu-central-1 --name rook-ceph-lab

# Check ArgoCD applications
kubectl get applications -n argocd

# Check storage classes
kubectl get storageclass | grep ceph

# Check test results
kubectl logs -l app=ceph-comprehensive-test --tail=10
```

| Application | Purpose | Wave |
|---|---|---|
| `rook-operator` | Installs Ceph operator | 0 |
| `ceph-cluster` | Creates storage cluster (MON/MGR/OSD) | 1 |
| `ceph-block-storage` | Provides block storage interface | 2 |
| `ceph-object-storage` | Provides S3-compatible storage | 3 |
| `ceph-file-storage` | Provides shared file storage | 3 |
| `ceph-test-apps` | Validates all storage types | 4 |
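The Wave column corresponds to ArgoCD sync waves, which control the order in which applications are synced. As a hedged sketch of the mechanism (the exact manifests live in this repo, so values here are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ceph-cluster
  annotations:
    # Lower waves sync first; wave 1 waits until wave 0 (rook-operator) is healthy
    argocd.argoproj.io/sync-wave: "1"
```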
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-storage
spec:
  storageClassName: rook-ceph-block
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-storage
spec:
  storageClassName: rook-cephfs
  accessModes: [ReadWriteMany]
  resources:
    requests:
      storage: 5Gi
```

```yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: backup-bucket
spec:
  storageClassName: rook-ceph-bucket
```

- MON: 1 instance (minimal for testing)
- MGR: 1 instance (minimal for testing)
- OSD: 1 instance (minimal for testing)
- Storage: ~25Gi total (EBS backend)
- Replication: 1x (no redundancy)
```yaml
# Recommended production setup
mon:
  count: 3  # Odd number for quorum
mgr:
  count: 2  # Active/standby
storage:
  storageClassDeviceSets:
    - count: 6  # Multiple OSDs per node
      portable: true
      resources:
        requests:
          storage: 100Gi  # Larger storage volumes
```

- Multiple MONs for consensus and fault tolerance
- Multiple MGRs for manager failover
- Multiple OSDs with configurable replication
- Multi-AZ deployment for disaster recovery
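For the multi-AZ point above, the CephCluster CRD accepts a `placement` stanza that can spread daemons across zones. A sketch assuming standard Kubernetes zone labels on the nodes (adapt to your cluster):

```yaml
placement:
  mon:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone   # spread MONs across AZs
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: rook-ceph-mon
```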
- Databases: PostgreSQL, MySQL with persistent block storage
- Web Applications: Shared file storage for uploads and configs
- Backup Systems: S3-compatible storage for automated backups
- CI/CD: Shared build artifacts and container registries
- Monitoring: Persistent storage for Prometheus metrics
- Logging: Object storage for log archives
- Container Registry: S3 backend for Harbor or similar
- Development: Shared storage for development environments
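As one concrete pattern for the backup use case: Rook's bucket provisioner publishes the endpoint in a ConfigMap and the credentials in a Secret, both named after the ObjectBucketClaim. A hedged sketch of a pod consuming a claim named `backup-bucket` (the image and command are illustrative, not part of this repo):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backup-uploader
spec:
  restartPolicy: Never
  containers:
    - name: upload
      image: amazon/aws-cli            # any image with the aws CLI works
      command: ["sh", "-c",
        "aws s3 cp /data/backup.tar.gz s3://$BUCKET_NAME/ --endpoint-url http://$BUCKET_HOST"]
      envFrom:
        - configMapRef:
            name: backup-bucket        # provides BUCKET_HOST, BUCKET_NAME, BUCKET_PORT
        - secretRef:
            name: backup-bucket        # provides AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
```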
The Ceph toolbox pod provides all admin commands for managing your cluster:
```bash
# Check cluster status
kubectl exec -n rook-ceph deploy/rook-ceph-tools -- ceph status

# Check OSD status
kubectl exec -n rook-ceph deploy/rook-ceph-tools -- ceph osd status

# Check storage usage
kubectl exec -n rook-ceph deploy/rook-ceph-tools -- ceph df

# List all pools
kubectl exec -n rook-ceph deploy/rook-ceph-tools -- ceph osd pool ls

# List realms
kubectl exec -n rook-ceph deploy/rook-ceph-tools -- radosgw-admin realm list
```

Important: Always specify `--rgw-realm`, `--rgw-zonegroup`, and `--rgw-zone` for `radosgw-admin` commands.
```bash
# Create S3 user
kubectl exec -n rook-ceph deploy/rook-ceph-tools -- \
  radosgw-admin user create \
  --uid=myuser \
  --display-name="My User" \
  --rgw-realm=my-store \
  --rgw-zonegroup=my-store \
  --rgw-zone=my-store

# List all users
kubectl exec -n rook-ceph deploy/rook-ceph-tools -- \
  radosgw-admin user list \
  --rgw-realm=my-store \
  --rgw-zonegroup=my-store \
  --rgw-zone=my-store

# Get user info (includes access/secret keys)
kubectl exec -n rook-ceph deploy/rook-ceph-tools -- \
  radosgw-admin user info \
  --uid=myuser \
  --rgw-realm=my-store \
  --rgw-zonegroup=my-store \
  --rgw-zone=my-store
```

```bash
# Overall cluster status
kubectl get cephcluster -n rook-ceph

# Component status
kubectl get pods -n rook-ceph

# Storage classes
kubectl get storageclass | grep ceph
```

```bash
# Check PVC status
kubectl get pvc

# Check events
kubectl get events --sort-by=.metadata.creationTimestamp

# Ceph cluster details
kubectl describe cephcluster rook-ceph -n rook-ceph
```

```bash
# Check comprehensive test results
kubectl logs -l app=ceph-comprehensive-test --tail=15

# Expected output:
# ✅ Block storage: X lines
# ✅ File storage: X lines (shared)
# ✅ Object storage: S3 API working
```

```bash
# Destroy everything
./scripts/cleanup-all.sh
```

This removes:
- ✅ EKS cluster and all resources
- ✅ S3 backend bucket
- ✅ IAM roles and policies
- ✅ GitHub secrets (except GitHub App secrets)
- ✅ Local Terraform state

```bash
# Destroy infrastructure only
gh workflow run terraform-destroy.yml -f confirm=destroy
```

- RADOS: Reliable Autonomic Distributed Object Store (foundation)
- RBD: RADOS Block Device (block storage interface)
- CephFS: Ceph File System (POSIX-compliant shared filesystem)
- RGW: RADOS Gateway (S3/Swift-compatible object storage)
- Custom Resources: CephCluster, CephBlockPool, CephFilesystem, CephObjectStore
- CSI Drivers: Dynamic provisioning for Kubernetes
- Lifecycle Management: Automated deployment, scaling, and updates
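As a concrete instance of one of these custom resources, a minimal CephBlockPool could look like the following (a sketch; the pool name and replica count are assumptions, not this repo's exact manifest):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host   # place replicas on different nodes
  replicated:
    size: 3             # keep three copies of each object
```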
- Unified Backend: Single RADOS cluster serves all storage types
- Dynamic Provisioning: Kubernetes-native storage allocation
- Multi-Protocol: Block, file, and object access to same data pool
- Single storage system providing multiple interfaces
- Consistent management and monitoring
- Reduced operational complexity
- Kubernetes-native deployment and management
- GitOps-compatible configuration
- Container-optimized architecture
- Open source with no licensing costs
- Commodity hardware support
- Efficient resource utilization
This lab provides a foundation for understanding distributed storage systems. To scale for production:
- Increase replication for data redundancy
- Add monitoring with Prometheus and Grafana
- Implement backup strategies for disaster recovery
- Tune performance based on workload requirements
A complete distributed storage platform ready for real-world applications. 🎯