Production-ready Amazon EKS infrastructure with GitOps using ArgoCD, fully automated via GitHub Actions and Terraform.
This project demonstrates a complete GitOps workflow from zero to a fully automated Kubernetes cluster:
- Bootstrap → Create S3 backend for Terraform state
- Setup → Configure IAM role with OIDC authentication
- Deploy → Push to GitHub, infrastructure deploys automatically
- GitOps → ArgoCD syncs applications from Git every 30 seconds
- Scale → Karpenter autoscales nodes, KEDA autoscales pods
- Monitor → Prometheus + Grafana for metrics, Loki for logs
- Cleanup → One command destroys everything
Total setup time: ~20 minutes (mostly waiting for EKS cluster)
Manual steps: Only 3 (bootstrap, OIDC, GitHub App)
Everything else: Fully automated via GitHub Actions and ArgoCD
- EKS Cluster: Kubernetes 1.34 with managed node groups (2 t3.medium nodes)
- Networking: VPC with public/private subnets across 2 AZs
- Storage: EBS-backed persistent volumes
- Autoscaling: Karpenter for intelligent node scaling
- ArgoCD: Automated application deployment with app-of-apps pattern
- GitHub Actions: OIDC-based CI/CD pipeline
- Terraform: Infrastructure as Code with S3 remote state
- nginx: Web server with KEDA autoscaling
- KEDA: Event-driven pod autoscaling (CPU/Memory triggers)
- Karpenter: Intelligent node autoscaling and bin-packing
- Prometheus Stack: Metrics collection with persistent storage (15 days retention)
- Grafana: Metrics visualization with CloudWatch integration and persistent dashboards
- Loki: Log aggregation backend
- Promtail: Log collection from all pods
- Event Exporter: Kubernetes events to Loki for Grafana visualization
- HashiCorp Vault: Centralized secrets management with audit logging
- Secrets Store CSI Driver: Kubernetes-native secret injection (no sidecars!)
- Vault CSI Provider: Direct integration between Vault and Kubernetes pods
- Demo Apps: Working examples showing Vault integration patterns
- ACK EKS Controller: Manages EKS resources via Kubernetes CRDs
- Access Entries: Automatically created from SSO roles
- GitOps-native: Self-healing access management
- ✅ AWS OIDC: No stored credentials
- ✅ Federated Authentication: GitHub Actions authenticates via OIDC
- ✅ IAM Identity Center: SSO with multiple users and permission sets
- ✅ ACK EKS Controller: Automatic AccessEntry creation from SSO roles
- ✅ RBAC: Role-based access control with namespace isolation
- ✅ IAM Roles: Least privilege access for all services
- ✅ IRSA: IAM Roles for Service Accounts (Karpenter, Grafana)
- ✅ Encrypted State: S3 backend with encryption at rest
- ✅ No Secrets in Code: All sensitive data in GitHub Secrets
- ✅ Branch Protection: PRs required via workflow concurrency
- ✅ State Locking: Native S3 locking prevents concurrent modifications
- ✅ Checkov: IaC security scanning in CI/CD pipeline
- ✅ Terraform Validation: Format and validation checks
- ℹ️ Note: Checkov chosen for deep Terraform analysis
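For reference, the OIDC authentication step in a GitHub Actions workflow typically looks like the sketch below. The `AWS_ROLE_ARN` secret name is an assumption here; use whatever secret name the setup script configured.

```yaml
# Sketch: federated auth from GitHub Actions to AWS via OIDC (no stored keys)
permissions:
  id-token: write   # required to request the OIDC token
  contents: read

steps:
  - uses: actions/checkout@v4
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: ${{ secrets.AWS_ROLE_ARN }}  # assumed secret name
      aws-region: eu-central-1
```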
- AWS CLI configured (`aws configure`)
- GitHub CLI (`gh auth login`)
- Terraform (v1.13.5+)
- kubectl
- Git
```bash
./scripts/bootstrap-backend.sh
```

What it does:
- Creates S3 bucket for Terraform state (with versioning & encryption)
- Uses native S3 locking (no DynamoDB needed)
- Automatically updates `terraform/backend.tf` with the bucket name

Output:
```
✅ Backend created successfully!
✅ Updated terraform/backend.tf automatically!
```
```bash
./scripts/setup-oidc-access.sh
```

What it does:
- Creates GitHub OIDC provider in AWS (if it does not already exist)
- Creates IAM role for GitHub Actions
- Configures federated credentials
- Automatically adds 3 GitHub secrets

Output:
```
✅ OIDC setup complete!
✅ GitHub secrets added!
```
If you don't have a GitHub App yet, create one:
Go to: https://github.com/settings/apps/new
Required settings:
- Name: `ArgoCD-EKS-GitOps` (or any name)
- Homepage: `https://github.com/YOUR_USERNAME/eks-gitops-lab`
- Webhook: uncheck "Active" (we don't need webhooks)
- Repository permissions:
  - Contents: `Read-only` (ArgoCD needs to read your repo)
  - Metadata: `Read-only` (automatically required)
- Where can this app be installed: `Only on this account`

After creation:
- Generate a private key → downloads a `.pem` file
- Note the App ID → shown on the app page
- Install the app → click "Install App" → select the `eks-gitops-lab` repository
- Note the Installation ID → from the URL: `github.com/settings/installations/XXXXXXXX`

Store GitHub App secrets:
```bash
cd ~/Downloads
gh secret set ARGOCD_APP_PRIVATE_KEY < argocd-eks-gitops.*.private-key.pem
gh secret set ARGOCD_APP_ID -b "YOUR_APP_ID"
gh secret set ARGOCD_APP_INSTALLATION_ID -b "YOUR_INSTALLATION_ID"
```

✅ GitHub App configured! This is reusable for future deployments.
```bash
git add .
git commit -m "Initial deployment"
git push origin main
```

That's it! GitHub Actions will:
- Run terraform plan (security scan)
- Deploy EKS cluster (~15 minutes)
- Install ArgoCD
- Update app configs with cluster info
- Deploy all applications automatically
┌─────────────────────────────────────────────────────────────┐
│ AWS Cloud │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ VPC (10.0.0.0/16) │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ │ │
│ │ │ Public Subnet │ │ Public Subnet │ │ │
│ │ │ 10.0.1.0/24 │ │ 10.0.2.0/24 │ │ │
│ │ │ (AZ-1) │ │ (AZ-2) │ │ │
│ │ │ - NAT Gateway │ │ │ │ │
│ │ └──────────────────┘ └──────────────────┘ │ │
│ │ │ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ │ │
│ │ │ Private Subnet │ │ Private Subnet │ │ │
│ │ │ 10.0.37.0/24 │ │ 10.0.60.0/24 │ │ │
│ │ │ (AZ-1) │ │ (AZ-2) │ │ │
│ │ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │ │
│ │ │ │ EKS Nodes │ │ │ │ EKS Nodes │ │ │ │
│ │ │ │ t3.medium │ │ │ │ t3.medium │ │ │ │
│ │ │ └──────────────┘ │ │ └──────────────┘ │ │ │
│ │ └──────────────────┘ └──────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Developer → PR → Plan → Review → Merge → Apply → Update Configs → ArgoCD Syncs
┌──────────────────────────────────────────────────────────────┐
│ ArgoCD │
├──────────────────────────────────────────────────────────────┤
│ │
│ core-apps (App of Apps) │
│ ├─ Monitors: argocd-apps/ directory │
│ ├─ Auto-sync: Every 30 seconds │
│ └─ Auto-prune: Removes deleted apps │
│ │
│ Applications │
│ ├─ nginx (with KEDA autoscaling) │
│ ├─ keda (pod autoscaling controller) │
│ ├─ karpenter (node autoscaling) │
│ ├─ kube-prometheus-stack (monitoring) │
│ ├─ loki (log aggregation) │
│ └─ promtail (log collection) │
│ │
└──────────────────────────────────────────────────────────────┘
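As a sketch, the `core-apps` root Application (app-of-apps) could be declared like this; the `repoURL` is a placeholder for your fork:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: core-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/YOUR_USERNAME/eks-gitops-lab.git
    targetRevision: main
    path: argocd-apps          # directory monitored by the root app
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true              # auto-prune: removes deleted apps
      selfHeal: true           # reverts manual drift back to Git
```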
.
├── .github/workflows/
│ ├── terraform.yml # Main CI/CD pipeline
│ ├── terraform-destroy.yml # Infrastructure cleanup
│ └── update-app-values.yml # Update configs from Terraform
├── apps/ # Helm charts for applications
│ ├── nginx/
│ ├── keda/
│ ├── karpenter/
│ ├── kube-prometheus-stack/
│ ├── loki/
│ ├── promtail/
│ ├── event-exporter/ # Kubernetes events to Loki
│ ├── secrets-store-csi/ # CSI driver for secrets
│ ├── vault/ # HashiCorp Vault
│ ├── vault-demo/ # Vault integration demo
│ ├── myapp/ # Example app with Vault
│ ├── ack-eks-controller/ # ACK EKS controller
│ ├── access-entries/ # EKS access entries via ACK
│ └── rbac-setup/ # RBAC roles and bindings
├── argocd-apps/ # ArgoCD application definitions
│ ├── nginx.yaml
│ ├── keda.yaml
│ ├── karpenter.yaml
│ ├── kube-prometheus-stack.yaml
│ ├── loki.yaml
│ ├── promtail.yaml
│ ├── event-exporter.yaml
│ ├── ack-eks-controller.yaml
│ ├── access-entries.yaml
│ └── rbac-setup.yaml
├── terraform/ # Terraform infrastructure
│ ├── modules/
│ │ ├── aks/ # EKS cluster configuration
│ │ ├── argocd/ # ArgoCD Helm deployment
│ │ └── vpc/ # Virtual network
│ ├── backend.tf # Terraform backend configuration
│ ├── main.tf # Main Terraform configuration
│ ├── variables.tf # Variable definitions
│ ├── outputs.tf # Output definitions
│ └── provider.tf # Provider configuration
├── scripts/ # Automation scripts
│ ├── bootstrap-backend.sh
│ ├── setup-oidc-access.sh
│ └── cleanup-all.sh
└── README.md
```bash
# Get credentials
aws eks update-kubeconfig --name eks-gitops-lab --region eu-central-1

# Check cluster
kubectl get nodes
kubectl get pods --all-namespaces
```

```bash
# Port forward
kubectl port-forward svc/argocd-server -n argocd 8080:443

# Get password
kubectl get secret argocd-initial-admin-secret -n argocd -o jsonpath="{.data.password}" | base64 -d

# Open browser
open https://localhost:8080
# Username: admin
# Password: (from above command)
```

```bash
# Port forward
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80

# Get password
kubectl get secret kube-prometheus-stack-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d

# Open browser
open http://localhost:3000
# Username: admin
# Password: (from above command)
```

```bash
kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090
open http://localhost:9090
```

```bash
# Configure SSO profile
aws configure sso
# SSO start URL: https://d-99675f4fc7.awsapps.com/start
# SSO Region: eu-central-1
# Account: 432801802107
# Role: EKSDeveloper / EKSDevOps / EKSReadOnly

# Login
aws sso login --profile <profile-name>

# Access EKS
aws eks update-kubeconfig --name eks-gitops-lab --region eu-central-1 --profile <profile-name>
kubectl get pods -n dev    # Developer access
kubectl get nodes          # DevOps access
```

User Roles:
- EKSDeveloper: Full access to the `dev` namespace only
- EKSDevOps: Full cluster access (all namespaces, nodes)
- EKSReadOnly: Read-only access to all namespaces
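The actual manifests live in `apps/rbac-setup/`; a minimal sketch of how developer access could be scoped to the `dev` namespace follows (the group name is illustrative, mapped from the SSO role's access entry):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dev-full-access
  namespace: dev
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-full-access
  namespace: dev
subjects:
  - kind: Group
    name: eks-developers    # illustrative group from the access entry
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: dev-full-access
  apiGroup: rbac.authorization.k8s.io
```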
```bash
./scripts/cleanup-all.sh
```

This removes:
- ✅ IAM role
- ✅ S3 bucket and all objects
- ✅ GitHub secrets
- ✅ Local Terraform state files

```bash
# Destroy infrastructure only (manual trigger required)
gh workflow run terraform-destroy.yml -f confirm=destroy
```

Solution: The IAM role needs proper permissions. Check:
```bash
aws iam get-role --role-name GitHubActionsEKSRole
```

Possible causes:
- GitHub token expired
- Repository URL incorrect
- Branch name mismatch
Solution:
```bash
# Check ArgoCD repo secret
kubectl get secret argocd-repo -n argocd -o yaml

# Update if needed
kubectl delete secret argocd-repo -n argocd

# Re-run update-app-values workflow
gh workflow run update-app-values.yml
```

Solution: Check if Karpenter has the correct cluster info:
```bash
# Manually trigger update workflow
gh workflow run update-app-values.yml

# Verify Karpenter config
kubectl get ec2nodeclass -o yaml
```

Solution: Karpenter will automatically provision nodes. Check:
```bash
# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter

# Check pending pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```

- Node metrics: CPU, memory, disk, network
- Pod metrics: Resource usage per pod
- Cluster metrics: Overall cluster health
- CloudWatch integration: Grafana can query CloudWatch
- Centralized logging: All pod logs in one place
- Query language: LogQL for powerful log queries
- Retention: Configurable log retention policies
- Integration: Grafana dashboards for log visualization
- Event collection: All K8s events sent to Loki
- Grafana visualization: View events in Grafana Explore
- Query: `{app="event-exporter"}` or `{type="Warning"}`
- Filtering: By namespace, reason, type, kind, name
KEDA (Pod Autoscaling):
- CPU-based: Scale on CPU utilization
- Memory-based: Scale on memory usage
- Custom metrics: Scale on any Prometheus metric
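A CPU/memory-triggered `ScaledObject` for the nginx app could look like the sketch below; the names and thresholds are illustrative, not necessarily the exact values in `apps/nginx/`:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-scaler
  namespace: nginx
spec:
  scaleTargetRef:
    name: nginx              # Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"          # target average CPU utilization (%)
    - type: memory
      metricType: Utilization
      metadata:
        value: "80"          # target average memory utilization (%)
```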
Karpenter (Node Autoscaling):
- Intelligent provisioning: Right-sized nodes
- Bin-packing: Efficient resource utilization
- Fast scaling: Nodes ready in ~2 minutes
- Cost optimization: Spot instances support
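A spot-capable `NodePool` (Karpenter v1 API) might be sketched as follows; the limits and requirements are illustrative, and the referenced `EC2NodeClass` is assumed to exist in `apps/karpenter/`:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # prefer spot, fall back to on-demand
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: "16"                             # cap total provisioned CPU
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # bin-packing
```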
- EKS Control Plane: ~$73/month
- EC2: 2 x t3.medium (~$60/month)
- NAT Gateway: ~$32/month
- EBS Volumes: ~$10/month
- Total: ~$175/month
- Use Karpenter with Spot instances: save up to 90% on compute
- Scale down when not in use
- Use smaller node sizes for dev/test
- Destroy infrastructure when not needed
```bash
# Destroy when not in use
gh workflow run terraform-destroy.yml -f confirm=destroy

# Redeploy when needed
git commit --allow-empty -m "Redeploy" && git push
```

- ✅ No credentials in code or version control
- ✅ Federated authentication (OIDC)
- ✅ Encrypted Terraform state
- ✅ IAM roles with least privilege
- ✅ IRSA for pod-level permissions
- ✅ Secrets stored in GitHub Secrets
- ✅ Workflow concurrency control
Security Enhancements:
- 🔲 External Secrets Operator - Sync secrets from AWS Secrets Manager
- 🔲 Private Cluster Endpoint - Restrict API server access
- 🔲 Network Policies - Control pod-to-pod traffic
- 🔲 Pod Security Standards - Enforce security policies
- 🔲 AWS Config - Compliance and governance
- 🔲 KMS Encryption - Encrypt Kubernetes secrets at rest
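As a starting point for the Network Policies item above, a default-deny ingress policy is a common first step; the namespace here is chosen purely for illustration:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: dev
spec:
  podSelector: {}        # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed = deny all inbound traffic
```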
Infrastructure Improvements:
- 🔲 Separate Node Groups - System vs user workloads
- 🔲 Production Instance Types - t3.large or larger
- 🔲 Resource Limits - CPU/memory limits on all pods
- 🔲 Velero Backups - Disaster recovery
- 🔲 Multi-region - High availability
Operational:
- 🔲 Cost Alerts - AWS Budgets and alerts
- 🔲 Terraform Workspaces - Dev/staging/prod environments
- 🔲 Runbooks - Incident response procedures
- 🔲 SLO/SLA Monitoring - Service level objectives
- ✅ S3 backend creation
- ✅ Backend configuration auto-update
- ✅ IAM role creation and configuration
- ✅ OIDC provider setup
- ✅ GitHub secrets (3 of 5 automated)
- ✅ EKS cluster deployment
- ✅ ArgoCD installation and configuration
- ✅ Application deployment via GitOps
- ✅ Karpenter configuration with cluster info
- ✅ Grafana CloudWatch integration
- ✅ KEDA autoscaling setup
- ✅ Monitoring stack deployment
- ❌ Add `GIT_USERNAME` secret (one-time)
- ❌ Add `ARGOCD_GITHUB_TOKEN` secret (one-time)
This lab includes HashiCorp Vault with CSI driver integration - the production-standard pattern for secrets management in Kubernetes.
Why Vault + CSI?
- ✅ Secrets never stored in Kubernetes (bypasses etcd completely)
- ✅ No sidecar containers (CSI driver is shared across all pods)
- ✅ Automatic secret rotation without pod restarts
- ✅ Full audit trail of secret access
- ✅ Works with any programming language (just read files)
Pod starts
↓
Kubernetes mounts CSI volume
↓
CSI Driver authenticates with Vault (using ServiceAccount token)
↓
Vault validates and returns secrets
↓
Secrets appear as files in /mnt/secrets/
↓
App reads secrets like normal files
1. Check Vault is running:
```bash
kubectl get pods -n vault
# vault-0                   1/1 Running
# vault-csi-provider-xxxxx  2/2 Running
```

2. See the demo app using Vault:
```bash
kubectl get pods -n demo
kubectl logs -n demo -l app=demo-app
```

3. Check the example production app:
```bash
kubectl get pods -n production
kubectl logs -n production -l app=myapp
```

Step 1: Create a secret in Vault
```bash
kubectl exec -n vault vault-0 -- vault kv put secret/myapp/prod \
  api_key=your-secret-key \
  db_password=your-db-password
```

Step 2: Create a policy
```bash
kubectl exec -n vault vault-0 -- sh -c 'vault policy write myapp-prod - <<EOF
path "secret/data/myapp/prod" {
  capabilities = ["read"]
}
EOF'
```

Step 3: Create a Kubernetes role
```bash
kubectl exec -n vault vault-0 -- vault write auth/kubernetes/role/myapp-prod \
  bound_service_account_names=myapp \
  bound_service_account_namespaces=production \
  policies=myapp-prod \
  ttl=24h
```

Step 4: Use in your app
```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: myapp-secrets
spec:
  provider: vault
  parameters:
    vaultAddress: "http://vault.vault:8200"
    roleName: "myapp-prod"
    objects: |
      - objectName: "api_key"
        secretPath: "secret/data/myapp/prod"
        secretKey: "api_key"
---
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      serviceAccountName: myapp
      containers:
        - name: app
          volumeMounts:
            - name: secrets
              mountPath: /mnt/secrets
              readOnly: true
          # Note: Kubernetes does NOT run command substitution in env values,
          # so `value: "$(cat /mnt/secrets/api_key)"` would not work. Read the
          # secret as a file from inside the container instead, e.g.:
          #   API_KEY=$(cat /mnt/secrets/api_key)
      volumes:
        - name: secrets
          csi:
            driver: secrets-store.csi.k8s.io
            volumeAttributes:
              secretProviderClass: "myapp-secrets"
```

See `apps/myapp/` for a complete working example with:
- Automated Vault configuration (Job)
- SecretProviderClass definition
- Deployment using CSI-mounted secrets
- ArgoCD integration with sync waves
To deploy your own app:
- Copy the `apps/myapp/` folder
- Update secret paths and values in `templates/vault-config.yaml`
- Update the container image in `templates/app.yaml`
- Create an ArgoCD app in `argocd-apps/`
- Push to Git - ArgoCD deploys automatically!
| Feature | Kubernetes Secrets | Vault + CSI |
|---|---|---|
| Storage | etcd (base64) | Vault (encrypted) |
| Access Control | RBAC only | Policy-based + RBAC |
| Audit Trail | None | Full audit log |
| Rotation | Manual pod restart | Automatic |
| Overhead | None | Shared DaemonSet |
| Multi-cloud | No | Yes |
Current Setup (Dev Mode):
- ⚠️ In-memory storage (data lost on restart)
- ⚠️ Single instance (no HA)
- ⚠️ Root token "root" (insecure)
- ⚠️ Auto-unsealed (convenient but insecure)
For Production:
- ✅ Persistent storage (EBS or S3)
- ✅ HA with 3+ replicas and Raft consensus
- ✅ Auto-unseal with AWS KMS
- ✅ Proper initialization with key sharding
- ✅ Audit logging to CloudWatch
- ✅ Backup and disaster recovery
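Those production settings map roughly onto the official Vault Helm chart values; the fragment below is a hedged sketch (not the dev-mode values used in this lab, and KMS auto-unseal additionally requires seal configuration and an IAM role):

```yaml
# Sketch: production-leaning values for the hashicorp/vault Helm chart
server:
  ha:
    enabled: true
    replicas: 3          # 3+ replicas with Raft consensus
    raft:
      enabled: true
  dataStorage:
    enabled: true        # persistent EBS-backed storage
    size: 10Gi
```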
- Amazon EKS Documentation
- ArgoCD Documentation
- Karpenter Documentation
- KEDA Documentation
- HashiCorp Vault Documentation
- Secrets Store CSI Driver
- Terraform AWS Provider
- GitOps Principles
MIT
This is a learning lab project. Feel free to fork and adapt for your needs!
- Purpose: Learning and portfolio demonstration
- Environment: Lab/Development
- Instance Type: t3.medium (cost-optimized)
- Security: Basic (OIDC, IRSA, encrypted state)
This setup provides a solid foundation but requires these enhancements:
Must Have:
- Private cluster endpoint
- Network policies
- Resource limits on all pods
- External Secrets Operator with AWS Secrets Manager
- Velero backups
- Production instance types (t3.large+)
- KMS encryption for Kubernetes secrets
Should Have:
- Separate node groups (system/user)
- Cost alerts and budgets
- Multi-environment setup (dev/staging/prod)
- Comprehensive monitoring and alerting
- Disaster recovery plan
Cost Considerations:
- Current setup: ~$175/month
- Production setup: ~$400-600/month (with redundancy)
- Remember to destroy resources when not in use