EKS GitOps Lab

Production-style Amazon EKS infrastructure with GitOps via ArgoCD, fully automated through GitHub Actions and Terraform.

🚀 From Scratch to Production

This project demonstrates a complete GitOps workflow from zero to a fully automated Kubernetes cluster:

  1. Bootstrap → Create S3 backend for Terraform state
  2. Setup → Configure IAM role with OIDC authentication
  3. Deploy → Push to GitHub, infrastructure deploys automatically
  4. GitOps → ArgoCD syncs applications from Git every 30 seconds
  5. Scale → Karpenter autoscales nodes, KEDA autoscales pods
  6. Monitor → Prometheus + Grafana for metrics, Loki for logs
  7. Cleanup → One command destroys everything

Total setup time: ~20 minutes (mostly waiting for EKS cluster)

Manual steps: Only 3 (bootstrap, OIDC, GitHub App)

Everything else: Fully automated via GitHub Actions and ArgoCD

🎯 What Gets Deployed

Infrastructure

  • EKS Cluster: Kubernetes 1.34 with managed node groups (2 t3.medium nodes)
  • Networking: VPC with public/private subnets across 2 AZs
  • Storage: EBS-backed persistent volumes
  • Autoscaling: Karpenter for intelligent node scaling

GitOps & Automation

  • ArgoCD: Automated application deployment with app-of-apps pattern
  • GitHub Actions: OIDC-based CI/CD pipeline
  • Terraform: Infrastructure as Code with S3 remote state

Applications & Services

  • nginx: Web server with KEDA autoscaling
  • KEDA: Event-driven pod autoscaling (CPU/Memory triggers)
  • Karpenter: Intelligent node autoscaling and bin-packing
  • Prometheus Stack: Metrics collection with persistent storage (15 days retention)
  • Grafana: Metrics visualization with CloudWatch integration and persistent dashboards
  • Loki: Log aggregation backend
  • Promtail: Log collection from all pods
  • Event Exporter: Kubernetes events to Loki for Grafana visualization

Secrets Management

  • HashiCorp Vault: Centralized secrets management with audit logging
  • Secrets Store CSI Driver: Kubernetes-native secret injection (no sidecars!)
  • Vault CSI Provider: Direct integration between Vault and Kubernetes pods
  • Demo Apps: Working examples showing Vault integration patterns

AWS Controllers for Kubernetes (ACK)

  • ACK EKS Controller: Manages EKS resources via Kubernetes CRDs
  • Access Entries: Automatically created from SSO roles
  • GitOps-native: Self-healing access management
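
As a sketch of what the access-entries app manages, an ACK AccessEntry resource looks roughly like this (the API version, ARN, and group name are illustrative; check the CRDs installed in your cluster):

apiVersion: eks.services.k8s.aws/v1alpha1
kind: AccessEntry
metadata:
  name: eks-developer
  namespace: ack-system
spec:
  clusterName: eks-gitops-lab
  principalARN: arn:aws:iam::ACCOUNT_ID:role/AWSReservedSSO_EKSDeveloper_example   # SSO role (placeholder)
  kubernetesGroups:
  - eks-developers          # bound to namespaced RBAC by rbac-setup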

🔐 Security Features

Authentication & Authorization

  • AWS OIDC: No stored credentials
  • Federated Authentication: GitHub Actions authenticates via OIDC
  • IAM Identity Center: SSO with multiple users and permission sets
  • ACK EKS Controller: Automatic AccessEntry creation from SSO roles
  • RBAC: Role-based access control with namespace isolation
  • IAM Roles: Least privilege access for all services
  • IRSA: IAM Roles for Service Accounts (Karpenter, Grafana)
  • Encrypted State: S3 backend with encryption at rest
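
For IRSA, a pod gets AWS permissions by annotating its ServiceAccount with a role ARN; a minimal sketch (the account ID and role name are placeholders, and the ServiceAccount name assumes the chart default):

# Grant Grafana CloudWatch access via IRSA
kubectl annotate serviceaccount kube-prometheus-stack-grafana -n monitoring \
  eks.amazonaws.com/role-arn=arn:aws:iam::ACCOUNT_ID:role/grafana-cloudwatch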

Data Protection

  • No Secrets in Code: All sensitive data in GitHub Secrets
  • Branch Protection: changes land via PRs; workflow concurrency prevents overlapping runs
  • State Locking: Native S3 locking prevents concurrent modifications

Security Scanning

  • Checkov: IaC security scanning in CI/CD pipeline
  • Terraform Validation: Format and validation checks
  • ℹ️ Note: Checkov chosen for deep Terraform analysis

📋 Prerequisites

  • AWS CLI configured (aws configure)
  • GitHub CLI (gh auth login)
  • Terraform (v1.13.5+)
  • kubectl
  • Git

🚀 Quick Start (3 Setup Steps + Deploy)

1. Bootstrap Backend

./scripts/bootstrap-backend.sh

What it does:

  • Creates S3 bucket for Terraform state (with versioning & encryption)
  • Uses native S3 locking (no DynamoDB needed)
  • Automatically updates terraform/backend.tf with bucket name

Output:

✅ Backend created successfully!
✅ Updated terraform/backend.tf automatically!
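
For reference, the generated terraform/backend.tf looks roughly like this (the bucket name is a placeholder the script fills in; the key and region shown here are illustrative):

terraform {
  backend "s3" {
    bucket       = "YOUR_STATE_BUCKET"     # set by bootstrap-backend.sh
    key          = "eks-gitops-lab/terraform.tfstate"
    region       = "eu-central-1"
    encrypt      = true
    use_lockfile = true                    # native S3 locking, no DynamoDB
  }
}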

2. Setup OIDC Access

./scripts/setup-oidc-access.sh

What it does:

  • Creates GitHub OIDC provider in AWS (if not exists)
  • Creates IAM role for GitHub Actions
  • Configures federated credentials
  • Automatically adds 3 GitHub secrets

Output:

✅ OIDC setup complete!
✅ GitHub secrets added!
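
Under the hood this boils down to an IAM role whose trust policy is scoped to this repository via OIDC; a sketch of the equivalent CLI call (account ID and username are placeholders):

aws iam create-role --role-name GitHubActionsEKSRole \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"},
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {"token.actions.githubusercontent.com:aud": "sts.amazonaws.com"},
        "StringLike": {"token.actions.githubusercontent.com:sub": "repo:YOUR_USERNAME/eks-gitops-lab:*"}
      }
    }]
  }'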

3. Create GitHub App (One-time Setup)

If you don't have a GitHub App yet, create one:

Go to: https://github.com/settings/apps/new

Required Settings:

  • Name: ArgoCD-EKS-GitOps (or any name)
  • Homepage: https://github.com/YOUR_USERNAME/eks-gitops-lab
  • Webhook: Uncheck "Active" (we don't need webhooks)
  • Repository permissions:
    • Contents: Read-only (ArgoCD needs to read your repo)
    • Metadata: Read-only (automatically required)
  • Where can this app be installed: Only on this account

After creation:

  1. Generate private key → Downloads .pem file
  2. Note App ID → Shown on the app page
  3. Install app → Click "Install App" → Select eks-gitops-lab repository
  4. Note Installation ID → From URL: github.com/settings/installations/XXXXXXXX

Store GitHub App secrets:

# Run from your repo clone so gh can infer the repository
gh secret set ARGOCD_APP_PRIVATE_KEY < ~/Downloads/argocd-eks-gitops.*.private-key.pem
gh secret set ARGOCD_APP_ID -b "YOUR_APP_ID"
gh secret set ARGOCD_APP_INSTALLATION_ID -b "YOUR_INSTALLATION_ID"

✅ GitHub App configured! This is reusable for future deployments.

4. Deploy

git add .
git commit -m "Initial deployment"
git push origin main

That's it! GitHub Actions will:

  1. Run terraform plan and the Checkov security scan
  2. Deploy EKS cluster (~15 minutes)
  3. Install ArgoCD
  4. Update app configs with cluster info
  5. Deploy all applications automatically
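
The pipeline authenticates to AWS without stored credentials; the relevant workflow snippet looks roughly like this (the role ARN is a placeholder, and the step layout is a sketch of .github/workflows/terraform.yml, not the exact file):

permissions:
  id-token: write    # required to request the OIDC token
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::ACCOUNT_ID:role/GitHubActionsEKSRole
      aws-region: eu-central-1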

🏗️ Architecture

Infrastructure Flow

┌─────────────────────────────────────────────────────────────┐
│                         AWS Cloud                           │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐ │
│  │                    VPC (10.0.0.0/16)                  │ │
│  │                                                       │ │
│  │  ┌──────────────────┐      ┌──────────────────┐     │ │
│  │  │  Public Subnet   │      │  Public Subnet   │     │ │
│  │  │  10.0.1.0/24     │      │  10.0.2.0/24     │     │ │
│  │  │  (AZ-1)          │      │  (AZ-2)          │     │ │
│  │  │  - NAT Gateway   │      │                  │     │ │
│  │  └──────────────────┘      └──────────────────┘     │ │
│  │           │                         │                │ │
│  │  ┌──────────────────┐      ┌──────────────────┐     │ │
│  │  │ Private Subnet   │      │ Private Subnet   │     │ │
│  │  │ 10.0.37.0/24     │      │ 10.0.60.0/24     │     │ │
│  │  │ (AZ-1)           │      │ (AZ-2)           │     │ │
│  │  │ ┌──────────────┐ │      │ ┌──────────────┐ │     │ │
│  │  │ │ EKS Nodes    │ │      │ │ EKS Nodes    │ │     │ │
│  │  │ │ t3.medium    │ │      │ │ t3.medium    │ │     │ │
│  │  │ └──────────────┘ │      │ └──────────────┘ │     │ │
│  │  └──────────────────┘      └──────────────────┘     │ │
│  └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

GitOps Flow

Developer → PR → Plan → Review → Merge → Apply → Update Configs → ArgoCD Syncs

Application Deployment

┌──────────────────────────────────────────────────────────────┐
│                        ArgoCD                                │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  core-apps (App of Apps)                                    │
│  ├─ Monitors: argocd-apps/ directory                       │
│  ├─ Auto-sync: Every 30 seconds                            │
│  └─ Auto-prune: Removes deleted apps                       │
│                                                              │
│  Applications                                                │
│  ├─ nginx (with KEDA autoscaling)                          │
│  ├─ keda (pod autoscaling controller)                      │
│  ├─ karpenter (node autoscaling)                           │
│  ├─ kube-prometheus-stack (monitoring)                     │
│  ├─ loki (log aggregation)                                 │
│  └─ promtail (log collection)                              │
│                                                              │
└──────────────────────────────────────────────────────────────┘
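
The root Application that drives this pattern looks roughly like this (the repo URL is a placeholder; the manifest is a sketch, not the exact file in this repo):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: core-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/YOUR_USERNAME/eks-gitops-lab
    targetRevision: main
    path: argocd-apps            # each manifest here becomes a child app
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true                # auto-prune: remove apps deleted from Git
      selfHeal: true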

📁 Project Structure

.
├── .github/workflows/
│   ├── terraform.yml           # Main CI/CD pipeline
│   ├── terraform-destroy.yml   # Infrastructure cleanup
│   └── update-app-values.yml   # Update configs from Terraform
├── apps/                       # Helm charts for applications
│   ├── nginx/
│   ├── keda/
│   ├── karpenter/
│   ├── kube-prometheus-stack/
│   ├── loki/
│   ├── promtail/
│   ├── event-exporter/        # Kubernetes events to Loki
│   ├── secrets-store-csi/     # CSI driver for secrets
│   ├── vault/                 # HashiCorp Vault
│   ├── vault-demo/            # Vault integration demo
│   ├── myapp/                 # Example app with Vault
│   ├── ack-eks-controller/    # ACK EKS controller
│   ├── access-entries/        # EKS access entries via ACK
│   └── rbac-setup/            # RBAC roles and bindings
├── argocd-apps/               # ArgoCD application definitions
│   ├── nginx.yaml
│   ├── keda.yaml
│   ├── karpenter.yaml
│   ├── kube-prometheus-stack.yaml
│   ├── loki.yaml
│   ├── promtail.yaml
│   ├── event-exporter.yaml
│   ├── ack-eks-controller.yaml
│   ├── access-entries.yaml
│   └── rbac-setup.yaml
├── terraform/                 # Terraform infrastructure
│   ├── modules/
│   │   ├── aks/              # EKS cluster configuration
│   │   ├── argocd/           # ArgoCD Helm deployment
│   │   └── vpc/              # Virtual network
│   ├── backend.tf            # Terraform backend configuration
│   ├── main.tf               # Main Terraform configuration
│   ├── variables.tf          # Variable definitions
│   ├── outputs.tf            # Output definitions
│   └── provider.tf           # Provider configuration
├── scripts/                   # Automation scripts
│   ├── bootstrap-backend.sh
│   ├── setup-oidc-access.sh
│   └── cleanup-all.sh
└── README.md

🎮 Accessing Services

EKS Cluster

# Get credentials
aws eks update-kubeconfig --name eks-gitops-lab --region eu-central-1

# Check cluster
kubectl get nodes
kubectl get pods --all-namespaces

ArgoCD UI

# Port forward
kubectl port-forward svc/argocd-server -n argocd 8080:443

# Get password
kubectl get secret argocd-initial-admin-secret -n argocd -o jsonpath="{.data.password}" | base64 -d

# Open browser
open https://localhost:8080
# Username: admin
# Password: (from above command)

Grafana

# Port forward
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80

# Get password
kubectl get secret kube-prometheus-stack-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d

# Open browser
open http://localhost:3000
# Username: admin
# Password: (from above command)

Prometheus

kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090
open http://localhost:9090

AWS SSO Access

# Configure SSO profile
aws configure sso
# SSO start URL: https://d-99675f4fc7.awsapps.com/start
# SSO Region: eu-central-1
# Account: 432801802107
# Role: EKSDeveloper / EKSDevOps / EKSReadOnly

# Login
aws sso login --profile <profile-name>

# Access EKS
aws eks update-kubeconfig --name eks-gitops-lab --region eu-central-1 --profile <profile-name>
kubectl get pods -n dev  # Developer access
kubectl get nodes        # DevOps access

User Roles:

  • EKSDeveloper: Full access to dev namespace only
  • EKSDevOps: Full cluster access (all namespaces, nodes)
  • EKSReadOnly: Read-only access to all namespaces
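
EKSDeveloper's namespace isolation comes down to an RBAC binding along these lines (a sketch; the actual group name depends on what the access entry assigns to the SSO role):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: eks-developer-edit
  namespace: dev
subjects:
- kind: Group
  name: eks-developers               # group from the access entry (assumption)
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                         # built-in ClusterRole, scoped to dev here
  apiGroup: rbac.authorization.k8s.io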

🧹 Cleanup

Complete Cleanup

./scripts/cleanup-all.sh

This removes:

  • ✅ IAM role
  • ✅ S3 bucket and all objects
  • ✅ GitHub secrets
  • ✅ Local Terraform state files

Partial Cleanup (Keep Backend)

# Destroy infrastructure only (manual trigger required)
gh workflow run terraform-destroy.yml -f confirm=destroy

🐛 Troubleshooting

Issue: Workflow fails with permission error

Solution: The IAM role needs proper permissions. Check:

aws iam get-role --role-name GitHubActionsEKSRole

Issue: ArgoCD not syncing apps

Possible causes:

  1. GitHub token expired
  2. Repository URL incorrect
  3. Branch name mismatch

Solution:

# Check ArgoCD repo secret
kubectl get secret argocd-repo -n argocd -o yaml

# Update if needed
kubectl delete secret argocd-repo -n argocd
# Re-run update-app-values workflow
gh workflow run update-app-values.yml

Issue: Karpenter not scaling nodes

Solution: Check if Karpenter has correct cluster info:

# Manually trigger update workflow
gh workflow run update-app-values.yml

# Verify Karpenter config
kubectl get ec2nodeclass -o yaml

Issue: Pods pending due to insufficient resources

Solution: Karpenter will automatically provision nodes. Check:

# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter

# Check pending pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

📊 Monitoring & Observability

Metrics (Prometheus + Grafana)

  • Node metrics: CPU, memory, disk, network
  • Pod metrics: Resource usage per pod
  • Cluster metrics: Overall cluster health
  • CloudWatch integration: Grafana can query CloudWatch

Logs (Loki + Promtail)

  • Centralized logging: All pod logs in one place
  • Query language: LogQL for powerful log queries
  • Retention: Configurable log retention policies
  • Integration: Grafana dashboards for log visualization

Kubernetes Events (Event Exporter)

  • Event collection: All K8s events sent to Loki
  • Grafana visualization: View events in Grafana Explore
  • Query: {app="event-exporter"} or {type="Warning"}
  • Filtering: By namespace, reason, type, kind, name

Autoscaling

KEDA (Pod Autoscaling):

  • CPU-based: Scale on CPU utilization
  • Memory-based: Scale on memory usage
  • Custom metrics: Scale on any Prometheus metric
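
A minimal ScaledObject along the lines of what the nginx app uses (names and thresholds are illustrative):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-cpu
  namespace: default
spec:
  scaleTargetRef:
    name: nginx              # Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: cpu
    metricType: Utilization
    metadata:
      value: "70"            # target 70% average CPU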

Karpenter (Node Autoscaling):

  • Intelligent provisioning: Right-sized nodes
  • Bin-packing: Efficient resource utilization
  • Fast scaling: Nodes ready in ~2 minutes
  • Cost optimization: Spot instances support
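
Spot support is expressed through NodePool requirements; a sketch assuming the Karpenter v1 API (pool and node class names are illustrative):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]     # add "on-demand" for fallback capacity
  limits:
    cpu: "16"                # cap total provisioned CPU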

💰 Cost Optimization

Current Setup (2 nodes)

  • EKS Control Plane: ~$73/month
  • EC2: 2 x t3.medium (~$60/month)
  • NAT Gateway: ~$32/month
  • EBS Volumes: ~$10/month
  • Total: ~$175/month

Cost Saving Tips

  1. Use Karpenter with Spot - Save up to 90% on compute
  2. Scale down when not in use
  3. Use smaller node sizes for dev/test
  4. Destroy infrastructure when not needed

# Destroy when not in use
gh workflow run terraform-destroy.yml -f confirm=destroy

# Redeploy when needed
git commit --allow-empty -m "Redeploy" && git push

🔒 Security Best Practices

Implemented

  • ✅ No credentials in code or version control
  • ✅ Federated authentication (OIDC)
  • ✅ Encrypted Terraform state
  • ✅ IAM roles with least privilege
  • ✅ IRSA for pod-level permissions
  • ✅ Secrets stored in GitHub Secrets
  • ✅ Workflow concurrency control

Recommended for Production

Security Enhancements:

  • 🔲 External Secrets Operator - Sync secrets from AWS Secrets Manager
  • 🔲 Private Cluster Endpoint - Restrict API server access
  • 🔲 Network Policies - Control pod-to-pod traffic
  • 🔲 Pod Security Standards - Enforce security policies
  • 🔲 AWS Config - Compliance and governance
  • 🔲 KMS Encryption - Encrypt Kubernetes secrets at rest

Infrastructure Improvements:

  • 🔲 Separate Node Groups - System vs user workloads
  • 🔲 Production Instance Types - t3.large or larger
  • 🔲 Resource Limits - CPU/memory limits on all pods
  • 🔲 Velero Backups - Disaster recovery
  • 🔲 Multi-region - High availability

Operational:

  • 🔲 Cost Alerts - AWS Budgets and alerts
  • 🔲 Terraform Workspaces - Dev/staging/prod environments
  • 🔲 Runbooks - Incident response procedures
  • 🔲 SLO/SLA Monitoring - Service level objectives

📚 What's Automated

  • ✅ S3 backend creation
  • ✅ Backend configuration auto-update
  • ✅ IAM role creation and configuration
  • ✅ OIDC provider setup
  • ✅ GitHub secrets for OIDC (3 added automatically)
  • ✅ EKS cluster deployment
  • ✅ ArgoCD installation and configuration
  • ✅ Application deployment via GitOps
  • ✅ Karpenter configuration with cluster info
  • ✅ Grafana CloudWatch integration
  • ✅ KEDA autoscaling setup
  • ✅ Monitoring stack deployment

✋ What's Manual

  • ❌ Run the bootstrap and OIDC setup scripts (one-time)
  • ❌ Create the GitHub App and add its 3 secrets: ARGOCD_APP_ID, ARGOCD_APP_INSTALLATION_ID, ARGOCD_APP_PRIVATE_KEY (one-time)

🔐 Using Vault for Secrets Management

Overview

This lab includes HashiCorp Vault with CSI driver integration - the production-standard pattern for secrets management in Kubernetes.

Why Vault + CSI?

  • ✅ Secrets never stored in Kubernetes (bypasses etcd completely)
  • ✅ No sidecar containers (CSI driver is shared across all pods)
  • ✅ Automatic secret rotation without pod restarts
  • ✅ Full audit trail of secret access
  • ✅ Works with any programming language (just read files)

Architecture

Pod starts
    ↓
Kubernetes mounts CSI volume
    ↓
CSI Driver authenticates with Vault (using ServiceAccount token)
    ↓
Vault validates and returns secrets
    ↓
Secrets appear as files in /mnt/secrets/
    ↓
App reads secrets like normal files

Quick Start

1. Check Vault is running:

kubectl get pods -n vault
# vault-0                                 1/1     Running
# vault-csi-provider-xxxxx                2/2     Running

2. See demo app using Vault:

kubectl get pods -n demo
kubectl logs -n demo -l app=demo-app

3. Check example production app:

kubectl get pods -n production
kubectl logs -n production -l app=myapp

Adding Secrets to Your App

Step 1: Create secret in Vault

kubectl exec -n vault vault-0 -- vault kv put secret/myapp/prod \
  api_key=your-secret-key \
  db_password=your-db-password

Step 2: Create policy

kubectl exec -n vault vault-0 -- sh -c 'vault policy write myapp-prod - <<EOF
path "secret/data/myapp/prod" {
  capabilities = ["read"]
}
EOF'

Step 3: Create Kubernetes role

kubectl exec -n vault vault-0 -- vault write auth/kubernetes/role/myapp-prod \
  bound_service_account_names=myapp \
  bound_service_account_namespaces=production \
  policies=myapp-prod \
  ttl=24h

Step 4: Use in your app

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: myapp-secrets
  namespace: production    # must match the pod's namespace
spec:
  provider: vault
  parameters:
    vaultAddress: "http://vault.vault:8200"
    roleName: "myapp-prod"
    objects: |
      - objectName: "api_key"
        secretPath: "secret/data/myapp/prod"
        secretKey: "api_key"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      serviceAccountName: myapp
      containers:
      - name: app
        image: myapp:latest            # replace with your image
        # Env values don't run shell substitution, so read the mounted file
        # (/mnt/secrets/api_key) in your app, or export it in a shell wrapper
        # around your entrypoint (./run-app is a placeholder):
        command: ["sh", "-c", "export API_KEY=$(cat /mnt/secrets/api_key) && exec ./run-app"]
        volumeMounts:
        - name: secrets
          mountPath: /mnt/secrets
          readOnly: true
      volumes:
      - name: secrets
        csi:
          driver: secrets-store.csi.k8s.io
          volumeAttributes:
            secretProviderClass: "myapp-secrets"

Complete Example

See apps/myapp/ for a complete working example with:

  • Automated Vault configuration (Job)
  • SecretProviderClass definition
  • Deployment using CSI-mounted secrets
  • ArgoCD integration with sync waves

To deploy your own app:

  1. Copy apps/myapp/ folder
  2. Update secret paths and values in templates/vault-config.yaml
  3. Update container image in templates/app.yaml
  4. Create ArgoCD app in argocd-apps/
  5. Push to Git - ArgoCD deploys automatically!

Key Benefits

Feature        | Kubernetes Secrets | Vault + CSI
---------------|--------------------|--------------------
Storage        | etcd (base64)      | Vault (encrypted)
Access Control | RBAC only          | Policy-based + RBAC
Audit Trail    | None               | Full audit log
Rotation       | Manual pod restart | Automatic
Overhead       | None               | Shared DaemonSet
Multi-cloud    | No                 | Yes

Production Considerations

Current Setup (Dev Mode):

  • ⚠️ In-memory storage (data lost on restart)
  • ⚠️ Single instance (no HA)
  • ⚠️ Root token "root" (insecure)
  • ⚠️ Auto-unsealed (convenient but insecure)

For Production:

  • ✅ Persistent storage (EBS or S3)
  • ✅ HA with 3+ replicas and Raft consensus
  • ✅ Auto-unseal with AWS KMS
  • ✅ Proper initialization with key sharding
  • ✅ Audit logging to CloudWatch
  • ✅ Backup and disaster recovery
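
A sketch of Helm values for that hardening, assuming the official hashicorp/vault chart (the KMS key ID is a placeholder; review the chart docs before relying on this):

helm upgrade vault hashicorp/vault -n vault -f - <<'EOF'
server:
  dev:
    enabled: false           # leave dev mode
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        storage "raft" {
          path = "/vault/data"
        }
        seal "awskms" {
          region     = "eu-central-1"
          kms_key_id = "YOUR_KMS_KEY_ID"
        }
        listener "tcp" {
          address     = "0.0.0.0:8200"
          tls_disable = 1    # enable TLS for real production use
        }
EOF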

🎓 Learning Resources

📝 License

MIT

🤝 Contributing

This is a learning lab project. Feel free to fork and adapt for your needs!

⚠️ Important Notes

Current Setup

  • Purpose: Learning and portfolio demonstration
  • Environment: Lab/Development
  • Instance Type: t3.medium (cost-optimized)
  • Security: Basic (OIDC, IRSA, encrypted state)

For Production Use

This setup provides a solid foundation but requires these enhancements:

Must Have:

  • Private cluster endpoint
  • Network policies
  • Resource limits on all pods
  • External Secrets Operator with AWS Secrets Manager
  • Velero backups
  • Production instance types (t3.large+)
  • KMS encryption for Kubernetes secrets

Should Have:

  • Separate node groups (system/user)
  • Cost alerts and budgets
  • Multi-environment setup (dev/staging/prod)
  • Comprehensive monitoring and alerting
  • Disaster recovery plan

Cost Considerations:

  • Current setup: ~$175/month
  • Production setup: ~$400-600/month (with redundancy)
  • Remember to destroy resources when not in use
