Kubernetes (K8s) orchestrates containers at scale. In this pipeline, it runs the training pods, the MLflow server, the Kubeflow Pipelines control plane, and the NVIDIA device plugin. Understanding Kubernetes is essential for debugging failed training runs, scaling workloads, and managing GPU resources.
This module covers the Kubernetes you need to work with this pipeline in an ML context. It does not cover everything Kubernetes can do -- that would take a book.
| File | What You Will Learn | Time |
|---|---|---|
| what-is-kubernetes.md | Core concepts, architecture, kubectl basics | 1 hour |
| k3s-quickstart.md | Lightweight K8s for development, installation | 45 min |
| yaml-manifests-explained.md | Reading and writing K8s YAML, repo walkthrough | 1 hour |
| gpu-scheduling.md | NVIDIA device plugin, GPU resource management | 45 min |
| exercises.md | Hands-on practice with kubectl and manifests | 1.5 hours |
- Completed Module 1 (Docker) or equivalent Docker knowledge
- A Linux machine or VM for K3s installation (or use the EC2 instance from this pipeline)
- Familiarity with YAML syntax
This repo's EC2 instance runs K3s (lightweight Kubernetes). On top of K3s, it runs:
- NVIDIA device plugin (k8s/nvidia-device-plugin.yaml) -- makes GPUs visible to Kubernetes
- MLflow server (mlflow.yaml) -- experiment tracking with persistent storage
- Kubeflow Pipelines -- orchestrates training workflows (Module 6)
- Training pods -- run the YOLOv5 Docker image with GPU access
The remote_setup.sh script installs K3s, deploys these components, and verifies that everything works. Understanding this module means understanding what that script sets up and why.
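To make "training pods with GPU access" concrete, here is a minimal sketch that builds a Pod manifest requesting a GPU through the `nvidia.com/gpu` extended resource, which is the name the NVIDIA device plugin advertises. It is not taken from this repo. The pod name, label, and image reference are hypothetical placeholders. The script prints JSON, which `kubectl apply -f` accepts just like YAML.

```python
import json


def gpu_training_pod(name, image, gpus=1):
    """Build a minimal Pod manifest (as a dict) that requests NVIDIA GPUs.

    `name` and `image` are caller-supplied; nothing here is read from the repo.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # Hypothetical label, useful for `kubectl get pods -l app=...`.
        "metadata": {"name": name, "labels": {"app": "yolov5-train"}},
        "spec": {
            # A training run should finish, not be restarted forever.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    "resources": {
                        # Extended resources such as GPUs are declared in limits.
                        # The scheduler only places the pod on a node whose device
                        # plugin reports enough free nvidia.com/gpu units.
                        "limits": {"nvidia.com/gpu": gpus},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference; substitute the YOLOv5 image you actually built.
    manifest = gpu_training_pod("yolov5-train", "example.registry/yolov5:latest")
    print(json.dumps(manifest, indent=2))
```

Under these assumptions, `python make_pod.py > pod.json && kubectl apply -f pod.json` would submit the pod. If the device plugin is not running, the pod stays `Pending` because no node advertises `nvidia.com/gpu`.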
- You can explain what a Pod, Deployment, Service, and DaemonSet are
- You can use kubectl to get, describe, and view logs for pods
- You can read a Kubernetes YAML manifest and understand its structure
- You understand how the NVIDIA device plugin makes GPUs available
- You can apply a YAML manifest with kubectl
- You understand the difference between K3s and full Kubernetes
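As a small exercise in reading kubectl output, the sketch below parses the JSON printed by `kubectl get pods -o json` and lists pods whose `status.phase` is `Failed`. These are the pods you would then inspect with `kubectl describe pod <name>` and `kubectl logs <name>`. The helper names and the sample data are illustrative assumptions, not part of this repo.

```python
import json


def summarize_pods(kubectl_json):
    """Map pod name -> phase from `kubectl get pods -o json` output."""
    data = json.loads(kubectl_json)
    summary = {}
    # kubectl returns a List object whose `items` are full Pod objects.
    for pod in data.get("items", []):
        name = pod["metadata"]["name"]
        # Phase is one of Pending, Running, Succeeded, Failed, Unknown.
        summary[name] = pod.get("status", {}).get("phase", "Unknown")
    return summary


def failed_pods(summary):
    """Return the names of pods whose phase is Failed, sorted."""
    return sorted(name for name, phase in summary.items() if phase == "Failed")


if __name__ == "__main__":
    # Hypothetical sample shaped like real kubectl output (trimmed to the fields used).
    sample = json.dumps({
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            {"metadata": {"name": "mlflow-server"}, "status": {"phase": "Running"}},
            {"metadata": {"name": "yolov5-train"}, "status": {"phase": "Failed"}},
        ],
    })
    pods = summarize_pods(sample)
    print(pods)
    print("Failed:", failed_pods(pods))
```

In practice you would feed it real output, for example with `kubectl get pods -o json | python check_pods.py` after switching the sample to `sys.stdin.read()`.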