
Module 2: Kubernetes Essentials

Kubernetes (K8s) orchestrates containers at scale. In this pipeline, it runs the training pods, the MLflow server, the Kubeflow Pipelines control plane, and the NVIDIA device plugin. Understanding Kubernetes is essential for debugging failed training runs, scaling workloads, and managing GPU resources.

This module covers what you need to know to work with Kubernetes in an ML context. It does not cover everything Kubernetes can do -- that would take a book. It covers what you need for this pipeline.

Topics

File                           What You Will Learn                              Time
what-is-kubernetes.md          Core concepts, architecture, kubectl basics      1 hour
k3s-quickstart.md              Lightweight K8s for development, installation    45 min
yaml-manifests-explained.md    Reading and writing K8s YAML, repo walkthrough   1 hour
gpu-scheduling.md              NVIDIA device plugin, GPU resource management    45 min
exercises.md                   Hands-on practice with kubectl and manifests     1.5 hours

Prerequisites

  • Completed Module 1 (Docker) or equivalent Docker knowledge
  • A Linux machine or VM for K3s installation (or use the EC2 instance from this pipeline)
  • Familiarity with YAML syntax

How This Connects to the Pipeline

This repo's EC2 instance runs K3s (lightweight Kubernetes). On top of K3s:

  1. NVIDIA device plugin (k8s/nvidia-device-plugin.yaml) -- makes GPUs visible to Kubernetes
  2. MLflow server (mlflow.yaml) -- experiment tracking with persistent storage
  3. Kubeflow Pipelines -- orchestrates training workflows (Module 6)
  4. Training pods -- run the YOLOv5 Docker image with GPU access
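Training pods get GPU access by requesting the extended resource that the NVIDIA device plugin advertises, nvidia.com/gpu. As a minimal sketch (not this repo's actual manifest), the Python below builds such a Pod as a dict and prints it as JSON, which kubectl apply -f accepts just like YAML. The pod name, label, image, and train.py command are hypothetical placeholders. Depending on how the K3s container runtime is configured, a real manifest may need additional fields.

```python
import json


def gpu_training_pod(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests NVIDIA GPUs.

    "nvidia.com/gpu" is the resource name the NVIDIA device plugin advertises.
    The label, image, and command are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "yolov5-train"}},
        "spec": {
            # Training runs to completion, so do not restart the container on exit.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "train",
                    "image": image,
                    # Hypothetical entrypoint; replace with the real training command.
                    "command": ["python", "train.py"],
                    "resources": {
                        # GPUs are requested via limits; the scheduler only places
                        # the pod on a node with enough unallocated GPUs.
                        "limits": {"nvidia.com/gpu": gpus},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_training_pod("yolov5-train", "example.com/yolov5-train:latest")
    # Save this output as pod.json and run: kubectl apply -f pod.json
    print(json.dumps(manifest, indent=2))
```

Setting only limits is enough here: for extended resources such as GPUs, Kubernetes uses the limit as the request when no request is given, and the pod stays Pending until a node has a free GPU.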

The remote_setup.sh script installs K3s, deploys these components, and verifies everything works. Working through this module will show you what that script sets up and why.
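One check a setup script can run after deploying the device plugin is whether the node now advertises GPUs among its allocatable resources. The sketch below illustrates that kind of check; it is not a copy of what remote_setup.sh actually does, and fetch_nodes assumes kubectl is on PATH and pointed at the K3s cluster.

```python
import json
import subprocess

GPU_RESOURCE = "nvidia.com/gpu"  # resource name advertised by the NVIDIA device plugin


def gpu_capacity(nodes_json: dict) -> dict:
    """Map each node name to its number of allocatable NVIDIA GPUs.

    nodes_json is the parsed output of `kubectl get nodes -o json`,
    a List object whose "items" are Node objects.
    """
    capacity = {}
    for node in nodes_json.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # GPU quantities are whole-number strings such as "1"; a missing key
        # means no GPUs have been registered on this node.
        capacity[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return capacity


def fetch_nodes() -> dict:
    """Run kubectl (must be on PATH and configured for the cluster) and parse its JSON."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def assert_gpus_visible(capacity: dict) -> None:
    """Fail loudly if no node advertises a GPU (e.g. the device plugin is not running)."""
    if not any(capacity.values()):
        raise RuntimeError("no node advertises nvidia.com/gpu; check the device plugin pods")
```

On the instance, assert_gpus_visible(gpu_capacity(fetch_nodes())) should pass once the device plugin has registered the GPU. Keeping the parsing separate from the kubectl call makes the check easy to test against saved JSON.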

Checklist Before Moving to Module 3

  • You can explain what a Pod, Deployment, Service, and DaemonSet are
  • You can use kubectl to get, describe, and view logs for pods
  • You can read a Kubernetes YAML manifest and understand its structure
  • You understand how the NVIDIA device plugin makes GPUs available
  • You can apply a YAML manifest with kubectl
  • You understand the difference between K3s and full Kubernetes
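The kubectl skills in this checklist combine when a training run fails: list the pods, then describe and read logs for the ones in trouble. As an illustrative sketch (the helper names are hypothetical, and fetch_pods assumes a configured kubectl), this Python reads the JSON from kubectl get pods -o json and builds the follow-up commands for each pod that looks unhealthy.

```python
import json
import subprocess

# Pod phases that need no follow-up on their own.
HEALTHY_PHASES = {"Running", "Succeeded"}


def needs_attention(pod: dict) -> bool:
    status = pod.get("status", {})
    if status.get("phase") not in HEALTHY_PHASES:
        return True
    # A crash-looping container can leave the pod in phase "Running" while the
    # container itself sits in a "waiting" state (reason CrashLoopBackOff).
    return any(
        "waiting" in cs.get("state", {})
        for cs in status.get("containerStatuses", [])
    )


def triage_commands(pods_json: dict, namespace: str) -> list:
    """Return kubectl commands (as argv lists) for inspecting unhealthy pods."""
    commands = []
    for pod in pods_json.get("items", []):
        if not needs_attention(pod):
            continue
        name = pod["metadata"]["name"]
        # describe shows events, e.g. a pod stuck Pending on insufficient nvidia.com/gpu.
        commands.append(["kubectl", "describe", "pod", name, "-n", namespace])
        # Container logs exist only once a container has started, so skip Pending pods.
        if pod.get("status", {}).get("phase") != "Pending":
            commands.append(["kubectl", "logs", name, "-n", namespace])
    return commands


def fetch_pods(namespace: str) -> dict:
    """Run kubectl get pods for one namespace and parse the JSON output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

To use it, loop over triage_commands(fetch_pods(ns), ns) and print " ".join(cmd) for each entry, with ns set to the namespace your training pods run in. Checking container states as well as the pod phase matters because a crash-looping container can leave the pod's phase at Running.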
