Module 5: Kubeflow Pipelines

Overview

This module covers Kubeflow Pipelines (KFP), the workflow orchestration platform that automates the execution of ML training jobs in this project. You will learn what KFP is, how to author pipelines in Python, and how pipelines are compiled, submitted, and executed on a Kubernetes cluster.

Prerequisites

  • Completion of Modules 1-4 (Docker, Kubernetes, container builds, MLflow)
  • Familiarity with Python decorators and basic YAML
  • Understanding of Kubernetes pods and services (from Module 3)

Learning Objectives

After completing this module you will be able to:

  1. Explain what pipeline orchestration is and why it matters for ML workflows.
  2. Describe the architecture of a Kubeflow Pipelines standalone deployment.
  3. Author a pipeline using the KFP Python SDK with container components.
  4. Compile a pipeline from Python to YAML.
  5. Submit and monitor pipeline runs using both the UI and the Python client.
  6. Debug failed pipeline runs using pod logs and events.

Module Contents

| File | Topic |
|------|-------|
| what-is-kubeflow.md | Pipeline orchestration and KFP architecture |
| pipeline-authoring.md | Writing pipelines with the KFP Python SDK |
| pipeline-execution.md | Submitting, monitoring, and debugging runs |
| exercises.md | Hands-on exercises |

Key Repo Files Referenced

Estimated Time

3-4 hours for reading and exercises.

How This Fits Into the Project

Kubeflow Pipelines is the "conductor" of the ML workflow. Where MLflow (Module 4) acts as the project's memory, recording each run's parameters, metrics, and artifacts, KFP provides the automation: it takes a pipeline definition, schedules it on Kubernetes, manages the lifecycle of each step, and provides a UI for monitoring and managing runs.

In this project, the pipeline has a single step: run YOLOv5 training in a Docker container. But the same pattern scales to multi-step pipelines with data preprocessing, training, evaluation, and model deployment stages.

The flow:

pipeline.py (Python definition)
  |
  | compiler.Compiler().compile()
  v
pipeline.yaml (portable YAML specification)
  |
  | submit_run.py uploads to KFP API
  v
Kubeflow Pipelines API Server
  |
  | Creates an Argo Workflow
  v
Argo Workflow Controller
  |
  | Schedules Kubernetes pods
  v
Training Pod (runs train_wrapper.py -> logs to MLflow)
