End-to-end examples for running AI/ML workloads on Azure Kubernetes Service (AKS) with GPU acceleration, powered by KubeRay and Karpenter.
This repository demonstrates a hybrid multi-cloud architecture where an AKS control plane manages GPU nodes across both Azure and Nebius Cloud, connected via VPN. All examples support both cloud providers through Kustomize overlays.
Key infrastructure components:
- Flex Karpenter for automatic node provisioning and autoscaling
- Kubernetes DRA (Dynamic Resource Allocation) with NVIDIA DRA driver for topology-aware GPU scheduling
- NVIDIA H100 80GB GPU instances on both Azure and Nebius
- KubeRay operator for managing Ray clusters on Kubernetes
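To give a feel for how DRA-based GPU requests are expressed, a ResourceClaimTemplate for a single GPU might look roughly like the sketch below. This is illustrative only (the metadata name is hypothetical); the actual templates live under each example's `base/` directory.

```yaml
# Illustrative sketch - the real manifests live in each example's base/ directory.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu-claim-template   # hypothetical name
spec:
  spec:
    devices:
      requests:
        - name: gpu
          # Device class advertised by the NVIDIA DRA driver.
          deviceClassName: gpu.nvidia.com
```

A RayJob's pod template then references this claim template instead of the classic `nvidia.com/gpu` resource limit, letting the scheduler make topology-aware placement decisions.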
LLM examples:

| Example | Description |

|---|---|
| Distributed Inference | Benchmark LLM inference throughput and latency using Ray Data LLM with vLLM. Default model: Qwen2.5-7B-Instruct. |
| Fine-Tuning | LoRA SFT on Qwen2.5-7B-Instruct using Ray Train and LLaMA-Factory for entity recognition on the Viggo dataset. |
Vision examples:

| Example | Description |
|---|---|
| Batch Inference | Generate CLIP image embeddings at scale using Ray Data with GPU actors, with cosine similarity search. |
| Distributed Training | Train an image classifier on CLIP embeddings using Ray Train with PyTorch DDP and MLflow tracking. |
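The cosine-similarity search mentioned in the Batch Inference example can be sketched in plain NumPy. This is a standalone illustration, not the repository's implementation — in the example itself the embeddings come from CLIP via Ray Data GPU actors:

```python
import numpy as np

def cosine_similarity_search(query: np.ndarray, embeddings: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k rows of `embeddings` most similar to `query`."""
    # Normalize so the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = embeddings @ query
    # argsort is ascending; reverse for descending similarity, then truncate.
    return np.argsort(scores)[::-1][:top_k]

# Toy usage with random vectors standing in for CLIP embeddings.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 512))
q = db[42] + 0.01 * rng.normal(size=512)  # near-duplicate of row 42
print(cosine_similarity_search(q, db, top_k=3)[0])  # → 42
```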
Infrastructure examples:

| Example | Description |
|---|---|
| Autoscaling | Karpenter node pool configurations for automatic CPU and GPU node provisioning on Azure and Nebius. |
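For orientation, a GPU node pool in the Autoscaling example might look roughly like the sketch below. All field values here are illustrative, not copied from the repository; see the Autoscaling example itself for the real manifests.

```yaml
# Illustrative sketch of a Karpenter GPU NodePool - not the repository's actual manifest.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-h100   # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.azure.com   # Azure provider's node class API group
        kind: AKSNodeClass
        name: default                # hypothetical node class
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
  limits:
    nvidia.com/gpu: "8"   # cap total GPUs this pool may provision
```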
Each example follows a consistent layout:
- `main.py` - Application entry point
- `run.sh` - One-command launcher
- `base/` - Kustomize base manifests (RayJob + DRA ResourceClaimTemplate)
- `overlays/{azure,nebius}/` - Cloud-specific Kustomize patches
- An AKS cluster with GPU node pools (NVIDIA H100 recommended)
- KubeRay operator v1.5.1+ installed
- NVIDIA DRA driver deployed for GPU scheduling
- Karpenter enabled (for autoscaling examples)
- `kubectl` and `kustomize` CLI tools
- Ensure your AKS cluster and prerequisites are configured.
- Navigate to the example you want to run (e.g., `examples/llm/distributed-inferencing/`).
- Review the example's README for specific configuration details.
- Run the example:
```shell
# Example: LLM Inference on Azure
./examples/llm/distributed-inferencing/run.sh azure
```

Each example's `run.sh` script handles applying the Kustomize manifests and submitting the RayJob to the cluster.
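If you prefer to drive the steps by hand, the equivalent of what `run.sh` automates is roughly the following. Paths and overlay names are assumed from the layout described above; the scripts themselves are authoritative.

```shell
# Render the cloud-specific overlay and apply it, which creates the
# RayJob and its DRA ResourceClaimTemplate in the cluster.
kustomize build examples/llm/distributed-inferencing/overlays/azure | kubectl apply -f -

# Watch the RayJob until KubeRay reports it complete.
kubectl get rayjob -w
```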
| Category | Technologies |
|---|---|
| Cloud Platforms | Azure (AKS), Nebius Cloud |
| Distributed Computing | Ray 2.48–2.53, KubeRay, Ray Data, Ray Train |
| LLM Inference | vLLM, Ray Data LLM |
| LLM Fine-Tuning | LLaMA-Factory (LoRA SFT) |
| Models | Qwen2.5-7B-Instruct, OpenAI CLIP |
| ML Frameworks | PyTorch, HuggingFace Transformers |
| Experiment Tracking | MLflow |
| GPU Scheduling | Kubernetes DRA, NVIDIA DRA Driver |
| Autoscaling | Flex Karpenter |
| GPU Hardware | NVIDIA H100 80GB HBM3 |