ml-ai-lifecycle

Composable, reproducible environments for ML training on GPU clusters. Each layer is available as both a Flox environment and a Nix flake, so you can use whichever tool your infrastructure already supports.

Layers

The repo is organized as a stack of independent environments that compose upward. Each can be used standalone or pulled in as a dependency.

┌──────────────────────────────────────────────┐
│  model-training       Composed ML training   │
│                       environment            │
├──────────┬───────────────────┬───────────────┤
│ build-env│cuda-dev-essentials│pytorch-runtime│
│ gcc,cmake│ CUDA 12.9 toolkit │ Python 3.13 + │
│ openssl  │ nvcc, cudnn, nccl │ PyTorch       │
└──────────┴───────────────────┴───────────────┘

`build-env`

Cross-platform build toolchain: gcc/clang, cmake, openssl, pkg-config, coreutils. Works on Linux (x86_64, aarch64) and macOS (x86_64, aarch64).

`cuda-dev-essentials`

CUDA 12.9 development tools: nvcc, cudart, cuBLAS, cuDNN, NCCL, cuTENSOR, CUPTI, cuda-gdb, sanitizer API. Linux only.

`pytorch-runtime`

Python 3.13 with PyTorch. CUDA-accelerated on Linux, MPS-accelerated on aarch64-darwin. The Nix flake exposes layered package outputs: runtime, training (adds TensorBoard, W&B), eval, and dev (adds Jupyter). `model-training` uses the runtime output and installs additional training dependencies via pip.
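To use one of these outputs directly, its name can be appended to the flake reference after a `#`. The sketch below assumes the output attributes are named exactly as in the paragraph above (runtime, training, eval, dev); `flake_ref` is a hypothetical helper, not something the repo ships, and `nix flake show` is the way to confirm the real attribute names.

```shell
# Hypothetical helper: build a flake reference for one pytorch-runtime output.
# ASSUMPTION: output names match the prose (runtime, training, eval, dev).
flake_ref() {
  printf 'github:flox/ml-ai-lifecycle?dir=pytorch-runtime#%s\n' "$1"
}

# Print the reference for the dev output.
flake_ref dev

# With Nix installed, the reference can then be passed on, for example:
#   nix flake show 'github:flox/ml-ai-lifecycle?dir=pytorch-runtime'
#   nix shell "$(flake_ref dev)"
```

Keeping the reference inside quotes or a command substitution avoids the shell interpreting the `?` as a glob character.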

`model-training`

Composed environment that pulls in `build-env`, `cuda-dev-essentials`, and `pytorch-runtime`. Adds uv for fast package management and sets up a venv with ML training dependencies (datasets, transformers, accelerate, etc.). This is the environment you'd point a training job at.
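A quick way to confirm that the composed environment delivered all three layers is to check which tools are on `PATH` after activation. This is a minimal sketch: `check_tools` is a hypothetical helper, and the binary names in the usage comment (`gcc`, `cmake`, `nvcc`, `python3`, `uv`) are assumed from the layer descriptions rather than taken from the manifests.

```shell
# Hypothetical helper: report which of the named tools are on PATH.
# Succeeds only if every tool was found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'ok       %s\n' "$tool"
    else
      printf 'missing  %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Inside an activated model-training environment on Linux, one might run:
#   check_tools gcc cmake nvcc python3 uv
```

On macOS, `nvcc` would be expected to show as missing, since the CUDA layer is Linux only.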

`nanogpt-slurm` (separate repo)

End-to-end tutorial: train a GPT language model on a Slurm cluster using the `model-training` environment. Includes Slurm job scripts for data prep, training, sampling, and evaluation.
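For orientation, a Slurm job that runs inside the composed environment could look roughly like the sketch below. It is not taken from the nanogpt-slurm repo: the `#SBATCH` values, the `run_training` function, and the `train.py` entry point are all placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=train-sketch
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

# ASSUMPTION: train.py is a placeholder entry point, not a file from nanogpt-slurm.
run_training() {
  # Run the command inside the composed environment fetched from FloxHub.
  flox activate -r flox-labs/model-training -- python train.py "$@"
}

# Only launch when Slurm started this script, so sourcing it elsewhere is harmless.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  run_training
fi
```

Submitted with `sbatch`, the script activates the environment on the compute node and runs the training command in it. Depending on the Flox configuration, a remote environment may need to be trusted before it can be activated non-interactively.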

Using with Flox

Each environment is published to FloxHub under the `flox-labs` namespace. You can use them individually or composed:

# Use a single layer
flox activate -r flox-labs/pytorch-runtime

# Use the composed training environment
flox activate -r flox-labs/model-training

The `model-training` environment includes the other three through Flox's `[include]` mechanism, so you don't need to activate them separately.
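As a rough picture of what that composition looks like, the sketch below writes an `[include]` section to a scratch file. It assumes the `remote` form of include descriptors and is not a copy of the actual `model-training` manifest, which also defines its own packages and hooks.

```shell
# Write a sketch of a composing manifest to a scratch file for inspection.
# ASSUMPTION: illustrative only; the real model-training manifest may differ.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
version = 1

[include]
environments = [
  { remote = "flox-labs/build-env" },
  { remote = "flox-labs/cuda-dev-essentials" },
  { remote = "flox-labs/pytorch-runtime" },
]
EOF

cat "$manifest"
```

The same pattern would let you compose your own environment on top of any subset of the layers, for example including only `pytorch-runtime` for an inference-only setup.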

Using with Nix

Each directory contains a `flake.nix`. Use them directly from GitHub:

# Use a single layer (quoted so that shells such as zsh do not treat `?` as a glob)
nix develop 'github:flox/ml-ai-lifecycle?dir=pytorch-runtime'

# Use the composed training environment
nix develop 'github:flox/ml-ai-lifecycle?dir=model-training'

The `model-training` flake takes the other three as inputs and sets `inputs.nixpkgs.follows` on each, so the whole stack resolves to a single nixpkgs rather than one per layer.
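The input wiring described above can be sketched as follows. The block writes an illustrative `flake.nix` skeleton to a scratch file; the nixpkgs branch and the empty `outputs` are assumptions made for the example, not the contents of the real flake.

```shell
# Write an illustrative flake skeleton to a scratch file.
# ASSUMPTION: the nixpkgs branch and the empty outputs are placeholders.
flake="$(mktemp)"
cat > "$flake" <<'EOF'
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

    build-env.url = "github:flox/ml-ai-lifecycle?dir=build-env";
    build-env.inputs.nixpkgs.follows = "nixpkgs";

    cuda-dev-essentials.url = "github:flox/ml-ai-lifecycle?dir=cuda-dev-essentials";
    cuda-dev-essentials.inputs.nixpkgs.follows = "nixpkgs";

    pytorch-runtime.url = "github:flox/ml-ai-lifecycle?dir=pytorch-runtime";
    pytorch-runtime.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { self, nixpkgs, ... }: { };
}
EOF

cat "$flake"
```

Each `follows` line replaces that layer's own nixpkgs input with the top-level one, which is what keeps the layers from pinning different nixpkgs revisions.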

What Each Layer Provides

| Layer | Flox | Nix | Key packages |
| --- | --- | --- | --- |
| `build-env` | `flox-labs/build-env` | `?dir=build-env` | gcc/clang, cmake, openssl, coreutils |
| `cuda-dev-essentials` | `flox-labs/cuda-dev-essentials` | `?dir=cuda-dev-essentials` | CUDA 12.9 (nvcc, cudnn, nccl, cublas) |
| `pytorch-runtime` | `flox-labs/pytorch-runtime` | `?dir=pytorch-runtime` | Python 3.13, PyTorch (CUDA on Linux) |
| `model-training` | `flox-labs/model-training` | `?dir=model-training` | All of the above + uv, datasets, transformers |
| `nanogpt-slurm` | — | — | Slurm job scripts (uses `model-training`) |

Platform Support

| Layer | x86_64-linux | aarch64-linux | x86_64-darwin | aarch64-darwin |
| --- | --- | --- | --- | --- |
| `build-env` | Yes | Yes | Yes | Yes |
| `cuda-dev-essentials` | Yes | Yes | — | — |
| `pytorch-runtime` | Yes (CUDA) | Yes (CUDA) | Nix only (CPU) | Yes (MPS) |
| `model-training` | Yes | Yes | Yes | Yes |
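To find which column applies to a given machine, the system name can be derived from `uname`. The sketch below uses two hypothetical helpers, `nix_system` and `pytorch_accel`; the acceleration mapping is copied from the `pytorch-runtime` row above and is not queried from the environment itself.

```shell
# Map `uname` output to the system names used in the table.
nix_system() {
  arch="$(uname -m)"
  os="$(uname -s)"
  case "$arch" in
    arm64) arch=aarch64 ;;   # macOS reports arm64 on Apple silicon
  esac
  case "$os" in
    Linux)  os=linux ;;
    Darwin) os=darwin ;;
  esac
  printf '%s-%s\n' "$arch" "$os"
}

# PyTorch acceleration per system, copied from the pytorch-runtime row.
pytorch_accel() {
  case "$1" in
    x86_64-linux|aarch64-linux) echo CUDA ;;
    aarch64-darwin)             echo MPS ;;
    x86_64-darwin)              echo 'CPU (Nix only)' ;;
    *)                          echo unsupported ;;
  esac
}

printf '%s: %s\n' "$(nix_system)" "$(pytorch_accel "$(nix_system)")"
```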
