Hatchery Core

Hatchery Core is an open-source, Tinker-compatible runtime for fine-tuning and post-training language models on infrastructure you control. It provides a local or self-hosted gateway, GPU worker, LoRA trainer, checkpoint store, and Python client surface for supervised fine-tuning, RL-style objectives, custom losses, and sampling.

Use Hatchery when you want to:

Run Tinker-style training recipes locally or inside your own deployment.
Fine-tune Hugging Face models with LoRA without adopting a hosted control plane.
Experiment with model families and continued-fine-tuning workflows beyond the hosted base-model catalog.
Keep sensitive datasets and domain-specific training workflows inside infrastructure you control.
Prototype SFT, GRPO, PPO-style, and custom-loss workflows against a real API.
Keep the training runtime extensible: storage, queues, auth, and workers are pluggable.

Hatchery is not a managed training service by itself. The open-core package provides the runtime and extension points; production fleet orchestration, hosted auth, billing, and managed infrastructure can be layered around it.

What You Get

Tinker-compatible API: use the official tinker.ServiceClient or Hatchery's built-in client.
Local-first development: start a gateway and worker with one command.
Real GPU training: train LoRA adapters against Hugging Face causal language models.
Post-training primitives: SFT, GRPO, PPO, CISPO, DAPO, GSPO, custom forward/backward losses, sampling, and checkpoint save/load.
Base-model friendly recipes: train pretrained base checkpoints while borrowing tokenizer/chat templates from matching instruction models when useful.
Pluggable backends: in-memory and local backends ship in core; external packages can provide shared queues, object stores, auth, and deployment integrations.

Requirements

Python 3.12+
Linux or macOS for local development
uv recommended, pip also works
CUDA-capable GPU for real model training
PyTorch-compatible GPU drivers when using CUDA
Hugging Face access for any model or dataset you load

The scripted worker can run without a GPU and is useful for API smoke tests. Real fine-tuning requires a GPU and a model that fits your device.

Install From Source

git clone https://github.com/axolotl-ai-cloud/hatchery-core.git
cd hatchery-core

uv venv
source .venv/bin/activate

# Runtime plus test/dev tools.
uv pip install -e '.[test,dev]'

# Add cookbook dataset dependencies when running examples.
uv pip install -e '.[examples]'

For GPU machines, PyTorch wheel selection still comes from your package manager or index configuration. The gpu extra is intentionally a compatibility hook:

uv pip install -e '.[gpu,test,examples]'

Optional DFlash Support

Hatchery can use DFlash for speculative decoding on supported model/draft-adapter pairs. DFlash is not declared as a hatchery-core package extra because public PyPI rejects packages whose published metadata contains direct Git dependencies. Install it explicitly in worker images or local environments that enable DFlash:

uv pip install -e '.[gpu,test,examples]'
uv pip install 'dflash[transformers] @ git+https://github.com/z-lab/dflash.git@4febcb4b32824a39fc683c9b74d193f885d9fe19'

Without DFlash installed, non-strict speculative-decoding requests fall back to the normal Hugging Face generation path. Strict requests raise so deployment smoke tests can catch a missing dependency.

Quick Start

Start the local dev server:

python -m hatchery.core.local_dev

By default this starts a gateway at http://127.0.0.1:8420 with bearer token dev. If a CUDA device is available, the launcher uses a GPU worker; otherwise it falls back to the scripted worker.

Use a specific pretrained base model:

HATCHERY_DEV_DEVICE=cuda:0 \
HATCHERY_DEV_BASE_MODEL=Qwen/Qwen2-0.5B \
python -m hatchery.core.local_dev

Check the server:

curl http://127.0.0.1:8420/v1/health

Expected response:

{"status":"ok"}

First Training Run

In another terminal, run the Pig Latin SFT/GRPO cookbook:

python -m hatchery.core.examples.train_sft \
  --base-url http://127.0.0.1:8420 \
  --token dev \
  --base-model Qwen/Qwen2-0.5B \
  --steps 100 \
  --rl-steps 0 \
  --batch-size 2

This trains the pretrained Qwen/Qwen2-0.5B base model. By default the example borrows the chat template from Qwen/Qwen2-0.5B-Instruct for formatting, but the trained model remains the base checkpoint.

For a simple style-transfer smoke test:

python -m hatchery.core.examples.train_pirate_style \
  --base-url http://127.0.0.1:8420 \
  --token dev \
  --base-model Qwen/Qwen2-0.5B \
  --steps 100 \
  --batch-size 2

The examples print loss, save a checkpoint, and sample from the trained adapter.

Python Client

Hatchery follows the Tinker convention: clients pre-shift causal language-model training data. If tokens is the full token sequence, send tokens[:-1] as the model input and tokens[1:] as target_tokens.

from hatchery.core.client import HatcheryClient

client = HatcheryClient(base_url="http://127.0.0.1:8420", token="dev")
training = client.create_lora_training_client("Qwen/Qwen2-0.5B", rank=8)

tokens = [10, 20, 30, 40, 50]
datum = {
    "model_input": {
        "chunks": [{"type": "encoded_text", "tokens": tokens[:-1]}],
    },
    "loss_fn_inputs": {
        "target_tokens": {
            "dtype": "int64",
            "data": tokens[1:],
            "shape": [len(tokens) - 1],
        },
        "weights": {
            "dtype": "float32",
            "data": [1.0] * (len(tokens) - 1),
            "shape": [len(tokens) - 1],
        },
    },
}

fb = training.forward_backward([datum]).result(timeout=120)
training.optim_step(learning_rate=1e-4).result(timeout=60)

print(fb)
print(training.sample(tokens[:-1], max_tokens=32).result(timeout=30))

client.close()

For prompt/completion training, set weights to 0.0 on prompt positions and 1.0 on completion positions after the shift. The examples in hatchery/core/examples show this pattern with chat templates and prompt masking.

You can also use the official Tinker SDK:

import tinker

client = tinker.ServiceClient(api_key="dev", base_url="http://127.0.0.1:8420")
training = client.create_lora_training_client(base_model="Qwen/Qwen2-0.5B")

Architecture

Client (HatcheryClient or tinker SDK)
  |
  v
Gateway (FastAPI, /api/v1 and /v1)
  |
  v
Job queue (in-memory, SQLite, or extension backend)
  |
  v
GPU worker (model, LoRA adapter, optimizer)
  |
  v
Object store (local, in-memory, or extension backend)

The gateway is stateless. Workers own model loading, adapter state, training operations, sampling, and checkpoint materialization. Multiple workers can share queue and storage backends supplied by extensions.

Configuration

Common local-dev variables:

Variable	Default	Description
`HATCHERY_DEV_PORT`	`8420`	HTTP port
`HATCHERY_DEV_API_KEY`	`dev`	Bearer token
`HATCHERY_DEV_BASE_MODEL`	`Qwen/Qwen2-0.5B`	Hugging Face model id
`HATCHERY_DEV_DEVICE`	auto-detected	Torch device, for example `cuda:0`
`HATCHERY_DEV_NO_GPU`	`0`	Set `1` to force the scripted worker

Core runtime variables:

Variable	Default	Description
`HATCHERY_ADMIN_API_KEY`	generated	Admin API key for the core gateway
`HATCHERY_OBJECT_STORE`	`local`	`local` or `memory`
`HATCHERY_LOCAL_STORE_PATH`	`/tmp/hatchery_data`	Local object-store root
`HATCHERY_BASE_MODEL`	`Qwen/Qwen2-0.5B`	Standalone worker model
`HATCHERY_WORKER_DEVICE`	`cuda:0`	Standalone worker device
`HATCHERY_CONFIG_FACTORY`	unset	`module:callable` config factory

Documentation

Quickstart: local gateway, worker, and client workflow.
Self-hosting: running core outside the local dev launcher.
Architecture: control plane, worker, queue, and storage flow.
API endpoints: native and Tinker-compatible routes.
Testing: CPU, GPU, lint, and packaging checks.
Extending Hatchery Core: backend and extension contracts.

For enterprise or self-hosted deployment questions, or if you need help adapting Hatchery to a sensitive workflow, contact contact@axolotl.ai.

Development

ruff check hatchery/ tests/
ruff format --check hatchery/ tests/
python -m pytest tests/ -q

GPU tests live under tests/torch_tests and tests/torch_distributed. They require CUDA plus compatible model/cache access:

python -m pytest tests/torch_tests/ tests/torch_distributed/ -q

Project Status

Hatchery Core is alpha software. The public API is intended to be useful for experimentation and extension work, but interfaces may change while the project moves toward a stable release.

Known Limitations

The default local launcher is designed for development and smoke testing, not production scheduling.
Multi-worker deployments require shared queue, metadata, and object-store backends supplied through configuration or extension packages.
GPU compatibility depends on the installed PyTorch build, CUDA drivers, model architecture, and available VRAM.
The included cookbooks are validation and experimentation launchpads, not fully tuned training recipes for every model family.
Multi-modal model paths exist in core, but release validation is still focused primarily on text-only workflows.
Public APIs, config contracts, and packaging metadata may still change before a stable release.

Roadmap

Incorporate optimized Axolotl training patches and kernels where they improve throughput, memory use, or model-family coverage.
Expand multi-modal validation for VLM training, sampling, and checkpoint flows.
Harden distributed and multi-worker self-hosting patterns with shared queue and object-store backends.
Add more end-to-end cookbooks for common SFT, RL, style-transfer, and custom-loss workflows.
Improve observability around loss, token throughput, queue latency, GPU memory, and checkpoint operations.
Stabilize the public API and configuration surface for a non-alpha release.

Contributing

See CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md, and RELEASE.md.

License

Apache 2.0. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.github		.github
content/docs		content/docs
hatchery		hatchery
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
RELEASE.md		RELEASE.md
SECURITY.md		SECURITY.md
VERSION		VERSION
docs.json		docs.json
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Hatchery Core

What You Get

Requirements

Install From Source

Optional DFlash Support

Quick Start

First Training Run

Python Client

Architecture

Configuration

Documentation

Development

Project Status

Known Limitations

Roadmap

Contributing

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Hatchery Core

What You Get

Requirements

Install From Source

Optional DFlash Support

Quick Start

First Training Run

Python Client

Architecture

Configuration

Documentation

Development

Project Status

Known Limitations

Roadmap

Contributing

License

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages