
feat(distributed): declarative per-model scheduling via env/args#10308

Merged
mudler merged 8 commits into master from feat/distributed-scheduling-env-config
Jun 13, 2026


@localai-bot

Collaborator

Summary

In distributed mode, per-model scheduling (node-label selector + min/max replicas) could previously only be set through the WebUI or API. This adds a declarative startup surface so it can be configured programmatically for unattended/IaC installs, plus a new "spread to all matching nodes" mode.

  • Two new env/CLI flags (mirroring PRELOAD_MODELS / PRELOAD_MODELS_CONFIG):
    • LOCALAI_MODEL_SCHEDULING - inline JSON list of scheduling entries
    • LOCALAI_MODEL_SCHEDULING_CONFIG - path to a YAML file with the same list
  • Authoritative/declarative: the config is re-applied on every boot, overwriting the rows of listed models; unlisted models are left untouched. Entries are seeded in initDistributed before the reconciler starts. If either variable is set without LOCALAI_DISTRIBUTED, a warning is logged and the setting is ignored.
  • New spread_all mode (replicas: all alias in the config): runs one replica on every node matching the selector (all healthy backend nodes when the selector is empty), tracked as nodes join/leave. Implemented as a dynamic Min==Max==matching-node-count target in the reconciler, reusing the existing scale-up/scale-down machinery. Mutually exclusive with min_replicas/max_replicas.
  • API: spread_all exposed on the scheduling endpoint with mutual-exclusion validation.
  • React UI: third "Spread to all" mode in the scheduling form, plus a spread indicator in the scheduling list.
  • Docs: distributed-mode page documents the env vars, the authoritative full-row semantics, and the one-replica-per-node assumption (default MAX_REPLICAS_PER_MODEL=1).
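
The spread_all target described above (Min == Max == matching-node count) can be illustrated with a minimal sketch. This is not the reconciler code from this PR: the `Node` type, `matches`, and `spreadTarget` are hypothetical names, and filtering on health for non-empty selectors is an assumption.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a backend worker.
type Node struct {
	Name    string
	Healthy bool
	Labels  map[string]string
}

// matches reports whether the node carries every key/value pair in selector.
// An empty (or nil) selector matches every node.
func matches(n Node, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := n.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// spreadTarget returns the replica bounds for a spread_all model:
// both bounds equal the number of healthy nodes matching the selector.
// Assumption: unhealthy nodes are never counted, selector or not.
func spreadTarget(nodes []Node, selector map[string]string) (lo, hi int) {
	count := 0
	for _, n := range nodes {
		if n.Healthy && matches(n, selector) {
			count++
		}
	}
	return count, count
}

func main() {
	nodes := []Node{
		{Name: "gpu-1", Healthy: true, Labels: map[string]string{"tier": "gpu"}},
		{Name: "gpu-2", Healthy: true, Labels: map[string]string{"tier": "gpu"}},
		{Name: "gpu-3", Healthy: false, Labels: map[string]string{"tier": "gpu"}},
		{Name: "cpu-1", Healthy: true, Labels: map[string]string{"tier": "cpu"}},
	}
	lo, hi := spreadTarget(nodes, map[string]string{"tier": "gpu"})
	fmt.Println("tier=gpu target:", lo, hi)

	lo, hi = spreadTarget(nodes, nil)
	fmt.Println("empty selector target:", lo, hi)
}
```

Recomputing the target on every reconcile pass is what would let the existing scale-up/scale-down logic follow nodes as they join or leave, since the bounds move with the matching-node count.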

Entry shape (YAML):

- model_name: gpt-oss
  node_selector:
    tier: gpu
  replicas: all
- model_name: whisper
  node_selector:
    tier: cpu
  min_replicas: 1
  max_replicas: 0   # unbounded up to cluster capacity
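
For the inline LOCALAI_MODEL_SCHEDULING form, a rough sketch of parsing and validating such a list might look like the following. It assumes the JSON keys mirror the YAML keys shown above and that `replicas` carries the string "all"; the `entry` struct and `parseEntries` are hypothetical and are not the parser added in this PR. Treating any explicitly present min_replicas/max_replicas as a conflict with `replicas: all` is also an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry is a hypothetical mirror of one scheduling entry.
// Pointers distinguish "field absent" from "field set to 0".
type entry struct {
	ModelName    string            `json:"model_name"`
	NodeSelector map[string]string `json:"node_selector"`
	Replicas     string            `json:"replicas"`
	MinReplicas  *int              `json:"min_replicas"`
	MaxReplicas  *int              `json:"max_replicas"`
}

// parseEntries decodes an inline JSON list and enforces that the
// "all" (spread) mode is not combined with explicit replica bounds.
func parseEntries(raw string) ([]entry, error) {
	var entries []entry
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		return nil, fmt.Errorf("parsing scheduling entries: %w", err)
	}
	for i, e := range entries {
		if e.ModelName == "" {
			return nil, fmt.Errorf("entry %d: model_name is required", i)
		}
		if e.Replicas != "" && e.Replicas != "all" {
			return nil, fmt.Errorf("entry %d: unsupported replicas value %q", i, e.Replicas)
		}
		if e.Replicas == "all" && (e.MinReplicas != nil || e.MaxReplicas != nil) {
			return nil, fmt.Errorf("entry %d: replicas: all is mutually exclusive with min_replicas/max_replicas", i)
		}
	}
	return entries, nil
}

func main() {
	raw := `[
	  {"model_name": "gpt-oss", "node_selector": {"tier": "gpu"}, "replicas": "all"},
	  {"model_name": "whisper", "node_selector": {"tier": "cpu"}, "min_replicas": 1, "max_replicas": 0}
	]`
	entries, err := parseEntries(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("%s spread=%t selector=%v\n", e.ModelName, e.Replicas == "all", e.NodeSelector)
	}
}
```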

Test Plan

  • go test ./core/services/nodes/ passes (parser, registry round-trip/seeding, reconciler spread behavior, against testcontainers PostgreSQL)
  • go test ./core/http/endpoints/localai/ spread_all validation specs pass
  • go build + go vet clean on all touched Go packages
  • React UI npm run build succeeds
  • Manual: start distributed frontend with LOCALAI_MODEL_SCHEDULING_CONFIG and confirm a labelled model spreads across matching workers

mudler added 7 commits June 13, 2026 15:27
…seeding

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
… seeding

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
}

if configPath != "" {
data, err := os.ReadFile(configPath)
…duling config

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler merged commit 7637f8c into master Jun 13, 2026
58 of 59 checks passed
@mudler mudler deleted the feat/distributed-scheduling-env-config branch June 13, 2026 16:31

3 participants