9 changes: 4 additions & 5 deletions README.md
@@ -67,9 +67,9 @@ spec:
modelHub:
modelID: facebook/opt-125m
inferenceFlavors:
- name: t4 # GPU type
requests:
nvidia.com/gpu: 1
- name: t4 # GPU type
requests:
nvidia.com/gpu: 1
```

#### Inference Playground
@@ -124,12 +124,11 @@ If you want to learn more about this project, please refer to [develop.md](./doc
- CLI tool support
- Model training, fine tuning in the long-term


## Community

Join us for more discussions:

* **Slack Channel**: [#llmaz](https://inftyai.slack.com/archives/C06D0BGEQ1G)
- **Slack Channel**: [#llmaz](https://inftyai.slack.com/archives/C06D0BGEQ1G)

## Contributions

16 changes: 16 additions & 0 deletions api/inference/v1alpha1/backendruntime_types.go
@@ -63,6 +63,22 @@ type BackendRuntimeSpec struct {
// accelerators like GPU should not be defined here, but at the model flavors,
// or the values here will be overwritten.
Resources ResourceRequirements `json:"resources"`
// Periodic probe of backend liveness.
// Backend will be restarted if the probe fails.
// Cannot be updated.
// +optional
LivenessProbe *corev1.Probe `json:"livenessProbe,omitempty"`
// Periodic probe of backend readiness.
// Backend will be removed from service endpoints if the probe fails.
// +optional
ReadinessProbe *corev1.Probe `json:"readinessProbe,omitempty"`
// StartupProbe indicates that the Backend has successfully initialized.
// If specified, no other probes are executed until this completes successfully.
// If this probe fails, the backend will be restarted, just as if the livenessProbe failed.
// This can be used to provide different probe parameters at the beginning of a backend's lifecycle,
// when it might take a long time to load data or warm a cache, than during steady-state operation.
// +optional
StartupProbe *corev1.Probe `json:"startupProbe,omitempty"`
}

// BackendRuntimeStatus defines the observed state of BackendRuntime
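For orientation, here is a minimal sketch of how the new fields could be set on a `BackendRuntime` manifest. Only the probe keys (`startupProbe`, `livenessProbe`, `readinessProbe`) and their bodies come from the additions above and the standard `corev1.Probe` schema; the apiVersion, metadata, and everything else in the snippet are illustrative assumptions, and required fields unrelated to probes are omitted:

```yaml
# Hypothetical BackendRuntime snippet: only the three probe fields come from
# this change; apiVersion, name, and the omitted required fields are assumptions.
apiVersion: inference.llmaz.io/v1alpha1
kind: BackendRuntime
metadata:
  name: example-backend
spec:
  startupProbe:            # gates the other probes until the model is loaded
    periodSeconds: 10
    failureThreshold: 30   # roughly 300s budget before the backend is restarted
    httpGet:
      path: /health
      port: 8080
  livenessProbe:           # restart the backend if it stops responding
    initialDelaySeconds: 15
    periodSeconds: 10
    failureThreshold: 3
    httpGet:
      path: /health
      port: 8080
  readinessProbe:          # keep unready replicas out of the service endpoints
    initialDelaySeconds: 5
    periodSeconds: 5
    failureThreshold: 3
    httpGet:
      path: /health
      port: 8080
```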
21 changes: 21 additions & 0 deletions chart/templates/backends/llamacpp.yaml
@@ -23,6 +23,7 @@ spec:
- "0.0.0.0"
- --port
- "8080"
# TODO: not supported yet, see https://github.com/InftyAI/llmaz/issues/240.
- name: speculative-decoding
flags:
- -m
@@ -40,4 +41,24 @@ spec:
limits:
cpu: 2
memory: 4Gi
startupProbe:
periodSeconds: 10
failureThreshold: 30
httpGet:
path: /health
port: 8080
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 10
failureThreshold: 3
httpGet:
path: /health
port: 8080
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
httpGet:
path: /health
port: 8080
{{- end }}
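With the defaults above, the server gets up to periodSeconds × failureThreshold = 10 s × 30 = 300 s to come up before the startup probe is treated as failed, and the liveness and readiness probes only begin once the startup probe succeeds. For models that take longer to load, the window can be widened without touching the steady-state probes; the values below are illustrative, not part of this change:

```yaml
# Illustrative tuning only (not part of this change): widen the startup window
# to 20s × 60 = 1200s for very large models; liveness/readiness stay as above.
startupProbe:
  periodSeconds: 20
  failureThreshold: 60
  httpGet:
    path: /health
    port: 8080
```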
20 changes: 20 additions & 0 deletions chart/templates/backends/sglang.yaml
@@ -34,4 +34,24 @@ spec:
limits:
cpu: 4
memory: 8Gi
startupProbe:
periodSeconds: 10
failureThreshold: 30
httpGet:
path: /health
port: 8080
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 10
failureThreshold: 3
httpGet:
path: /health
port: 8080
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
httpGet:
path: /health_generate
port: 8080
{{- end }}
20 changes: 20 additions & 0 deletions chart/templates/backends/tgi.yaml
@@ -26,4 +26,24 @@ spec:
limits:
cpu: 4
memory: 8Gi
startupProbe:
periodSeconds: 10
failureThreshold: 30
httpGet:
path: /health
port: 8080
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 10
failureThreshold: 3
httpGet:
path: /health
port: 8080
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
httpGet:
path: /health
port: 8080
{{- end }}
20 changes: 20 additions & 0 deletions chart/templates/backends/vllm.yaml
@@ -107,4 +107,24 @@ spec:
limits:
cpu: 4
memory: 8Gi
startupProbe:
periodSeconds: 10
failureThreshold: 30
httpGet:
path: /health
port: 8080
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 10
failureThreshold: 3
httpGet:
path: /health
port: 8080
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
httpGet:
path: /health
port: 8080
{{- end }}