website/content/en/docs/reference/metrics.md
Lines changed: 350 additions & 0 deletions
@@ -8,6 +8,328 @@ description: >
---
<!-- this document is generated from hack/docs/metrics_gen/main.go -->
Karpenter makes several metrics available in Prometheus format to allow monitoring cluster provisioning status. These metrics are available by default at `karpenter.kube-system.svc.cluster.local:8080/metrics`; the port is configurable via the `METRICS_PORT` environment variable documented [here](../settings).
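
As a rough illustration of consuming this endpoint, the sketch below parses Prometheus text-format output into `(name, labels, value)` tuples. It is a minimal parser, not a full implementation of the exposition format. The sample text, the `nodepool` label name, and the helper name `parse_metrics` are assumptions made for illustration. In practice the text would come from an HTTP GET against the metrics endpoint.

```python
import re

# One sample line: metric name, optional {labels}, then a numeric value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value", allowing escaped characters.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format output into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output, invented for illustration only.
SAMPLE_TEXT = """\
# HELP karpenter_nodes_created_total Number of nodes created in total by Karpenter.
# TYPE karpenter_nodes_created_total counter
karpenter_nodes_created_total{nodepool="default"} 7
karpenter_ignored_pod_count 2
"""

samples = parse_metrics(SAMPLE_TEXT)
```

Each tuple can then be filtered by metric name or label before further processing.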
### `karpenter_ignored_pod_count`
Number of pods ignored during scheduling by Karpenter
- Stability Level: ALPHA

### `karpenter_build_info`
A metric with a constant '1' value labeled by version from which karpenter was built.
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_count`
The number of status conditions for a nodeclaim, by type and status. Labeled by the name, namespace, type, status, and reason.
The time taken between a node's deletion request and the removal of its finalizer
- Stability Level: BETA

### `karpenter_nodes_terminated_total`
Number of nodes terminated in total by Karpenter. Labeled by owning nodepool.
- Stability Level: STABLE
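
Because this metric is a running total, consumers usually care about how much it grew over a window rather than its absolute value. The sketch below computes that increase from a series of readings and tolerates counter resets, assuming a drop in value means the process restarted and began counting from zero. The readings, the 60-second interval, and the helper name `counter_increase` are hypothetical.

```python
def counter_increase(samples):
    """Total increase of a counter across ordered samples, tolerating resets.

    A drop in value is treated as a restart from zero, so the new value
    itself counts as the increase for that step.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        total += (curr - prev) if curr >= prev else curr
    return total


# Hypothetical readings of karpenter_nodes_terminated_total for one nodepool,
# taken 60 seconds apart; the drop from 12 to 1 simulates a controller restart.
readings = [10.0, 12.0, 1.0, 3.0]
interval_seconds = 60

increase = counter_increase(readings)
per_second = increase / (interval_seconds * (len(readings) - 1))
```

The same calculation applies to the other `_total` counters in this reference.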

### `karpenter_nodes_system_overhead`
Node system daemon overhead is the set of resources reserved for system overhead: the difference between the node's capacity and allocatable values, as reported by the status.
- Stability Level: BETA
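
The description above defines overhead as capacity minus allocatable. A small worked example of that arithmetic follows. The resource keys, the units (millicores and bytes), and the node values are assumptions chosen for illustration and are not taken from Karpenter.

```python
def system_overhead(capacity, allocatable):
    """Per-resource overhead: capacity minus allocatable.

    Only resources present in both mappings are reported.
    """
    return {
        resource: capacity[resource] - allocatable[resource]
        for resource in capacity
        if resource in allocatable
    }


# Hypothetical node status values: CPU in millicores, memory in bytes.
capacity = {"cpu_millicores": 4000, "memory_bytes": 16 * 1024**3}
allocatable = {"cpu_millicores": 3920, "memory_bytes": 15 * 1024**3}

overhead = system_overhead(capacity, allocatable)
```

Here the node reserves 80 millicores of CPU and 1 GiB of memory for system daemons.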

### `karpenter_nodes_lifetime_duration_seconds`
The lifetime duration of the nodes since creation.
- Stability Level: ALPHA

### `karpenter_nodes_eviction_requests_total`
The total number of eviction requests made by Karpenter
- Stability Level: ALPHA

### `karpenter_nodes_drained_total`
The total number of nodes drained by Karpenter
- Stability Level: ALPHA

### `karpenter_nodes_current_lifetime_seconds`
Node age in seconds
- Stability Level: ALPHA

### `karpenter_nodes_created_total`
Number of nodes created in total by Karpenter. Labeled by owning nodepool.
- Stability Level: STABLE

### `karpenter_nodes_allocatable`
The resources allocatable by nodes.
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_node_status_condition_count`
The number of status conditions for a node, by type and status. Labeled by the name, namespace, type, status, and reason.
The current amount of time in seconds that a node has been in terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_node_termination_duration_seconds`
The amount of time taken by a node to terminate completely.
- Stability Level: BETA

### `operator_node_event_count`
The number of events for a node.
- Stability Level: BETA

## Pods Metrics

### `karpenter_pods_state`
Pod state is the current state of pods. This metric can be used in several ways, as it is labeled by the pod name, namespace, owner, node, nodepool name, zone, architecture, capacity type, instance type, and pod phase.
- Stability Level: BETA
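
One of the "several ways" this metric can be used is to group its series by a single label, for example to count pods per phase or per zone. The sketch below assumes each series describes one pod and that the label sets have already been parsed into dictionaries. The label names (`phase`, `zone`, `name`) and the pod data are hypothetical and may not match the names Karpenter emits.

```python
from collections import Counter


def count_by_label(label_sets, label):
    """Count series per value of one label; a missing label counts under ''."""
    return Counter(labels.get(label, "") for labels in label_sets)


# Hypothetical label sets, one per karpenter_pods_state series.
pods = [
    {"name": "web-1", "phase": "Running", "zone": "us-west-2a"},
    {"name": "web-2", "phase": "Running", "zone": "us-west-2b"},
    {"name": "job-1", "phase": "Pending", "zone": "us-west-2a"},
]

by_phase = count_by_label(pods, "phase")
by_zone = count_by_label(pods, "zone")
```

Grouping by any other label listed above works the same way.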

### `karpenter_pods_startup_duration_seconds`
The time from pod creation until the pod is running.
- Stability Level: STABLE

## Termination Metrics

### `operator_termination_duration_seconds`
The amount of time taken by an object to terminate completely.
- Stability Level: DEPRECATED

### `operator_termination_current_time_seconds`
The current amount of time in seconds that an object has been in terminating state.
Duration of scheduling simulations used for deprovisioning and provisioning in seconds.
- Stability Level: STABLE

### `karpenter_scheduler_queue_depth`
The number of pods currently waiting to be scheduled.
- Stability Level: BETA

## Nodepools Metrics

### `karpenter_nodepools_usage`
The amount of resources that have been provisioned for a nodepool. Labeled by nodepool name and resource type.
- Stability Level: ALPHA

### `karpenter_nodepools_limit`
Limits specified on the nodepool that restrict the quantity of resources provisioned. Labeled by nodepool name and resource type.
- Stability Level: ALPHA
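
Since usage and limit are both labeled by nodepool name and resource type, they can be joined on that pair to see how close a nodepool is to its limit. The sketch below shows one way to do that join, assuming the samples have already been collected into dictionaries keyed by `(nodepool, resource_type)`. The key layout, the values, and the helper name `utilization` are hypothetical.

```python
def utilization(usage, limits):
    """Fraction of each limit in use, keyed by (nodepool, resource_type).

    Pairs with no reported usage count as 0; non-positive limits are skipped
    to avoid dividing by zero.
    """
    return {
        key: usage.get(key, 0.0) / limit
        for key, limit in limits.items()
        if limit > 0
    }


# Hypothetical samples of karpenter_nodepools_usage and karpenter_nodepools_limit.
usage = {("default", "cpu"): 48.0, ("default", "memory"): 96.0}
limits = {("default", "cpu"): 64.0, ("default", "memory"): 256.0}

ratios = utilization(usage, limits)
```

A ratio approaching 1.0 suggests the nodepool is close to the point where its limit restricts further provisioning.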

### `karpenter_nodepools_allowed_disruptions`
The number of nodes for a given NodePool that can be concurrently disrupting at a point in time. Labeled by NodePool. Note that allowed disruptions can change very rapidly, as new nodes may be created and others may be deleted at any point.
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodepool_status_condition_count`
The number of status conditions for a nodepool, by type and status. Labeled by the name, namespace, type, status, and reason.
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
Returns 1 if cluster state is synced and 0 otherwise. Synced checks that nodeclaims and nodes stored in the APIServer have the same representation as Karpenter's cluster state.
Instance type offering estimated hourly price used when making informed decisions on node cost calculation, based on instance type, capacity type, and zone.