Commit c197489

DOC-11499 Docs for obs: Support SQL metrics collected by Application and DB (#19619)
* In multi-dimensional-metrics.md, added section: Enable database and application_name labels.
* Incorporated Kevin’s feedback. Added call out that this feature is in Preview.
* Incorporated Kathryn’s feedback.
1 parent 618ce1e commit c197489

3 files changed

+276
-5
lines changed

src/current/_data/v25.2/metrics/multi-dimensional-metrics.yml

Lines changed: 90 additions & 0 deletions
@@ -252,3 +252,93 @@
252252

253253
- multi_dimensional_metric_id: sql.exec.latency.detail
254254
feature: detailed-latency
255+
256+
- multi_dimensional_metric_id: sql.crud_query.count
257+
feature: database-and-application_name-labels
258+
259+
- multi_dimensional_metric_id: sql.crud_query.count.internal
260+
feature: database-and-application_name-labels
261+
262+
- multi_dimensional_metric_id: sql.delete.count
263+
feature: database-and-application_name-labels
264+
265+
- multi_dimensional_metric_id: sql.delete.count.internal
266+
feature: database-and-application_name-labels
267+
268+
- multi_dimensional_metric_id: sql.distsql.contended_queries.count
269+
feature: database-and-application_name-labels
270+
271+
- multi_dimensional_metric_id: sql.distsql.cumulative_contention_nanos
272+
feature: database-and-application_name-labels
273+
274+
- multi_dimensional_metric_id: sql.failure.count
275+
feature: database-and-application_name-labels
276+
277+
- multi_dimensional_metric_id: sql.failure.count.internal
278+
feature: database-and-application_name-labels
279+
280+
- multi_dimensional_metric_id: sql.full.scan.count
281+
feature: database-and-application_name-labels
282+
283+
- multi_dimensional_metric_id: sql.full.scan.count.internal
284+
feature: database-and-application_name-labels
285+
286+
- multi_dimensional_metric_id: sql.insert.count
287+
feature: database-and-application_name-labels
288+
289+
- multi_dimensional_metric_id: sql.insert.count.internal
290+
feature: database-and-application_name-labels
291+
292+
- multi_dimensional_metric_id: sql.select.count
293+
feature: database-and-application_name-labels
294+
295+
- multi_dimensional_metric_id: sql.select.count.internal
296+
feature: database-and-application_name-labels
297+
298+
- multi_dimensional_metric_id: sql.service.latency
299+
feature: database-and-application_name-labels
300+
301+
- multi_dimensional_metric_id: sql.service.latency.internal
302+
feature: database-and-application_name-labels
303+
304+
- multi_dimensional_metric_id: sql.statements.active
305+
feature: database-and-application_name-labels
306+
307+
- multi_dimensional_metric_id: sql.statements.active.internal
308+
feature: database-and-application_name-labels
309+
310+
- multi_dimensional_metric_id: sql.txn.begin.count
311+
feature: database-and-application_name-labels
312+
313+
- multi_dimensional_metric_id: sql.txn.begin.count.internal
314+
feature: database-and-application_name-labels
315+
316+
- multi_dimensional_metric_id: sql.txn.commit.count
317+
feature: database-and-application_name-labels
318+
319+
- multi_dimensional_metric_id: sql.txn.commit.count.internal
320+
feature: database-and-application_name-labels
321+
322+
- multi_dimensional_metric_id: sql.txn.latency
323+
feature: database-and-application_name-labels
324+
325+
- multi_dimensional_metric_id: sql.txn.latency.internal
326+
feature: database-and-application_name-labels
327+
328+
- multi_dimensional_metric_id: sql.txn.rollback.count
329+
feature: database-and-application_name-labels
330+
331+
- multi_dimensional_metric_id: sql.txn.rollback.count.internal
332+
feature: database-and-application_name-labels
333+
334+
- multi_dimensional_metric_id: sql.txns.open
335+
feature: database-and-application_name-labels
336+
337+
- multi_dimensional_metric_id: sql.txns.open.internal
338+
feature: database-and-application_name-labels
339+
340+
- multi_dimensional_metric_id: sql.update.count
341+
feature: database-and-application_name-labels
342+
343+
- multi_dimensional_metric_id: sql.update.count.internal
344+
feature: database-and-application_name-labels

src/current/v25.2/cockroachdb-feature-availability.md

Lines changed: 10 additions & 0 deletions
@@ -47,6 +47,16 @@ Any feature made available in a phase prior to GA is provided without any warran
4747
**The following features are in preview** and are subject to change. To share feedback and/or issues, contact [Support](https://support.cockroachlabs.com/hc).
4848
{{site.data.alerts.end}}
4949

50+
51+
### `database` and `application_name` labels for certain metrics
52+
53+
The following cluster settings enable the [`database` and `application_name` labels for certain metrics]({% link {{ page.version.version }}/multi-dimensional-metrics.md %}#enable-database-and-application_name-labels), along with their internal counterparts if they exist:
54+
55+
- `sql.metrics.database_name.enabled`
56+
- `sql.metrics.application_name.enabled`
57+
58+
By default, these cluster settings are disabled.
59+
5060
### Vector indexes
5161

5262
A [vector index]({% link {{ page.version.version }}/vector-indexes.md %}) enables efficient approximate nearest neighbor (ANN) search on high-dimensional [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) columns. Use vector indexes to improve the performance of similarity searches over large datasets, such as embeddings generated by machine learning models.

src/current/v25.2/multi-dimensional-metrics.md

Lines changed: 176 additions & 5 deletions
@@ -1,6 +1,6 @@
11
---
22
title: Multi-dimensional Metrics
3-
summary: Learn about high-cardinality multi-dimensional metrics enabled by the child metrics and detailed latency cluster settings
3+
summary: Learn about high-cardinality multi-dimensional metrics enabled by the applicable cluster settings.
44
toc: true
55
docs_area: reference.metrics
66
---
@@ -10,13 +10,15 @@ Multi-dimensional metrics are additional [Prometheus]({% link {{ page.version.ve
1010
The export of multi-dimensional metrics can be enabled by the following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}):
1111

1212
- [`server.child_metrics.enabled`](#enable-child-metrics)
13+
- [`sql.metrics.database_name.enabled`](#enable-database-and-application_name-labels)
14+
- [`sql.metrics.application_name.enabled`](#enable-database-and-application_name-labels)
1315
- [`sql.stats.detailed_latency_metrics.enabled`](#enable-detailed-latency-metrics)
1416

1517
## Enable child metrics
1618

1719
Child metrics are specific, detailed metrics that are usually related to a higher-level (parent or aggregate) metric. They often provide more granular or specific information about a particular aspect of the parent metric. The parent metrics and their potential child metrics are determined by the specific feature the cluster is using.
1820

19-
The [cluster setting `server.child_metrics.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-server-child-metrics-enabled) is disabled by default. To enable it, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement:
21+
The [cluster setting `server.child_metrics.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-server-child-metrics-enabled) is disabled by default. To enable it, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement.
2022

2123
{% include_cached copy-clipboard.html %}
2224
~~~ sql
@@ -433,12 +435,181 @@ changefeed_flush_hist_nanos_sum{node_id="1",scope="office_dogs"} 9.79696709e+08
433435
changefeed_flush_hist_nanos_count{node_id="1",scope="office_dogs"} 2
434436
~~~
435437

438+
## Enable `database` and `application_name` labels
439+
440+
{{site.data.alerts.callout_info}}
441+
{% include feature-phases/preview.md %}
442+
{{site.data.alerts.end}}
443+
444+
The following cluster settings enable the `database` and `application_name` labels for certain metrics, along with their internal counterparts if they exist:
445+
446+
- [`sql.metrics.database_name.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-metrics-database-name-enabled)
447+
- [`sql.metrics.application_name.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-metrics-application-name-enabled)
448+
449+
By default, these cluster settings are disabled. To enable them, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement. Because the labeled metrics are exported as child metrics of aggregate metrics, you must also enable the [`server.child_metrics.enabled`](#enable-child-metrics) cluster setting to use them.
450+
451+
{% include_cached copy-clipboard.html %}
452+
~~~ sql
453+
SET CLUSTER SETTING server.child_metrics.enabled = true;
454+
~~~
455+
456+
{% include_cached copy-clipboard.html %}
457+
~~~ sql
458+
SET CLUSTER SETTING sql.metrics.database_name.enabled = true;
459+
~~~
460+
461+
{% include_cached copy-clipboard.html %}
462+
~~~ sql
463+
SET CLUSTER SETTING sql.metrics.application_name.enabled = true;
464+
~~~
465+
466+
Toggling the `sql.metrics.database_name.enabled` and `sql.metrics.application_name.enabled` cluster settings clears existing metric values for current label combinations and reinitializes the affected metrics to reflect the new label configuration.
467+
468+
When toggling the `sql.metrics.database_name.enabled` and `sql.metrics.application_name.enabled` cluster settings, only the values for existing metric label combinations will be cleared. Aggregated metric values for the affected metrics will not be cleared.
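
To check the current value of each setting before or after toggling it, one option is the `SHOW CLUSTER SETTING` statement. This is a minimal sketch rather than part of the original change; it assumes a SQL session with sufficient privileges to view cluster settings.

```sql
-- Each statement returns the current value (true or false) of one setting.
SHOW CLUSTER SETTING server.child_metrics.enabled;
SHOW CLUSTER SETTING sql.metrics.database_name.enabled;
SHOW CLUSTER SETTING sql.metrics.application_name.enabled;
```

Labeled child metrics should appear in the Prometheus export only when `server.child_metrics.enabled` and at least one of the two label settings return `true`.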
469+
470+
{{site.data.alerts.callout_info}}
471+
Child metrics (metrics with the `database` and `application_name` labels) are independent of the parent (aggregated) metric. The child metrics are initialized when the cluster settings are enabled.
472+
473+
For this reason, child `COUNTER` metrics may not always add up to the parent `COUNTER` metric. For an example, refer to [Examples 1 through 6](#1-all-cluster-settings-disabled).
474+
475+
For `GAUGE` metrics, values may be different and potentially unexpected depending on when a setting is enabled. For an example, refer to [7. `GAUGE` metric example](#7-gauge-metric-example).
476+
{{site.data.alerts.end}}
477+
478+
These labels affect only the metrics emitted via [Prometheus export]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). They are not visible in the [DB Console Metrics dashboards]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#metrics-dashboards).
479+
480+
The system retains up to 5,000 recently used label combinations.
481+
482+
{% assign feature = "database-and-application_name-labels" %}
483+
{% include {{ page.version.version }}/multi-dimensional-metrics-table.md %}
484+
485+
### Examples
486+
487+
This section demonstrates the impact of enabling and disabling the relevant cluster settings.
488+
489+
[Examples 1 through 6](#1-all-cluster-settings-disabled) show the effect on the `COUNTER` metric `sql.select.count`. During these examples, the [`movr` workload]({% link {{ page.version.version }}/cockroach-workload.md %}#run-the-movr-workload) was running on node 1. The aggregated metric consistently increases as the examples progress.
490+
491+
[Example 7](#7-gauge-metric-example) shows a possible effect on the `GAUGE` metric `sql.txns.open`.
492+
493+
#### 1. All cluster settings disabled
494+
495+
{% include_cached copy-clipboard.html %}
496+
~~~ sql
497+
SET CLUSTER SETTING server.child_metrics.enabled = false;
498+
SET CLUSTER SETTING sql.metrics.database_name.enabled = false;
499+
SET CLUSTER SETTING sql.metrics.application_name.enabled = false;
500+
~~~
501+
502+
The Prometheus export only gives the aggregated metric for the node.
503+
504+
~~~
505+
# HELP sql_select_count Number of SQL SELECT statements successfully executed
506+
# TYPE sql_select_count counter
507+
sql_select_count{node_id="1"} 2030
508+
~~~
509+
510+
#### 2. Only child metrics enabled
511+
512+
{% include_cached copy-clipboard.html %}
513+
~~~ sql
514+
SET CLUSTER SETTING server.child_metrics.enabled = true;
515+
SET CLUSTER SETTING sql.metrics.database_name.enabled = false;
516+
SET CLUSTER SETTING sql.metrics.application_name.enabled = false;
517+
~~~
518+
519+
The Prometheus export still only gives the aggregated metric for the node.
520+
521+
~~~
522+
sql_select_count{node_id="1"} 6568
523+
~~~
524+
525+
#### 3. Child metrics and `database_name` label enabled
526+
527+
{% include_cached copy-clipboard.html %}
528+
~~~ sql
529+
SET CLUSTER SETTING server.child_metrics.enabled = true;
530+
SET CLUSTER SETTING sql.metrics.database_name.enabled = true;
531+
SET CLUSTER SETTING sql.metrics.application_name.enabled = false;
532+
~~~
533+
534+
The aggregated metric and a child metric with only the `database` label are emitted.
535+
536+
~~~
537+
sql_select_count{node_id="1"} 10259
538+
sql_select_count{node_id="1",database="movr"} 816
539+
~~~
540+
541+
#### 4. Child metrics and `application_name` label enabled
542+
543+
{% include_cached copy-clipboard.html %}
544+
~~~ sql
545+
SET CLUSTER SETTING server.child_metrics.enabled = true;
546+
SET CLUSTER SETTING sql.metrics.database_name.enabled = false;
547+
SET CLUSTER SETTING sql.metrics.application_name.enabled = true;
548+
~~~
549+
550+
The aggregated metric and a child metric with only the `application_name` label are emitted. Note that even though the aggregated metric has increased, the child metric with the `application_name` label has a lower value than the child metric with the `database` label in the [preceding example](#3-child-metrics-and-database_name-label-enabled). This is because toggling the settings reset the labeled metrics, while the aggregated metric was not reset.
551+
552+
~~~
553+
sql_select_count{node_id="1"} 14077
554+
sql_select_count{node_id="1",application_name="movr"} 718
555+
~~~
556+
557+
#### 5. Child metrics and both `database` and `application_name` label enabled
558+
559+
{% include_cached copy-clipboard.html %}
560+
~~~ sql
561+
SET CLUSTER SETTING server.child_metrics.enabled = true;
562+
SET CLUSTER SETTING sql.metrics.database_name.enabled = true;
563+
SET CLUSTER SETTING sql.metrics.application_name.enabled = true;
564+
~~~
565+
566+
The aggregated metric and a child metric with both `database` and `application_name` labels are emitted.
567+
568+
~~~
569+
sql_select_count{node_id="1"} 21085
570+
sql_select_count{node_id="1",database="movr",application_name="movr"} 3962
571+
~~~
572+
573+
#### 6. Aggregate metric disabled
574+
575+
The [cluster setting `server.child_metrics.include_aggregate.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-server-child-metrics-include-aggregate-enabled) (default: `true`) reports an aggregate time series for applicable multi-dimensional metrics. When set to `false`, it stops reporting the aggregate time series, preventing double counting when querying those metrics.
576+
577+
{% include_cached copy-clipboard.html %}
578+
~~~ sql
579+
SET CLUSTER SETTING server.child_metrics.enabled = true;
580+
SET CLUSTER SETTING sql.metrics.database_name.enabled = true;
581+
SET CLUSTER SETTING sql.metrics.application_name.enabled = true;
582+
SET CLUSTER SETTING server.child_metrics.include_aggregate.enabled = false;
583+
~~~
584+
585+
No aggregated metric is emitted.
586+
Only the child metric with both the `database` and `application_name` labels is emitted.
587+
588+
~~~
589+
sql_select_count{node_id="1",database="movr",application_name="movr"} 8703
590+
~~~
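
To resume reporting the aggregate time series after this example, the setting can be returned to its default. A minimal sketch, assuming nothing else in your monitoring setup depends on the aggregate being hidden:

```sql
-- Restore the default value (true) so the aggregate time series is reported again.
RESET CLUSTER SETTING server.child_metrics.include_aggregate.enabled;
```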
591+
592+
#### 7. `GAUGE` metric example
593+
594+
Changes to cluster settings may take time to reinitialize affected metrics. As a result, some `GAUGE` metrics might briefly show unexpected values.
595+
596+
`GAUGE` values for both aggregated and child metrics increase and decrease as transactions are opened and closed. This example uses the `GAUGE` metric `sql.txns.open`.
597+
598+
Consider the following scenario:
599+
600+
Time | Action | `sql.txns.open` aggregated metric | `sql.txns.open` child metric
601+
:---:|--------|:--------------------------------:|:---------------------------:
602+
1 | Open a transaction. The value of the `sql.txns.open` aggregated metric is incremented. | 1 | -
603+
2 | Enable `sql.metrics.database_name.enabled` and `sql.metrics.application_name.enabled`.<br>Re-initialize child metrics. | 1 | 0
604+
3 | Close the transaction.<br>The values of both the aggregated and child `sql.txns.open` metrics are decremented. | 0 | -1
605+
606+
To avoid negative values in child metrics, wrap the metric in the [Prometheus `clamp_min` function](https://prometheus.io/docs/prometheus/latest/querying/functions/#clamp_min) with a minimum of `0` when querying, for example `clamp_min(sql_txns_open, 0)`.
607+
436608
## Enable detailed latency metrics
437609

438-
The [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.detailed_latency_metrics.enabled`
439-
labels the latency metric `sql.exec.latency.detail` with the [statement fingerprint]({% link {{ page.version.version }}/ui-statements-page.md %}#sql-statement-fingerprints). To estimate the cardinality of the set of all statement fingerprints, use the `sql.query.unique.count` metric. For most workloads, this metric ranges from dozens to hundreds.
610+
The [cluster setting `sql.stats.detailed_latency_metrics.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-detailed-latency-metrics-enabled) labels the latency metric `sql.exec.latency.detail` with the [statement fingerprint]({% link {{ page.version.version }}/ui-statements-page.md %}#sql-statement-fingerprints). To estimate the cardinality of the set of all statement fingerprints, use the `sql.query.unique.count` metric. For most workloads, this metric ranges from dozens to hundreds.
440611

441-
`sql.stats.detailed_latency_metrics.enabled` is disabled by default. To enable it, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement:
612+
`sql.stats.detailed_latency_metrics.enabled` is disabled by default. To enable it, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement.
442613

443614
{% include_cached copy-clipboard.html %}
444615
~~~ sql
