OCPBUGS-54806: Add telemetry for user-defined networks #2596

danwinship · 2025-04-29T15:25:37Z

Exports the metrics for UserDefinedNetwork/ClusterUserDefinedNetwork counts via telemetry.
(Depends on openshift/ovn-kubernetes#2523 to generate the raw metrics and openshift/cluster-network-operator#2697 to re-report them for telemetry.)

I was originally going to do this with kube-state-metrics CRD monitoring, but we're going to need to backport these metrics along with other updates to the feature, so it seems better to define them in ovn-k.

openshift-ci-robot · 2025-04-29T15:25:44Z

@danwinship: This pull request references Jira Issue OCPBUGS-54806, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.19.0) matches configured target version for branch (4.19.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @anuragthehatter

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Exports the metrics for UserDefinedNetwork/ClusterUserDefinedNetwork counts via telemetry.
(Depends on openshift/ovn-kubernetes#2523 to generate the raw metrics and openshift/cluster-network-operator#2697 to re-report them for telemetry.)

I was originally going to do this with kube-state-metrics CRD monitoring, but we're going to need to backport these metrics along with other updates to the feature, so it seems better to define them in ovn-k.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-04-29T15:26:14Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: danwinship
Once this PR has been reviewed and has the lgtm label, please assign danielmellado for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

danwinship · 2025-04-30T13:31:59Z

How do I test this? I can see in the console that my recording rules are working right, but I can't figure out what to test to see if this change is causing the right data to be exported in the right place...

jan--f · 2025-05-05T11:57:00Z

> How do I test this? I can see in the console that my recording rules are working right, but I can't figure out what to test to see if this change is causing the right data to be exported in the right place...

Once this PR merges and the telemeter server config is updated you should see metrics coming in for CI clusters. Is that the feedback you're looking for?

danwinship · 2025-05-05T13:29:28Z

No, I mean, is there some way I can test it before merging? Like, I can test the metrics PR by bringing up a test cluster and using curl on the metrics endpoint or looking in the console. Is there something I can poke at in the test cluster that will confirm that I didn't make a typo here and I really am exporting the right data to telemetry?
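
One check that can run locally, short of confirming the end-to-end export, is to parse the generated selector and assert that the new series would pass it. The sketch below is a minimal illustration, not part of this PR: `parse_selector` and `is_selected` are hypothetical helpers, `SELECTOR` is a shortened stand-in for the contents of Documentation/telemetry/telemeter_query, and pairing the `role` and `topology` labels with the UDN series is an assumption. It relies on two Prometheus behaviours: regex matchers are fully anchored, and a matcher that accepts the empty string also accepts series that lack the label. Python's `re` stands in for RE2 here, which should be adequate for plain alternations like these.

```python
import re

# Shortened stand-in for the real selector (assumption: same overall shape).
SELECTOR = (
    '{__name__=~"cluster:usage:.*'
    '|cluster:ovnkube_clustermanager_user_defined_networks:max'
    '|cluster:ovnkube_clustermanager_cluster_user_defined_networks:max",'
    'role=~"Primary|Secondary|",topology=~"Layer2|Layer3|"}'
)


def parse_selector(selector: str) -> dict[str, re.Pattern]:
    """Split a {label=~"regex",...} selector into compiled patterns keyed by label."""
    body = selector.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("selector must be wrapped in braces")
    return {
        label: re.compile(pattern)
        for label, pattern in re.findall(r'(\w+)=~"([^"]*)"', body[1:-1])
    }


def is_selected(matchers: dict[str, re.Pattern], name: str, labels: dict[str, str]) -> bool:
    """Return True if a series with this name and labels satisfies every matcher."""
    series = dict(labels, __name__=name)
    # fullmatch mirrors Prometheus anchoring; a missing label is treated as "".
    return all(p.fullmatch(series.get(label, "")) for label, p in matchers.items())


matchers = parse_selector(SELECTOR)
good = "cluster:ovnkube_clustermanager_user_defined_networks:max"
typo = "cluster:ovnkube_clustermanager_user_defined_network:max"
print(is_selected(matchers, good, {"role": "Primary", "topology": "Layer3"}))  # True
print(is_selected(matchers, typo, {"role": "Primary", "topology": "Layer3"}))  # False
```

This only checks the allowlist text against names typed by hand; it says nothing about whether the recording rules actually emit those names or whether the data reaches the telemetry backend.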

juzhao · 2025-05-15T07:57:42Z

Documentation/telemetry/telemeter_query

@@ -1 +1 @@
-{__name__=~"cluster:usage:.*|count:up0|count:up1|cluster_version|cluster_version_available_updates|cluster_version_capability|cluster_operator_up|cluster_operator_conditions|cluster_version_payload|cluster_installer|cluster_infrastructure_provider|cluster_feature_set|instance:etcd_object_counts:sum|ALERTS|code:apiserver_request_total:rate:sum|cluster:capacity_cpu_cores:sum|cluster:capacity_memory_bytes:sum|cluster:cpu_usage_cores:sum|cluster:memory_usage_bytes:sum|openshift:cpu_usage_cores:sum|openshift:memory_usage_bytes:sum|workload:cpu_usage_cores:sum|workload:memory_usage_bytes:sum|cluster:virt_platform_nodes:sum|cluster:node_instance_type_count:sum|cnv:vmi_status_running:count|cnv_abnormal|cluster:vmi_request_cpu_cores:sum|node_role_os_version_machine:cpu_capacity_cores:sum|node_role_os_version_machine:cpu_capacity_sockets:sum|subscription_sync_total|olm_resolution_duration_seconds|csv_succeeded|csv_abnormal|cluster:kube_persistentvolumeclaim_resource_requests_storage_bytes:provisioner:sum|cluster:kubelet_volume_stats_used_bytes:provisioner:sum|ceph_cluster_total_bytes|ceph_cluster_total_used_raw_bytes|ceph_health_status|odf_system_raw_capacity_total_bytes|odf_system_raw_capacity_used_bytes|odf_system_health_status|job:ceph_osd_metadata:count|job:kube_pv:count|job:odf_system_pvs:count|job:ceph_pools_iops:total|job:ceph_pools_iops_bytes:total|job:ceph_versions_running:count|job:noobaa_total_unhealthy_buckets:sum|job:noobaa_bucket_count:sum|job:noobaa_total_object_count:sum|odf_system_bucket_count|odf_system_objects_total|noobaa_accounts_num|noobaa_total_usage|console_url|cluster:console_auth_login_requests_total:sum|cluster:console_auth_login_successes_total:sum|cluster:console_auth_login_failures_total:sum|cluster:console_auth_logout_requests_total:sum|cluster:console_usage_users:max|cluster:console_plugins_info:max|cluster:console_customization_perspectives_info:max|cluster:ovnkube_controller_egress_routing_via_host:max|cluster:ovnkube_controller_admin_networ
k_policies_db_objects:max|cluster:ovnkube_controller_baseline_admin_network_policies_db_objects:max|cluster:ovnkube_controller_admin_network_policies_rules:max|cluster:ovnkube_controller_baseline_admin_network_policies_rules:max|cluster:network_attachment_definition_instances:max|cluster:network_attachment_definition_enabled_instance_up:max|cluster:ingress_controller_aws_nlb_active:sum|cluster:route_metrics_controller_routes_per_shard:min|cluster:route_metrics_controller_routes_per_shard:max|cluster:route_metrics_controller_routes_per_shard:avg|cluster:route_metrics_controller_routes_per_shard:median|cluster:openshift_route_info:tls_termination:sum|insightsclient_request_send_total|cam_app_workload_migrations|cluster:apiserver_current_inflight_requests:sum:max_over_time:2m|cluster:alertmanager_integrations:max|cluster:telemetry_selected_series:count|openshift:prometheus_tsdb_head_series:sum|openshift:prometheus_tsdb_head_samples_appended_total:sum|monitoring:container_memory_working_set_bytes:sum|namespace_job:scrape_series_added:topk3_sum1h|namespace_job:scrape_samples_post_metric_relabeling:topk3|monitoring:haproxy_server_http_responses_total:sum|profile:cluster_monitoring_operator_collection_profile:max|vendor_model:node_accelerator_cards:sum|rhmi_status|status:upgrading:version:rhoam_state:max|state:rhoam_critical_alerts:max|state:rhoam_warning_alerts:max|rhoam_7d_slo_percentile:max|rhoam_7d_slo_remaining_error_budget:max|cluster_legacy_scheduler_policy|cluster_master_schedulable|che_workspace_status|che_workspace_started_total|che_workspace_failure_total|che_workspace_start_time_seconds_sum|che_workspace_start_time_seconds_count|cco_credentials_mode|cluster:kube_persistentvolume_plugin_type_counts:sum|acm_managed_cluster_info|acm_managed_cluster_worker_cores:max|acm_console_page_count:sum|cluster:vsphere_vcenter_info:sum|cluster:vsphere_esxi_version_total:sum|cluster:vsphere_node_hw_version_total:sum|openshift:build_by_strategy:sum|rhods_aggregate_availability|
rhods_total_users|instance:etcd_disk_wal_fsync_duration_seconds:histogram_quantile|instance:etcd_mvcc_db_total_size_in_bytes:sum|instance:etcd_network_peer_round_trip_time_seconds:histogram_quantile|instance:etcd_mvcc_db_total_size_in_use_in_bytes:sum|instance:etcd_disk_backend_commit_duration_seconds:histogram_quantile|jaeger_operator_instances_storage_types|jaeger_operator_instances_strategies|jaeger_operator_instances_agent_strategies|type:tempo_operator_tempostack_storage_backend:sum|state:tempo_operator_tempostack_managed:sum|type:tempo_operator_tempostack_multi_tenancy:sum|enabled:tempo_operator_tempostack_jaeger_ui:sum|type:opentelemetry_collector_receivers:sum|type:opentelemetry_collector_exporters:sum|type:opentelemetry_collector_processors:sum|type:opentelemetry_collector_extensions:sum|type:opentelemetry_collector_connectors:sum|type:opentelemetry_collector_info:sum|appsvcs:cores_by_product:sum|nto_custom_profiles:count|openshift_csi_share_configmap|openshift_csi_share_secret|openshift_csi_share_mount_failures_total|openshift_csi_share_mount_requests_total|eo_es_storage_info|eo_es_redundancy_policy_info|eo_es_defined_delete_namespaces_total|eo_es_misconfigured_memory_resources_info|cluster:eo_es_data_nodes_total:max|cluster:eo_es_documents_created_total:sum|cluster:eo_es_documents_deleted_total:sum|pod:eo_es_shards_total:max|eo_es_cluster_management_state_info|imageregistry:imagestreamtags_count:sum|imageregistry:operations_count:sum|log_logging_info|log_collector_error_count_total|log_forwarder_pipeline_info|log_forwarder_input_info|log_forwarder_output_info|cluster:log_collected_bytes_total:sum|cluster:log_logged_bytes_total:sum|openshift_logging:log_forwarder_pipelines:sum|openshift_logging:log_forwarders:sum|openshift_logging:log_forwarder_input_type:sum|openshift_logging:log_forwarder_output_type:sum|openshift_logging:vector_component_received_bytes_total:rate5m|cluster:kata_monitor_running_shim_count:sum|platform:hypershift_hostedclusters:max|platfo
rm:hypershift_nodepools:max|cluster_name:hypershift_nodepools_size:sum|cluster_name:hypershift_nodepools_available_replicas:sum|namespace:noobaa_unhealthy_bucket_claims:max|namespace:noobaa_buckets_claims:max|namespace:noobaa_unhealthy_namespace_resources:max|namespace:noobaa_namespace_resources:max|namespace:noobaa_unhealthy_namespace_buckets:max|namespace:noobaa_namespace_buckets:max|namespace:noobaa_accounts:max|namespace:noobaa_usage:max|namespace:noobaa_system_health_status:max|ocs_advanced_feature_usage|os_image_url_override:sum|cluster:vsphere_topology_tags:max|cluster:vsphere_infrastructure_failure_domains:max|apiserver_list_watch_request_success_total:rate:sum|rhacs:telemetry:rox_central_info|rhacs:telemetry:rox_central_secured_clusters|rhacs:telemetry:rox_central_secured_nodes|rhacs:telemetry:rox_central_secured_vcpus|rhacs:telemetry:rox_sensor_info|cluster:volume_manager_selinux_pod_context_mismatch_total|cluster:volume_manager_selinux_volume_context_mismatch_warnings_total|cluster:volume_manager_selinux_volume_context_mismatch_errors_total|cluster:volume_manager_selinux_volumes_admitted_total|ols:provider_model_configuration|ols:rest_api_query_calls_total:2xx|ols:rest_api_query_calls_total:4xx|ols:rest_api_query_calls_total:5xx|openshift:openshift_network_operator_ipsec_state:info|cluster:health:group_severity:count|cluster:controlplane_topology:info|cluster:infrastructure_topology:info",action=~"Pass|Allow|Deny|Allow|Deny|",alertstate=~"firing|",direction=~"Ingress|Egress|Ingress|Egress|",enabled=~"true|false|",mode=~"HighlyAvailable|HighlyAvailableArbiter|SingleReplica|DualReplica|External|HighlyAvailable|SingleReplica|",page=~"overview-classic|overview-fleet|search|search-details|clusters|application|governance|",quantile=~"0.99|0.99|0.99|",reason=~"memory_working_set_delta_from_request|memory_rss_delta_from_request|",severity=~"critical|warning|info|none|critical|warning|info|none|",state=~"Managed|Unmanaged|",system_type=~"OCS|OCS|",system_vendor=~"
Red Hat|Red Hat|",table_name=~"ACL|Address_Set|ACL|Address_Set|",type=~"azure|gcs|s3|static|openshift|disabled|jaeger|hostmetrics|opencensus|prometheus|zipkin|kafka|filelog|journald|k8sevents|kubeletstats|k8scluster|k8sobjects|otlp|debug|logging|otlp|otlphttp|prometheus|lokiexporter|kafka|awscloudwatchlogs|loadbalancing|batch|memorylimiter|attributes|resource|span|k8sattributes|resourcedetection|filter|routing|cumulativetodelta|groupbyattrs|zpages|ballast|memorylimiter|jaegerremotesampling|healthcheck|pprof|oauth2clientauth|oidcauth|bearertokenauth|filestorage|spanmetrics|forward|deployment|daemonset|sidecar|statefulset|",vendor=~"NVIDIA|AMD|GAUDI|INTEL|QUALCOMM|",verb=~"LIST|WATCH|"}
+{__name__=~"cluster:usage:.*|count:up0|count:up1|cluster_version|cluster_version_available_updates|cluster_version_capability|cluster_operator_up|cluster_operator_conditions|cluster_version_payload|cluster_installer|cluster_infrastructure_provider|cluster_feature_set|instance:etcd_object_counts:sum|ALERTS|code:apiserver_request_total:rate:sum|cluster:capacity_cpu_cores:sum|cluster:capacity_memory_bytes:sum|cluster:cpu_usage_cores:sum|cluster:memory_usage_bytes:sum|openshift:cpu_usage_cores:sum|openshift:memory_usage_bytes:sum|workload:cpu_usage_cores:sum|workload:memory_usage_bytes:sum|cluster:virt_platform_nodes:sum|cluster:node_instance_type_count:sum|cnv:vmi_status_running:count|cnv_abnormal|cluster:vmi_request_cpu_cores:sum|node_role_os_version_machine:cpu_capacity_cores:sum|node_role_os_version_machine:cpu_capacity_sockets:sum|subscription_sync_total|olm_resolution_duration_seconds|csv_succeeded|csv_abnormal|cluster:kube_persistentvolumeclaim_resource_requests_storage_bytes:provisioner:sum|cluster:kubelet_volume_stats_used_bytes:provisioner:sum|ceph_cluster_total_bytes|ceph_cluster_total_used_raw_bytes|ceph_health_status|odf_system_raw_capacity_total_bytes|odf_system_raw_capacity_used_bytes|odf_system_health_status|job:ceph_osd_metadata:count|job:kube_pv:count|job:odf_system_pvs:count|job:ceph_pools_iops:total|job:ceph_pools_iops_bytes:total|job:ceph_versions_running:count|job:noobaa_total_unhealthy_buckets:sum|job:noobaa_bucket_count:sum|job:noobaa_total_object_count:sum|odf_system_bucket_count|odf_system_objects_total|noobaa_accounts_num|noobaa_total_usage|console_url|cluster:console_auth_login_requests_total:sum|cluster:console_auth_login_successes_total:sum|cluster:console_auth_login_failures_total:sum|cluster:console_auth_logout_requests_total:sum|cluster:console_usage_users:max|cluster:console_plugins_info:max|cluster:console_customization_perspectives_info:max|cluster:ovnkube_controller_egress_routing_via_host:max|cluster:ovnkube_controller_admin_networ
k_policies_db_objects:max|cluster:ovnkube_controller_baseline_admin_network_policies_db_objects:max|cluster:ovnkube_controller_admin_network_policies_rules:max|cluster:ovnkube_controller_baseline_admin_network_policies_rules:max|cluster:network_attachment_definition_instances:max|cluster:network_attachment_definition_enabled_instance_up:max|cluster:ovnkube_clustermanager_user_defined_networks:max|cluster:ovnkube_clustermanager_cluster_user_defined_networks:max|cluster:ingress_controller_aws_nlb_active:sum|cluster:route_metrics_controller_routes_per_shard:min|cluster:route_metrics_controller_routes_per_shard:max|cluster:route_metrics_controller_routes_per_shard:avg|cluster:route_metrics_controller_routes_per_shard:median|cluster:openshift_route_info:tls_termination:sum|insightsclient_request_send_total|cam_app_workload_migrations|cluster:apiserver_current_inflight_requests:sum:max_over_time:2m|cluster:alertmanager_integrations:max|cluster:telemetry_selected_series:count|openshift:prometheus_tsdb_head_series:sum|openshift:prometheus_tsdb_head_samples_appended_total:sum|monitoring:container_memory_working_set_bytes:sum|namespace_job:scrape_series_added:topk3_sum1h|namespace_job:scrape_samples_post_metric_relabeling:topk3|monitoring:haproxy_server_http_responses_total:sum|profile:cluster_monitoring_operator_collection_profile:max|vendor_model:node_accelerator_cards:sum|rhmi_status|status:upgrading:version:rhoam_state:max|state:rhoam_critical_alerts:max|state:rhoam_warning_alerts:max|rhoam_7d_slo_percentile:max|rhoam_7d_slo_remaining_error_budget:max|cluster_legacy_scheduler_policy|cluster_master_schedulable|che_workspace_status|che_workspace_started_total|che_workspace_failure_total|che_workspace_start_time_seconds_sum|che_workspace_start_time_seconds_count|cco_credentials_mode|cluster:kube_persistentvolume_plugin_type_counts:sum|acm_managed_cluster_info|acm_managed_cluster_worker_cores:max|acm_console_page_count:sum|cluster:vsphere_vcenter_info:sum|cluster:vsphere_esxi
_version_total:sum|cluster:vsphere_node_hw_version_total:sum|openshift:build_by_strategy:sum|rhods_aggregate_availability|rhods_total_users|instance:etcd_disk_wal_fsync_duration_seconds:histogram_quantile|instance:etcd_mvcc_db_total_size_in_bytes:sum|instance:etcd_network_peer_round_trip_time_seconds:histogram_quantile|instance:etcd_mvcc_db_total_size_in_use_in_bytes:sum|instance:etcd_disk_backend_commit_duration_seconds:histogram_quantile|jaeger_operator_instances_storage_types|jaeger_operator_instances_strategies|jaeger_operator_instances_agent_strategies|type:tempo_operator_tempostack_storage_backend:sum|state:tempo_operator_tempostack_managed:sum|type:tempo_operator_tempostack_multi_tenancy:sum|enabled:tempo_operator_tempostack_jaeger_ui:sum|type:opentelemetry_collector_receivers:sum|type:opentelemetry_collector_exporters:sum|type:opentelemetry_collector_processors:sum|type:opentelemetry_collector_extensions:sum|type:opentelemetry_collector_connectors:sum|type:opentelemetry_collector_info:sum|appsvcs:cores_by_product:sum|nto_custom_profiles:count|openshift_csi_share_configmap|openshift_csi_share_secret|openshift_csi_share_mount_failures_total|openshift_csi_share_mount_requests_total|eo_es_storage_info|eo_es_redundancy_policy_info|eo_es_defined_delete_namespaces_total|eo_es_misconfigured_memory_resources_info|cluster:eo_es_data_nodes_total:max|cluster:eo_es_documents_created_total:sum|cluster:eo_es_documents_deleted_total:sum|pod:eo_es_shards_total:max|eo_es_cluster_management_state_info|imageregistry:imagestreamtags_count:sum|imageregistry:operations_count:sum|log_logging_info|log_collector_error_count_total|log_forwarder_pipeline_info|log_forwarder_input_info|log_forwarder_output_info|cluster:log_collected_bytes_total:sum|cluster:log_logged_bytes_total:sum|openshift_logging:log_forwarder_pipelines:sum|openshift_logging:log_forwarders:sum|openshift_logging:log_forwarder_input_type:sum|openshift_logging:log_forwarder_output_type:sum|openshift_logging:vector_compo
nent_received_bytes_total:rate5m|cluster:kata_monitor_running_shim_count:sum|platform:hypershift_hostedclusters:max|platform:hypershift_nodepools:max|cluster_name:hypershift_nodepools_size:sum|cluster_name:hypershift_nodepools_available_replicas:sum|namespace:noobaa_unhealthy_bucket_claims:max|namespace:noobaa_buckets_claims:max|namespace:noobaa_unhealthy_namespace_resources:max|namespace:noobaa_namespace_resources:max|namespace:noobaa_unhealthy_namespace_buckets:max|namespace:noobaa_namespace_buckets:max|namespace:noobaa_accounts:max|namespace:noobaa_usage:max|namespace:noobaa_system_health_status:max|ocs_advanced_feature_usage|os_image_url_override:sum|cluster:vsphere_topology_tags:max|cluster:vsphere_infrastructure_failure_domains:max|apiserver_list_watch_request_success_total:rate:sum|rhacs:telemetry:rox_central_info|rhacs:telemetry:rox_central_secured_clusters|rhacs:telemetry:rox_central_secured_nodes|rhacs:telemetry:rox_central_secured_vcpus|rhacs:telemetry:rox_sensor_info|cluster:volume_manager_selinux_pod_context_mismatch_total|cluster:volume_manager_selinux_volume_context_mismatch_warnings_total|cluster:volume_manager_selinux_volume_context_mismatch_errors_total|cluster:volume_manager_selinux_volumes_admitted_total|ols:provider_model_configuration|ols:rest_api_query_calls_total:2xx|ols:rest_api_query_calls_total:4xx|ols:rest_api_query_calls_total:5xx|openshift:openshift_network_operator_ipsec_state:info|cluster:health:group_severity:count|cluster:controlplane_topology:info|cluster:infrastructure_topology:info",action=~"Pass|Allow|Deny|Allow|Deny|",alertstate=~"firing|",direction=~"Ingress|Egress|Ingress|Egress|",enabled=~"true|false|",mode=~"HighlyAvailable|HighlyAvailableArbiter|SingleReplica|DualReplica|External|HighlyAvailable|SingleReplica|",page=~"overview-classic|overview-fleet|search|search-details|clusters|application|governance|",quantile=~"0.99|0.99|0.99|",reason=~"memory_working_set_delta_from_request|memory_rss_delta_from_request|",role=~"Primar
y|Secondary|Primary|Secondary|",severity=~"critical|warning|info|none|critical|warning|info|none|",state=~"Managed|Unmanaged|",system_type=~"OCS|OCS|",system_vendor=~"Red Hat|Red Hat|",table_name=~"ACL|Address_Set|ACL|Address_Set|",topology=~"Layer2|Layer3|Layer2|Layer3|",type=~"azure|gcs|s3|static|openshift|disabled|jaeger|hostmetrics|opencensus|prometheus|zipkin|kafka|filelog|journald|k8sevents|kubeletstats|k8scluster|k8sobjects|otlp|debug|logging|otlp|otlphttp|prometheus|lokiexporter|kafka|awscloudwatchlogs|loadbalancing|batch|memorylimiter|attributes|resource|span|k8sattributes|resourcedetection|filter|routing|cumulativetodelta|groupbyattrs|zpages|ballast|memorylimiter|jaegerremotesampling|healthcheck|pprof|oauth2clientauth|oidcauth|bearertokenauth|filestorage|spanmetrics|forward|deployment|daemonset|sidecar|statefulset|",vendor=~"NVIDIA|AMD|GAUDI|INTEL|QUALCOMM|",verb=~"LIST|WATCH|"}


that's autogenerated
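
Because the file is a single very long line, the hunk above is hard to read by eye. Compared side by side, the new line adds `cluster:ovnkube_clustermanager_user_defined_networks:max` and `cluster:ovnkube_clustermanager_cluster_user_defined_networks:max` to `__name__`, plus new `role` and `topology` matchers. A minimal sketch of extracting that summary mechanically follows; the helper names are hypothetical, `OLD` and `NEW` are shortened stand-ins for the two sides of the diff, and splitting on `|` is only safe because none of the alternatives contain nested groups.

```python
import re

# Shortened stand-ins for the "-" and "+" lines of the hunk (assumption).
OLD = (
    '{__name__=~"cluster:network_attachment_definition_enabled_instance_up:max'
    '|cluster:ingress_controller_aws_nlb_active:sum",'
    'reason=~"memory_rss_delta_from_request|"}'
)
NEW = (
    '{__name__=~"cluster:network_attachment_definition_enabled_instance_up:max'
    '|cluster:ovnkube_clustermanager_user_defined_networks:max'
    '|cluster:ovnkube_clustermanager_cluster_user_defined_networks:max'
    '|cluster:ingress_controller_aws_nlb_active:sum",'
    'reason=~"memory_rss_delta_from_request|",'
    'role=~"Primary|Secondary|Primary|Secondary|",'
    'topology=~"Layer2|Layer3|Layer2|Layer3|"}'
)


def matcher_alternatives(selector: str) -> dict[str, set[str]]:
    """Map each label in a {label=~"a|b|...",...} selector to its set of alternatives."""
    return {
        label: set(pattern.split("|"))
        for label, pattern in re.findall(r'(\w+)=~"([^"]*)"', selector)
    }


def added_alternatives(old: str, new: str) -> dict[str, set[str]]:
    """Return, per label, the alternatives present in `new` but absent from `old`."""
    before, after = matcher_alternatives(old), matcher_alternatives(new)
    added = {}
    for label, values in after.items():
        extra = values - before.get(label, set())
        if extra:
            added[label] = extra
    return added


for label, values in sorted(added_alternatives(OLD, NEW).items()):
    # The empty string stands for "label may be absent"; show it explicitly.
    print(label, sorted(v or "<empty>" for v in values))
```

Run against the real before and after contents of the file, the same function would list every change without scanning the line manually.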

openshift-bot · 2025-08-15T01:00:26Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

danwinship · 2025-08-18T15:49:06Z

/hold
for openshift/cluster-network-operator#2697

(but ready to merge as soon as that does)

openshift-ci · 2025-08-18T18:46:34Z

@danwinship: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | `d03573e` | link | false | `/test okd-scos-e2e-aws-ovn` |
| ci/prow/generate | `d03573e` | link | true | `/test generate` |
| ci/prow/versions | `d03573e` | link | false | `/test versions` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot added the do-not-merge/work-in-progress label Apr 29, 2025

openshift-ci-robot added the jira/valid-reference and jira/valid-bug labels Apr 29, 2025

openshift-ci bot requested review from anuragthehatter, jan--f and marioferh April 29, 2025 15:25

danwinship force-pushed the udn-telemetry branch from bf3aeaf to d1c642e Compare April 29, 2025 17:33

juzhao reviewed May 15, 2025

openshift-ci bot added the lifecycle/stale label Aug 15, 2025

openshift-merge-robot added the needs-rebase label Aug 15, 2025

danwinship force-pushed the udn-telemetry branch from d1c642e to 5873206 Compare August 18, 2025 13:54

openshift-merge-robot removed the needs-rebase label Aug 18, 2025

danwinship changed the title ~~WIP: OCPBUGS-54806: Add telemetry for user-defined networks~~ OCPBUGS-54806: Add telemetry for user-defined networks Aug 18, 2025

openshift-ci bot removed the do-not-merge/work-in-progress label Aug 18, 2025

openshift-ci bot added the do-not-merge/hold label Aug 18, 2025

Add telemetry for user-defined networks

d03573e

danwinship force-pushed the udn-telemetry branch from 5873206 to d03573e Compare August 18, 2025 15:52

ess_routing_via_host:max\|cluster:ovnkube_controller_admin_network_policies_db_objects:max\|cluster:ovnkube_controller_baseline_admin_network_policies_db_objects:max\|cluster:ovnkube_controller_admin_network_policies_rules:max\|cluster:ovnkube_controller_baseline_admin_network_policies_rules:max\|cluster:network_attachment_definition_instances:max\|cluster:network_attachment_definition_enabled_instance_up:max\|cluster:ovnkube_clustermanager_user_defined_networks:max\|cluster:ovnkube_clustermanager_cluster_user_defined_networks:max\|cluster:ingress_controller_aws_nlb_active:sum\|cluster:route_metrics_controller_routes_per_shard:min\|cluster:route_metrics_controller_routes_per_shard:max\|cluster:route_metrics_controller_routes_per_shard:avg\|cluster:route_metrics_controller_routes_per_shard:median\|cluster:openshift_route_info:tls_termination:sum\|insightsclient_request_send_total\|cam_app_workload_migrations\|cluster:apiserver_current_inflight_requests:sum:max_over_time:2m\|cluster:alertmanager_integrations:max\|cluster:telemetry_selected_series:count\|openshift:prometheus_tsdb_head_series:sum\|openshift:prometheus_tsdb_head_samples_appended_total:sum\|monitoring:container_memory_working_set_bytes:sum\|namespace_job:scrape_series_added:topk3_sum1h\|namespace_job:scrape_samples_post_metric_relabeling:topk3\|monitoring:haproxy_server_http_responses_total:sum\|profile:cluster_monitoring_operator_collection_profile:max\|vendor_model:node_accelerator_cards:sum\|rhmi_status\|status:upgrading:version:rhoam_state:max\|state:rhoam_critical_alerts:max\|state:rhoam_warning_alerts:max\|rhoam_7d_slo_percentile:max\|rhoam_7d_slo_remaining_error_budget:max\|cluster_legacy_scheduler_policy\|cluster_master_schedulable\|che_workspace_status\|che_workspace_started_total\|che_workspace_failure_total\|che_workspace_start_time_seconds_sum\|che_workspace_start_time_seconds_count\|cco_credentials_mode\|cluster:kube_persistentvolume_plugin_type_counts:sum\|acm_managed_cluster_info\|acm_manag
ed_cluster_worker_cores:max\|acm_console_page_count:sum\|cluster:vsphere_vcenter_info:sum\|cluster:vsphere_esxi_version_total:sum\|cluster:vsphere_node_hw_version_total:sum\|openshift:build_by_strategy:sum\|rhods_aggregate_availability\|rhods_total_users\|instance:etcd_disk_wal_fsync_duration_seconds:histogram_quantile\|instance:etcd_mvcc_db_total_size_in_bytes:sum\|instance:etcd_network_peer_round_trip_time_seconds:histogram_quantile\|instance:etcd_mvcc_db_total_size_in_use_in_bytes:sum\|instance:etcd_disk_backend_commit_duration_seconds:histogram_quantile\|jaeger_operator_instances_storage_types\|jaeger_operator_instances_strategies\|jaeger_operator_instances_agent_strategies\|type:tempo_operator_tempostack_storage_backend:sum\|state:tempo_operator_tempostack_managed:sum\|type:tempo_operator_tempostack_multi_tenancy:sum\|enabled:tempo_operator_tempostack_jaeger_ui:sum\|type:opentelemetry_collector_receivers:sum\|type:opentelemetry_collector_exporters:sum\|type:opentelemetry_collector_processors:sum\|type:opentelemetry_collector_extensions:sum\|type:opentelemetry_collector_connectors:sum\|type:opentelemetry_collector_info:sum\|appsvcs:cores_by_product:sum\|nto_custom_profiles:count\|openshift_csi_share_configmap\|openshift_csi_share_secret\|openshift_csi_share_mount_failures_total\|openshift_csi_share_mount_requests_total\|eo_es_storage_info\|eo_es_redundancy_policy_info\|eo_es_defined_delete_namespaces_total\|eo_es_misconfigured_memory_resources_info\|cluster:eo_es_data_nodes_total:max\|cluster:eo_es_documents_created_total:sum\|cluster:eo_es_documents_deleted_total:sum\|pod:eo_es_shards_total:max\|eo_es_cluster_management_state_info\|imageregistry:imagestreamtags_count:sum\|imageregistry:operations_count:sum\|log_logging_info\|log_collector_error_count_total\|log_forwarder_pipeline_info\|log_forwarder_input_info\|log_forwarder_output_info\|cluster:log_collected_bytes_total:sum\|cluster:log_logged_bytes_total:sum\|openshift_logging:log_forwarder_pipelines:sum\|ope
nshift_logging:log_forwarders:sum\|openshift_logging:log_forwarder_input_type:sum\|openshift_logging:log_forwarder_output_type:sum\|openshift_logging:vector_component_received_bytes_total:rate5m\|cluster:kata_monitor_running_shim_count:sum\|platform:hypershift_hostedclusters:max\|platform:hypershift_nodepools:max\|cluster_name:hypershift_nodepools_size:sum\|cluster_name:hypershift_nodepools_available_replicas:sum\|namespace:noobaa_unhealthy_bucket_claims:max\|namespace:noobaa_buckets_claims:max\|namespace:noobaa_unhealthy_namespace_resources:max\|namespace:noobaa_namespace_resources:max\|namespace:noobaa_unhealthy_namespace_buckets:max\|namespace:noobaa_namespace_buckets:max\|namespace:noobaa_accounts:max\|namespace:noobaa_usage:max\|namespace:noobaa_system_health_status:max\|ocs_advanced_feature_usage\|os_image_url_override:sum\|cluster:vsphere_topology_tags:max\|cluster:vsphere_infrastructure_failure_domains:max\|apiserver_list_watch_request_success_total:rate:sum\|rhacs:telemetry:rox_central_info\|rhacs:telemetry:rox_central_secured_clusters\|rhacs:telemetry:rox_central_secured_nodes\|rhacs:telemetry:rox_central_secured_vcpus\|rhacs:telemetry:rox_sensor_info\|cluster:volume_manager_selinux_pod_context_mismatch_total\|cluster:volume_manager_selinux_volume_context_mismatch_warnings_total\|cluster:volume_manager_selinux_volume_context_mismatch_errors_total\|cluster:volume_manager_selinux_volumes_admitted_total\|ols:provider_model_configuration\|ols:rest_api_query_calls_total:2xx\|ols:rest_api_query_calls_total:4xx\|ols:rest_api_query_calls_total:5xx\|openshift:openshift_network_operator_ipsec_state:info\|cluster:health:group_severity:count\|cluster:controlplane_topology:info\|cluster:infrastructure_topology:info",action=~"Pass\|Allow\|Deny\|Allow\|Deny\|",alertstate=~"firing\|",direction=~"Ingress\|Egress\|Ingress\|Egress\|",enabled=~"true\|false\|",mode=~"HighlyAvailable\|HighlyAvailableArbiter\|SingleReplica\|DualReplica\|External\|HighlyAvailable\|SingleReplica\|
",page=~"overview-classic\|overview-fleet\|search\|search-details\|clusters\|application\|governance\|",quantile=~"0.99\|0.99\|0.99\|",reason=~"memory_working_set_delta_from_request\|memory_rss_delta_from_request\|",role=~"Primary\|Secondary\|Primary\|Secondary\|",severity=~"critical\|warning\|info\|none\|critical\|warning\|info\|none\|",state=~"Managed\|Unmanaged\|",system_type=~"OCS\|OCS\|",system_vendor=~"Red Hat\|Red Hat\|",table_name=~"ACL\|Address_Set\|ACL\|Address_Set\|",topology=~"Layer2\|Layer3\|Layer2\|Layer3\|",type=~"azure\|gcs\|s3\|static\|openshift\|disabled\|jaeger\|hostmetrics\|opencensus\|prometheus\|zipkin\|kafka\|filelog\|journald\|k8sevents\|kubeletstats\|k8scluster\|k8sobjects\|otlp\|debug\|logging\|otlp\|otlphttp\|prometheus\|lokiexporter\|kafka\|awscloudwatchlogs\|loadbalancing\|batch\|memorylimiter\|attributes\|resource\|span\|k8sattributes\|resourcedetection\|filter\|routing\|cumulativetodelta\|groupbyattrs\|zpages\|ballast\|memorylimiter\|jaegerremotesampling\|healthcheck\|pprof\|oauth2clientauth\|oidcauth\|bearertokenauth\|filestorage\|spanmetrics\|forward\|deployment\|daemonset\|sidecar\|statefulset\|",vendor=~"NVIDIA\|AMD\|GAUDI\|INTEL\|QUALCOMM\|",verb=~"LIST\|WATCH\|"}

danwinship commented Apr 29, 2025

openshift-ci-robot commented Apr 29, 2025

openshift-ci bot commented Apr 29, 2025

danwinship commented Apr 30, 2025

jan--f commented May 5, 2025

danwinship commented May 5, 2025

juzhao May 15, 2025

danwinship May 15, 2025

openshift-bot commented Aug 15, 2025

danwinship commented Aug 18, 2025

openshift-ci bot commented Aug 18, 2025
