-
Notifications
You must be signed in to change notification settings - Fork 216
OCPBUGS-56157: Use large OTEL config example as default + descriptions tweaks #4903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-56157: Use large OTEL config example as default + descriptions tweaks #4903
Conversation
@pmtk: This pull request references Jira Issue OCPBUGS-56157, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/hold Thinking about changing variables names to avoid confusion |
560b581
to
0626b99
Compare
/retest |
2 similar comments
/retest |
/retest |
/test e2e-aws-tests-bootc-arm |
@pmtk: This pull request references Jira Issue OCPBUGS-56157, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
standard1 test taking too long due to recent router test suite size increase |
endpoint: ${env:K8S_NODE_NAME}:4317 | ||
# Endpoint must point an IP or hostname, and port of an OTLP backend service. | ||
# Here, the OTEL_BACKEND env var is used. It should be changed to point to the backend. | ||
# If left unchanged and undefined, it will be resolved to localhost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, where the default value of ${OTEL_BACKEND}
is set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nowhere actually. But it doesn't cause an error, just evaluates to empty string which is later replaced with localhost:
2025-05-16T09:08:10.316+0200 warn envprovider/provider.go:51 Configuration references unset environment variable {"name": "OTEL_BACKEND"}
...
2025-05-16T09:08:10.359+0200 warn zapgrpc/zapgrpc.go:193 [core] [Channel #1 SubChannel #3]grpc: addrConn.createTransport failed to connect to {Addr: "[::1]:4317", ServerName: "localhost:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:4317: connect: connection refused"
So user can just replace it directly in config or add an Environment=
to microshift-observability.service, but I prefer the former.
Alternative is to use word like backend
and that would be expected to be actual resolvable hostname, but again this doesn't cause otel-collector to exit with error:
exporters:
otlp:
endpoint: "backend:4317"
2025-05-16T10:58:52.371+0200 info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "otlp", "error": "rpc error: code = Unavailable desc = name resolver error: produced zero addresses", "interval": "2.644063999s"}
I'm open to other alternatives, anything beside reusing K8S_NODE_NAME as it's confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So OTEL internally set localhost
if the var is not defined. Fair enough.
Thinking about user experience, I'm not sure what's the best approach. I suggest adding that OTEL defaults it to localhost
, but I'm ok either way.
# If left unchanged and undefined, it will be resolved to localhost. | |
# If left unchanged and undefined, OTEL will resolve it to localhost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the comment line from
# If left unchanged and undefined, it will be resolved to localhost.
to
# Unless replaced in config or defined in service file, it'll be empty and OTEL will use 'localhost' instead.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer resolve
rather than use
when talking about setting localhost
here: OTEL will use 'localhost' instead. But I'm not really sure what's the correct verb, because if not set, it can not be resolved it, so...
Apart from that, I agree with your proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I think of resolve I think of DNS, there's no DNS here. Maybe someone else will chime in with ideas
this PR looks good to me |
84cc8e5
to
1dd20c5
Compare
/unhold |
Looks like the observability rewrite of the otel config is tripping the |
I also saw these error/issue in QE regression tests for 4.19 RC. Wondering if we should exclude configuration files from the |
Looks like it looses eof newline, I'll just copy back from the large preset.
|
1dd20c5
to
592be3b
Compare
592be3b
to
ec65b98
Compare
ec65b98
to
1ae4110
Compare
Yeah, it's probably a good idea. @pmtk @pacevedom what do y'all think? |
/test e2e-aws-tests-bootc |
I fixed the test to copy back the correct config example, so it should be good now - no need. |
@pmtk: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/cherrypick release-4.19 |
@pmtk: once the present PR merges, I will cherry-pick it on top of In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ggiguash, pmtk The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@pmtk: Jira Issue OCPBUGS-56157: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-56157 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@pmtk: new pull request created: #4954 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Removed
packaging/observability/opentelemetry-collector.yaml
as it should be equivalent to the large example (see openshift/openshift-docs#92758 (comment)).packaging/observability/opentelemetry-collector-large.yaml
is copied as default config to/etc/microshift/observability/opentelemetry-collector.yaml
. Not symlinked to avoid problematic scenario like:opentelemetry-collector.yaml
which is symlink toopentelemetry-collector-large.yaml
.opentelemetry-collector-large.yaml
is overwritten and so is user's configuration.Presets/examples description updates:
Container, Pod, Volume, and Node metrics
to avoid interpretation that the Pod's/metrics
are also included(Warnings only)
regarding events - could not find in the receivers source code any configuration or logic related to filtering events typespriority: info
(which is the default value) to journald receiver to match the descriptionChanged the OTEL backend vars to OTEL_BACKEND to avoid confusion when modifying the configuration. Kubelet-stats receiver's endpoint should not be changed, but if it's named the same as the backend vars, it's easy to bulk replace or misinterpret reconfiguration.