[kafka][checkoutservice][frauddetectionservice] add kafkaQueueProblems featureflag #1528
Merged
puckpuck merged 10 commits into open-telemetry:main on Apr 30, 2024
7e33420 to c379147
Overloads the Kafka queue while simultaneously introducing a consumer-side delay, leading to a lag spike. The result of the feature flag can be observed with numerous metrics in Grafana (e.g. kafka_consumer_lag_avg).
Also adjusted the resource limit for the frauddetection service since it kept dying.
38ec15e to cf5c1dc
beeme1mr reviewed on Apr 16, 2024
src/frauddetectionservice/src/main/kotlin/frauddetectionservice/main.kt
puckpuck reviewed on Apr 22, 2024
puckpuck approved these changes on Apr 23, 2024
austinlparker approved these changes on Apr 23, 2024
beeme1mr reviewed on Apr 23, 2024
maxhakansson added a commit to maxhakansson/opentelemetry-demo that referenced this pull request on May 10, 2024
* main: (138 commits)
  docs: update sig meeting schedule (open-telemetry#1567)
  chore(deps): upgrade otel collector contrib and opensearch (open-telemetry#1566)
  fix(loadgenerator): use add_hooks openfeature method (open-telemetry#1565)
  Revert "remove axoflow link (open-telemetry#1457)" (open-telemetry#1563)
  feat: configure feature flag tracing for Python services (open-telemetry#1553)
  chore(deps): upgrade go dependencies to latest versions (open-telemetry#1561)
  remove deprecated version property (open-telemetry#1557)
  chore(deps): upgrade otel collector contrib, grafana and prometheus (open-telemetry#1559)
  add imageprovider (open-telemetry#1552)
  [flagd] - upgrade to latest version and memory limits (open-telemetry#1554)
  update kubernetes manifest to 1.9.0 (open-telemetry#1555)
  [chore] specify default value for tracetest image version (open-telemetry#1551)
  improve baggage propagation (open-telemetry#1545)
  Bump gradle/wrapper-validation-action from 3.3.1 to 3.3.2 (open-telemetry#1548)
  [kafka][checkoutservice][frauddetectionservice] add kafkaQueueProblems featureflag (open-telemetry#1528)
  fix(productcatalogservice): handle err returned from openfeature.SetProvider func (open-telemetry#1535)
  feat(otelcol): add redisreceiver (open-telemetry#1537)
  chore(deps): upgrade opentelemetry-java-instrumentation for kafka to 2.3.0 (open-telemetry#1533)
  Bump gradle/wrapper-validation-action from 3.3.0 to 3.3.1 (open-telemetry#1539)
  chore(deps): upgrade opentelemetry-java-instrumentation to 2.3.0 (open-telemetry#1532)
  ...

# Conflicts:
#   docker-compose.minimal.yml
#   src/frontend/package-lock.json
neamulkabiremon pushed a commit to neamulkabiremon/ultimate-devops-project-demo that referenced this pull request on Apr 16, 2025
…s featureflag (open-telemetry#1528)

* Add kafkaQueueProblems featureflag

  Overloads Kafka queue while simultaneously introducing a consumer side delay leading to a lag spike
  The result of that featureflag can be observed with numerous metrics in grafana (e.g. kafka_consumer_lag_avg)

* changed feature flag to int value for more configurability

  also adjusted the resource limit for the frauddetection service since it kept dying

* addressed PR comments

* addressed PR comment

---------

Co-authored-by: Austin Parker <austin@ap2.io>
mohamed3637 added a commit to mohamed3637/opentelemetry-demo that referenced this pull request on Oct 7, 2025
cloud-hb pushed a commit to cloud-hb/opentelemetry-demo that referenced this pull request on Nov 17, 2025
Changes
This PR adds a new feature flag, kafkaQueueProblems, to the OpenTelemetry demo. When the flag is activated, the producer (checkoutservice) overloads Kafka by sending 100 extra messages to the queue per actual order. Simultaneously, the consumer (frauddetectionservice) delays claiming the messages by 1 second per message. Together these cause a sudden spike in consumer lag. This is an interesting, real-world observability scenario because it simulates queue problems in Kafka, which afaik no feature flag does yet. Metrics that monitor consumer lag can be viewed in Grafana (e.g. kafka_consumer_lag_avg).

Also increased the resource limits of the frauddetection service since it kept dying due to resource exhaustion. This also happened without my modifications to the service.
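To make the expected lag growth concrete, here is a rough back-of-the-envelope model in Python. It is not the code from this PR: the function name, its parameters, and the order rate are made up for illustration, and it assumes a single consumer that keeps up with the queue whenever the flag is off. Real Kafka lag is tracked per partition and consumer group, so treat this only as a sketch of the arithmetic described above.

```python
def simulate_lag(orders_per_second, seconds, extra_per_order=100,
                 consumer_delay_s=1.0, flag_on=True):
    """Toy model: return the message backlog at the end of each second.

    All names and defaults are illustrative assumptions, not values read
    from the demo's source code.
    """
    lag = 0
    history = []
    for _ in range(seconds):
        # Producer side: one real message per order, plus the extra
        # messages when the flag is on.
        per_order = 1 + (extra_per_order if flag_on else 0)
        lag += orders_per_second * per_order

        # Consumer side: with the flag on, a delay of consumer_delay_s per
        # message caps throughput at 1 / consumer_delay_s messages per second.
        # With the flag off we assume the consumer drains everything.
        capacity = int(1 / consumer_delay_s) if flag_on else lag
        lag -= min(lag, capacity)
        history.append(lag)
    return history


if __name__ == "__main__":
    print(simulate_lag(2, 3))                 # [201, 402, 603]
    print(simulate_lag(2, 3, flag_on=False))  # [0, 0, 0]
```

At an assumed 2 orders per second, the producer adds 202 messages each second while the delayed consumer removes only 1, so the backlog in this model grows by about 201 messages per second for as long as the flag stays on.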
Looking forward to your feedback!
Merge Requirements
For new features contributions, please make sure you have completed the following essential items:

* CHANGELOG.md updated to document new feature additions

Maintainers will not merge until the above have been completed. If you're unsure which docs need to be changed, ping the @open-telemetry/demo-approvers.