[kafka][checkoutservice][frauddetectionservice] add kafkaQueueProblems featureflag#1528

Merged
puckpuck merged 10 commits into open-telemetry:main from EislM0203:kafka-queue-problems-featureflag
Apr 30, 2024

@EislM0203 EislM0203 commented Apr 15, 2024

Changes

This PR adds a new feature flag, kafkaQueueProblems, to the OpenTelemetry Demo. When the flag is active, the producer (checkoutservice) overloads Kafka by sending 100 extra messages to the queue for each actual order. At the same time, the consumer (frauddetectionservice) delays consuming each message by 1 second. Together these cause a sudden spike in consumer lag. This is an interesting, real-world observability scenario because it simulates queue problems in Kafka, which afaik no existing feature flag does yet. Metrics that track consumer lag (e.g. kafka_consumer_lag_avg) can be viewed in Grafana.
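To make the effect on consumer lag concrete, here is a minimal back-of-the-envelope simulation. It is not the code from this PR and does not talk to Kafka; the function name `simulate_lag`, its parameters, and the order rate of one order per second are assumptions chosen purely for illustration. It only models the two behaviours described above: extra messages per order on the producer side and a fixed per-message delay on the consumer side.

```python
def simulate_lag(orders_per_second, extra_messages, consumer_delay_s, duration_s):
    """Return the simulated consumer lag (unconsumed messages) after each second.

    Hypothetical model, not the demo's implementation:
    - the producer emits 1 real message plus `extra_messages` per order
    - the consumer needs `consumer_delay_s` seconds per message
      (0 means it keeps up with whatever is in the queue)
    """
    lag = 0
    history = []
    for _ in range(duration_s):
        # Producer side: every order results in (1 + extra_messages) messages.
        lag += orders_per_second * (1 + extra_messages)

        # Consumer side: throughput is bounded by the per-message delay.
        if consumer_delay_s > 0:
            consumed = min(lag, int(1 / consumer_delay_s))
        else:
            consumed = lag
        lag -= consumed

        history.append(lag)
    return history


# Flag off (assumed): no extra messages, no artificial delay.
print(simulate_lag(orders_per_second=1, extra_messages=0, consumer_delay_s=0.0, duration_s=3))
# Flag on, using the numbers from the description: 100 extra messages, 1 s delay.
print(simulate_lag(orders_per_second=1, extra_messages=100, consumer_delay_s=1.0, duration_s=3))
```

Under these assumptions the first call prints `[0, 0, 0]` and the second prints `[100, 200, 300]`: the backlog grows by roughly 100 messages per order because the consumer can only drain one message per second.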

I also increased the resource limits of the frauddetectionservice, since it kept dying due to resource exhaustion. This happened even without my modifications to the service.

Looking forward to your feedback!

Merge Requirements

For new features contributions please make sure you have completed the following
essential items:

  • CHANGELOG.md updated to document new feature additions
  • Appropriate documentation updates in the docs
  • Appropriate Helm chart updates in the helm-charts

Maintainers will not merge until the above have been completed. If you're unsure
which docs need to be changed ping the
@open-telemetry/demo-approvers.

@github-actions github-actions bot added the docs-update-required (Requires documentation update) and helm-update-required (Requires an update to the Helm chart when released) labels Apr 15, 2024
@EislM0203 EislM0203 force-pushed the kafka-queue-problems-featureflag branch 2 times, most recently from 7e33420 to c379147 Compare April 15, 2024 10:26
Overloads Kafka queue while simultaneously introducing a consumer side delay leading to a lag spike

The result of that featureflag can be observed with numerous metrics in grafana (e.g. kafka_consumer_lag_avg)
also adjusted the resource limit for the frauddetection service since it kept dying
@EislM0203 EislM0203 force-pushed the kafka-queue-problems-featureflag branch from 38ec15e to cf5c1dc Compare April 15, 2024 13:18
@EislM0203 EislM0203 changed the title Add kafkaQueueProblems featureflag [kafka][checkoutservice][frauddetectionservice] add kafkaQueueProblems featureflag Apr 15, 2024
@EislM0203 EislM0203 marked this pull request as ready for review April 15, 2024 15:58
@EislM0203 EislM0203 requested a review from a team April 15, 2024 15:58
@puckpuck puckpuck merged commit e0500b2 into open-telemetry:main Apr 30, 2024
@EislM0203 EislM0203 deleted the kafka-queue-problems-featureflag branch April 30, 2024 06:08
maxhakansson added a commit to maxhakansson/opentelemetry-demo that referenced this pull request May 10, 2024
neamulkabiremon pushed a commit to neamulkabiremon/ultimate-devops-project-demo that referenced this pull request Apr 16, 2025
…s featureflag (open-telemetry#1528)

* Add kafkaQueueProblems featureflag

Overloads Kafka queue while simultaneously introducing a consumer side delay leading to a lag spike

The result of that featureflag can be observed with numerous metrics in grafana (e.g. kafka_consumer_lag_avg)

* changed feature flag to int value for more configurability

also adjusted the resource limit for the frauddetection service since it kept dying

* addressed PR comments

* addressed PR comment

---------

Co-authored-by: Austin Parker <austin@ap2.io>
mohamed3637 added a commit to mohamed3637/opentelemetry-demo that referenced this pull request Oct 7, 2025
cloud-hb pushed a commit to cloud-hb/opentelemetry-demo that referenced this pull request Nov 17, 2025
Labels

docs-update-required (Requires documentation update)
helm-update-required (Requires an update to the Helm chart when released)

4 participants