Deflake offset for leader epoch #28389

Merged
joe-redpanda merged 1 commit into redpanda-data:dev from joe-redpanda:deflake_offset_for_leader_epoch on Nov 21, 2025

@joe-redpanda

@joe-redpanda joe-redpanda commented Nov 5, 2025

Contributor

Describe partition can occasionally return -1 aka UNKNOWN_EPOCH for a
partition's term.

This epoch is subsequently fed into offset_for_leader_epoch.

This is an illegal value for offset_for_leader_epoch.

This commit changes the logic for gathering partition data to require
first that all partition descriptions returned have a valid high
watermark and a valid epoch.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x
  • v24.3.x

Release Notes

Bug Fixes

  • deflake offset for leader epoch

Copilot AI review requested due to automatic review settings November 5, 2025 22:55

Copilot AI left a comment

Contributor

Pull Request Overview

This PR addresses a flaky test issue in the offset_for_leader_epoch handler by improving leadership validation and introducing concurrent processing. The main change replaces the cached is_leader() check with a linearizable_barrier() to ensure accurate, real-time leadership status at the moment the request is received, preventing stale data from causing test failures.

Key changes:

  • Replaced cached leadership check with linearizable_barrier() for accurate real-time validation
  • Introduced concurrent processing using ss::max_concurrent_for_each with parallelism limit of 32
  • Restructured the request processing loop into a lambda-based concurrent execution model

Comment thread src/v/kafka/server/handlers/offset_for_leader_epoch.cc Outdated
Comment thread src/v/kafka/server/handlers/offset_for_leader_epoch.cc Outdated
@vbotbuildovich

vbotbuildovich commented Nov 6, 2025

Collaborator

Retry command for Build#75708

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/offset_for_leader_epoch_test.py::OffsetForLeaderEpochTest.test_offset_for_leader_epoch

@vbotbuildovich

vbotbuildovich commented Nov 6, 2025

Collaborator

CI test results

test results on build#75708

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ReplicatedMetastoreTest | TestBasicRemoveTopics | | unit | https://buildkite.com/redpanda/redpanda/builds/75708#019a563c-2934-44ee-9734-7d0cad1835e0 | FAIL | 0/1 | | |
| ShadowLinkingReplicationTests | test_replication_with_failures | null | integration | https://buildkite.com/redpanda/redpanda/builds/75708#019a5659-9820-4270-a89b-ae493d71b826 | FLAKY | 19/21 | upstream reliability is '100.0'. current run reliability is '90.47619047619048'. drift is 9.52381 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_replication_with_failures |
| OffsetForLeaderEpochTest | test_offset_for_leader_epoch | null | integration | https://buildkite.com/redpanda/redpanda/builds/75708#019a5659-9820-4270-a89b-ae493d71b826 | FAIL | 0/21 | The test has failed across all retries | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=OffsetForLeaderEpochTest&test_method=test_offset_for_leader_epoch |
| OffsetForLeaderEpochTest | test_offset_for_leader_epoch | null | integration | https://buildkite.com/redpanda/redpanda/builds/75708#019a56bb-7563-4818-b52b-7219f9416965 | FAIL | 0/21 | The test has failed across all retries | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=OffsetForLeaderEpochTest&test_method=test_offset_for_leader_epoch |
| RedpandaNodeOperationsSmokeTest | test_node_ops_smoke_test | {"cloud_storage_type": 1, "mixed_versions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/75708#019a56bb-7564-4481-80f7-620f9dacad31 | FLAKY | 14/21 | upstream reliability is '97.25190839694656'. current run reliability is '66.66666666666666'. drift is 30.58524 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test |
| RedpandaNodeOperationsSmokeTest | test_node_ops_smoke_test | {"cloud_storage_type": 1, "mixed_versions": true} | integration | https://buildkite.com/redpanda/redpanda/builds/75708#019a56bb-7565-4bb2-a1b1-090084d2dd09 | FLAKY | 10/21 | upstream reliability is '95.4271961492178'. current run reliability is '47.61904761904761'. drift is 47.80815 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test |
| ShadowLinkingRandomOpsTest | test_node_operations | {"failures": false} | integration | https://buildkite.com/redpanda/redpanda/builds/75708#019a56bb-7564-4481-80f7-620f9dacad31 | FLAKY | 19/21 | upstream reliability is '98.94551845342706'. current run reliability is '90.47619047619048'. drift is 8.46933 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingRandomOpsTest&test_method=test_node_operations |

test results on build#75830

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ShadowLinkingReplicationTests | test_topic_delete | {"source_cluster_spec": {"cluster_type": "kafka", "kafka_quorum": "COMBINED_KRAFT", "kafka_version": "3.8.0"}} | integration | https://buildkite.com/redpanda/redpanda/builds/75830#019a5efd-e175-492f-a285-e4e88277b42f | FLAKY | 20/21 | upstream reliability is '98.125'. current run reliability is '95.23809523809523'. drift is 2.8869 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_topic_delete |
| RedpandaNodeOperationsSmokeTest | test_node_ops_smoke_test | {"cloud_storage_type": 1, "mixed_versions": true} | integration | https://buildkite.com/redpanda/redpanda/builds/75830#019a5f02-0be0-44af-ba71-83e6bc1b9bfb | FLAKY | 19/21 | upstream reliability is '99.25558312655087'. current run reliability is '90.47619047619048'. drift is 8.77939 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test |
| ScalingUpTest | test_fast_node_addition | null | integration | https://buildkite.com/redpanda/redpanda/builds/75830#019a5f02-0bde-4749-a583-3e998d09231a | FLAKY | 20/21 | upstream reliability is '97.93510324483776'. current run reliability is '95.23809523809523'. drift is 2.69701 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_fast_node_addition |

test results on build#76764

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ShadowLinkConsumeGroupsMirroringTest | test_continuous_group_sync | {"source_cluster_spec": {"cluster_type": "redpanda"}, "with_failures": true} | integration | https://buildkite.com/redpanda/redpanda/builds/76764#019aa402-28bc-481f-9f13-6d9250ca30a0 | FLAKY | 20/21 | upstream reliability is '99.36842105263159'. current run reliability is '95.23809523809523'. drift is 4.13033 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkConsumeGroupsMirroringTest&test_method=test_continuous_group_sync |
| TopicRecoveryTest | test_many_partitions | {"check_mode": "check_manifest_and_segment_metadata", "cloud_storage_type": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/76764#019aa402-28c1-4fe7-9a17-762dd473d5ce | FLAKY | 20/21 | upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecoveryTest&test_method=test_many_partitions |
| WriteCachingFailureInjectionE2ETest | test_crash_all | {"use_transactions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/76764#019aa3fa-ecdf-42f1-8c8d-0a5e98ea6acf | FLAKY | 20/21 | upstream reliability is '90.927624872579'. current run reliability is '95.23809523809523'. drift is -4.31047 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all |
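
The drift figures in the reason column are consistent with "upstream reliability minus the current run's pass rate", both in percent. A minimal sketch that reproduces those numbers follows. The function names are hypothetical, the bot's actual implementation is not shown in this thread, and the tolerance rule in `within_allowed_drift` is an assumption based only on the wording of the reason text.

```python
def run_reliability(passed: int, total: int) -> float:
    """Pass rate of the current run, as a percentage."""
    return passed / total * 100


def drift(upstream_reliability: float, passed: int, total: int) -> float:
    """Upstream reliability minus current-run reliability, in percentage points."""
    return upstream_reliability - run_reliability(passed, total)


def within_allowed_drift(upstream_reliability: float, passed: int, total: int,
                         allowed: float = 50.0) -> bool:
    # Assumption: a flaky result is tolerated while its drift stays below `allowed`.
    return drift(upstream_reliability, passed, total) < allowed


# ShadowLinkingReplicationTests.test_replication_with_failures, build#75708: 19/21 passed
print(round(drift(100.0, 19, 21), 5))

# A negative drift means the current run was more reliable than upstream,
# as in WriteCachingFailureInjectionE2ETest.test_crash_all on build#76764.
print(round(drift(90.927624872579, 20, 21), 5))
```

Under this reading, the 10/21 run of test_node_ops_smoke_test (drift 47.80815) sits just inside the allowed drift of 50.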

@joe-redpanda joe-redpanda marked this pull request as draft November 6, 2025 05:16
@joe-redpanda joe-redpanda force-pushed the deflake_offset_for_leader_epoch branch from 45bed9a to de850ef November 7, 2025 15:11
@joe-redpanda joe-redpanda marked this pull request as ready for review November 7, 2025 15:11
Comment thread src/v/kafka/server/handlers/offset_for_leader_epoch.cc Outdated
@mmaslankaprv

Member

/ci-repeat 4
skip-redpanda-build
dt-repeat=10
tests/rptest/tests/offset_for_leader_epoch_test.py

Describe partition can occasionally return -1 aka UNKNOWN_EPOCH for a
partition's term.

This epoch is subsequently fed into offset_for_leader_epoch.

This is an illegal value for offset_for_leader_epoch.

This commit changes the logic for gathering partition data to require
first that all partition descriptions returned have a valid high
watermark and a valid epoch.

@joe-redpanda joe-redpanda force-pushed the deflake_offset_for_leader_epoch branch from de850ef to 54b62b6 November 21, 2025 00:55
@joe-redpanda

Contributor Author

I researched this again and found the following:

[DEBUG - 2025-10-29 16:51:31,697 - offset_for_leader_epoch_test - test_offset_for_leader_epoch_transfer - lineno:274]: Fetched offsets for epoch -1 : {... 30: 0, ...}, expected: {30: 612}

Relevant invocation of offset_for_leader_epoch:

    BROKER  TOPIC             PARTITION  LEADER EPOCH  END OFFSET  ERROR
    1       topic-kggcsntgbk  0          -1            0
    1       topic-kggcsntgbk  1          -1            0
    1       topic-kggcsntgbk  2          -1            0
    1       topic-kggcsntgbk  5          -1            0
    1       topic-kggcsntgbk  10         -1            0
    1       topic-kggcsntgbk  12         -1            0
    1       topic-kggcsntgbk  13         -1            0

where did we get -1 for the epoch?

... /redpanda/redpanda/vbuild/redpanda_installs/ci/bin/rpk', 'topic', '-X', 'brokers=docker-rp-7:9092,docker-rp-22:9092,docker-rp-17:9092,docker-rp-23:9092,docker-rp-18:9092', 'describe', 'topic-kggcsntgbk', '-p', '-X', 'globals.request_timeout_overhead=30s', '-v']
[DEBUG - 2025-10-29 16:50:59,790 - rpk - _execute - lineno:1473]:
    PARTITION  LEADER  EPOCH  REPLICAS  LOG-START-OFFSET  HIGH-WATERMARK
    0          2       4      [1 2 5]   0                 661
    1          4       4      [1 4 5]   0                 617
    2          3       4      [1 3 4]   0                 612
    ...
    30         1       -1     [1 2 4]   0                 612
    ...

rpk returned -1 as the epoch for partition 30 in describe topic, and that value was fed into offset_for_leader_epoch. -1 is an invalid value to provide, so the request returned the default offset for that case, the log_start offset (0), rather than the expected offset 612.

Should we check the provided epoch to guarantee that it is a valid epoch?

@mmaslankaprv

Member

@joe-redpanda can we wait in the test for a valid set of epochs?

@joe-redpanda

Contributor Author

@joe-redpanda can we wait in the test for a valid set of epochs?

That's what's happening.

    def _get_offsets_and_epochs(self, rpk: RpkTool, topic_name: str):
        offsets = []

        def refresh():
            result = rpk.describe_topic(topic_name)
            offsets.clear()
            offsets.extend(result)

        def all_offsets_valid():
            refresh()
            # metadata request may return INVALID_EPOCH aka -1
            # this should not be used because INVALID_EPOCH maps to latest available
            # epoch in OffsetForLeaderEpochRequest
            return all([p.high_watermark >= 0 and p.leader_epoch >= 0 for p in offsets])

        wait_until(all_offsets_valid, 30, 1)

        return offsets

We're doing a 30s wait until the condition is met, where the condition is

p.high_watermark >= 0 and p.leader_epoch >= 0
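
To make the polling behaviour concrete, here is a minimal self-contained sketch of the same idea. `PartitionInfo`, `get_offsets_and_epochs`, and the fake `describe` callable are hypothetical names for illustration, and `wait_until` below is a simplified stand-in for the test framework's helper, not its real implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PartitionInfo:
    partition: int
    leader_epoch: int
    high_watermark: int


def wait_until(condition: Callable[[], bool], timeout_sec: float, backoff_sec: float) -> None:
    """Simplified stand-in: poll `condition` until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(backoff_sec)


def get_offsets_and_epochs(describe: Callable[[], List[PartitionInfo]],
                           timeout_sec: float = 30,
                           backoff_sec: float = 1) -> List[PartitionInfo]:
    partitions: List[PartitionInfo] = []

    def all_valid() -> bool:
        # Re-describe on every poll so a transient -1 epoch never reaches the caller.
        partitions[:] = describe()
        return all(p.high_watermark >= 0 and p.leader_epoch >= 0 for p in partitions)

    wait_until(all_valid, timeout_sec, backoff_sec)
    return partitions


# Fake describe: partition 30 reports epoch -1 once, then a valid epoch.
# The follow-up epoch of 4 is illustrative only.
responses = iter([
    [PartitionInfo(partition=30, leader_epoch=-1, high_watermark=612)],
    [PartitionInfo(partition=30, leader_epoch=4, high_watermark=612)],
])
result = get_offsets_and_epochs(lambda: next(responses), timeout_sec=5, backoff_sec=0.01)
print(result)
```

The first describe result is discarded because its epoch is -1, and only the second, fully valid result is returned.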

@joe-redpanda

Contributor Author

A good question, though: should we reject a user request with 'invalid_request' when the epoch is < 0?
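
For discussion only, the proposal could be modelled like this. It is a Python sketch of the idea, not the broker's handler code and not something this PR changes. `ErrorCode`, `EpochRequest`, and `validate` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorCode(Enum):
    NONE = "none"
    INVALID_REQUEST = "invalid_request"


@dataclass
class EpochRequest:
    partition: int
    leader_epoch: int


def validate(req: EpochRequest) -> ErrorCode:
    # Proposed rule (hypothetical): reject any negative epoch up front
    # instead of answering with a default offset.
    if req.leader_epoch < 0:
        return ErrorCode.INVALID_REQUEST
    return ErrorCode.NONE


# Partition 30 with epoch -1 is the case seen in the failing test run.
print(validate(EpochRequest(partition=30, leader_epoch=-1)))
print(validate(EpochRequest(partition=0, leader_epoch=4)))
```

Whether rejecting is right depends on how -1 is meant to be interpreted by the request, which is the open question here.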

@bharathv

Contributor

I'm curious why epoch is -1 despite so many retries?

@joe-redpanda

Contributor Author

I'm curious why epoch is -1 despite so many retries?

Previously the criterion was only p.high_watermark >= 0, with no epoch check.

Here's the old 'all' check:

    def all_offsets_valid():
        refresh()
        return all([p.high_watermark >= 0 for p in offsets])

I moved this into a shared helper because both tests are susceptible to this.
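
A small sketch of why the old check let the bad epoch through, using the rows from the rpk describe output earlier in this thread. The `Partition` tuple and the two function names are hypothetical, introduced only for this comparison.

```python
from collections import namedtuple

Partition = namedtuple("Partition", ["partition", "leader_epoch", "high_watermark"])

# Rows taken from the rpk describe output in the failing run.
partitions = [
    Partition(0, 4, 661),
    Partition(1, 4, 617),
    Partition(2, 4, 612),
    Partition(30, -1, 612),  # valid high watermark, but epoch is -1
]


def old_check(ps):
    # Old criterion: only the high watermark had to be valid.
    return all(p.high_watermark >= 0 for p in ps)


def new_check(ps):
    # New criterion: both the high watermark and the epoch must be valid.
    return all(p.high_watermark >= 0 and p.leader_epoch >= 0 for p in ps)


print(old_check(partitions))  # True: partition 30 slips through
print(new_check(partitions))  # False: the wait keeps polling
```

With the old check the wait finished immediately on this result, so the -1 epoch for partition 30 was passed on to offset_for_leader_epoch.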

@joe-redpanda joe-redpanda merged commit 623951f into redpanda-data:dev Nov 21, 2025
23 checks passed
@vbotbuildovich

Collaborator

/backport v25.3.x

@vbotbuildovich

Collaborator

/backport v25.2.x

@vbotbuildovich

Collaborator

/backport v25.1.x

@vbotbuildovich

Collaborator

/backport v24.3.x

5 participants