Deflake offset for leader epoch by joe-redpanda · Pull Request #28389 · redpanda-data/redpanda

joe-redpanda · 2025-11-05T22:55:32Z

Describe partition can occasionally return -1 aka UNKNOWN_EPOCH for a
partition's term.

This epoch is subsequently fed into offset_for_leader_epoch.

This is an illegal value for offset_for_leader_epoch.

This commit changes the logic for gathering partition data to require
first that all partition descriptions returned have a valid high
watermark and a valid epoch.

Backports Required

Release Notes

Bug Fixes

deflake offset for leader epoch

Copilot

Pull Request Overview

This PR addresses a flaky test issue in the offset_for_leader_epoch handler by improving leadership validation and introducing concurrent processing. The main change replaces the cached is_leader() check with a linearizable_barrier() to ensure accurate, real-time leadership status at the moment the request is received, preventing stale data from causing test failures.

Key changes:

Replaced cached leadership check with linearizable_barrier() for accurate real-time validation
Introduced concurrent processing using ss::max_concurrent_for_each with parallelism limit of 32
Restructured the request processing loop into a lambda-based concurrent execution model

vbotbuildovich · 2025-11-06T01:03:46Z

Retry command for Build#75708

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/offset_for_leader_epoch_test.py::OffsetForLeaderEpochTest.test_offset_for_leader_epoch

vbotbuildovich · 2025-11-06T05:09:32Z

CI test results

test results on build#75708

test_class	test_method	test_arguments	test_kind	job_url	test_status	passed	reason	test_history
ReplicatedMetastoreTest	TestBasicRemoveTopics		unit	https://buildkite.com/redpanda/redpanda/builds/75708#019a563c-2934-44ee-9734-7d0cad1835e0	FAIL	0/1
ShadowLinkingReplicationTests	test_replication_with_failures	null	integration	https://buildkite.com/redpanda/redpanda/builds/75708#019a5659-9820-4270-a89b-ae493d71b826	FLAKY	19/21	upstream reliability is '100.0'. current run reliability is '90.47619047619048'. drift is 9.52381 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_replication_with_failures
OffsetForLeaderEpochTest	test_offset_for_leader_epoch	null	integration	https://buildkite.com/redpanda/redpanda/builds/75708#019a5659-9820-4270-a89b-ae493d71b826	FAIL	0/21	The test has failed across all retries	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=OffsetForLeaderEpochTest&test_method=test_offset_for_leader_epoch
OffsetForLeaderEpochTest	test_offset_for_leader_epoch	null	integration	https://buildkite.com/redpanda/redpanda/builds/75708#019a56bb-7563-4818-b52b-7219f9416965	FAIL	0/21	The test has failed across all retries	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=OffsetForLeaderEpochTest&test_method=test_offset_for_leader_epoch
RedpandaNodeOperationsSmokeTest	test_node_ops_smoke_test	{"cloud_storage_type": 1, "mixed_versions": false}	integration	https://buildkite.com/redpanda/redpanda/builds/75708#019a56bb-7564-4481-80f7-620f9dacad31	FLAKY	14/21	upstream reliability is '97.25190839694656'. current run reliability is '66.66666666666666'. drift is 30.58524 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test
RedpandaNodeOperationsSmokeTest	test_node_ops_smoke_test	{"cloud_storage_type": 1, "mixed_versions": true}	integration	https://buildkite.com/redpanda/redpanda/builds/75708#019a56bb-7565-4bb2-a1b1-090084d2dd09	FLAKY	10/21	upstream reliability is '95.4271961492178'. current run reliability is '47.61904761904761'. drift is 47.80815 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test
ShadowLinkingRandomOpsTest	test_node_operations	{"failures": false}	integration	https://buildkite.com/redpanda/redpanda/builds/75708#019a56bb-7564-4481-80f7-620f9dacad31	FLAKY	19/21	upstream reliability is '98.94551845342706'. current run reliability is '90.47619047619048'. drift is 8.46933 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingRandomOpsTest&test_method=test_node_operations

test results on build#75830

test_class	test_method	test_arguments	test_kind	job_url	test_status	passed	reason	test_history
ShadowLinkingReplicationTests	test_topic_delete	{"source_cluster_spec": {"cluster_type": "kafka", "kafka_quorum": "COMBINED_KRAFT", "kafka_version": "3.8.0"}}	integration	https://buildkite.com/redpanda/redpanda/builds/75830#019a5efd-e175-492f-a285-e4e88277b42f	FLAKY	20/21	upstream reliability is '98.125'. current run reliability is '95.23809523809523'. drift is 2.8869 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_topic_delete
RedpandaNodeOperationsSmokeTest	test_node_ops_smoke_test	{"cloud_storage_type": 1, "mixed_versions": true}	integration	https://buildkite.com/redpanda/redpanda/builds/75830#019a5f02-0be0-44af-ba71-83e6bc1b9bfb	FLAKY	19/21	upstream reliability is '99.25558312655087'. current run reliability is '90.47619047619048'. drift is 8.77939 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test
ScalingUpTest	test_fast_node_addition	null	integration	https://buildkite.com/redpanda/redpanda/builds/75830#019a5f02-0bde-4749-a583-3e998d09231a	FLAKY	20/21	upstream reliability is '97.93510324483776'. current run reliability is '95.23809523809523'. drift is 2.69701 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_fast_node_addition

test results on build#76764

test_class	test_method	test_arguments	test_kind	job_url	test_status	passed	reason	test_history
ShadowLinkConsumeGroupsMirroringTest	test_continuous_group_sync	{"source_cluster_spec": {"cluster_type": "redpanda"}, "with_failures": true}	integration	https://buildkite.com/redpanda/redpanda/builds/76764#019aa402-28bc-481f-9f13-6d9250ca30a0	FLAKY	20/21	upstream reliability is '99.36842105263159'. current run reliability is '95.23809523809523'. drift is 4.13033 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkConsumeGroupsMirroringTest&test_method=test_continuous_group_sync
TopicRecoveryTest	test_many_partitions	{"check_mode": "check_manifest_and_segment_metadata", "cloud_storage_type": 1}	integration	https://buildkite.com/redpanda/redpanda/builds/76764#019aa402-28c1-4fe7-9a17-762dd473d5ce	FLAKY	20/21	upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecoveryTest&test_method=test_many_partitions
WriteCachingFailureInjectionE2ETest	test_crash_all	{"use_transactions": false}	integration	https://buildkite.com/redpanda/redpanda/builds/76764#019aa3fa-ecdf-42f1-8c8d-0a5e98ea6acf	FLAKY	20/21	upstream reliability is '90.927624872579'. current run reliability is '95.23809523809523'. drift is -4.31047 and the allowed drift is set to 50. The test should PASS	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all
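
The drift figures in the FLAKY rows above appear to be simple arithmetic on the pass counts. The following is a minimal sketch that reproduces the TopicRecoveryTest row (20/21 passed, upstream reliability 100.0). The function names are hypothetical, and the pass rule `drift <= allowed drift` is an assumption inferred from the bot's messages, not taken from the CI tooling itself.

```python
def reliability(passed: int, total: int) -> float:
    """Percentage of runs that passed."""
    return 100.0 * passed / total


def drift(upstream: float, current: float) -> float:
    """Difference between upstream and current reliability (inferred formula)."""
    return upstream - current


ALLOWED_DRIFT = 50  # value quoted in the bot output

current = reliability(20, 21)   # TopicRecoveryTest: 20 of 21 runs passed
d = drift(100.0, current)       # upstream reliability reported as '100.0'

# Assumed rule: the test is treated as passing while drift stays within the limit.
should_pass = d <= ALLOWED_DRIFT
print(f"current={current:.5f} drift={d:.5f} should_pass={should_pass}")
```

The same formula also matches the negative drift in the WriteCachingFailureInjectionE2ETest row, where the current run was more reliable than upstream.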

mmaslankaprv · 2025-11-17T07:33:21Z

/ci-repeat 4
skip-redpanda-build
dt-repeat=10
tests/rptest/tests/offset_for_leader_epoch_test.py

joe-redpanda · 2025-11-21T01:02:35Z

I researched this again and found the following:

[DEBUG - 2025-10-29 16:51:31,697 - offset_for_leader_epoch_test - test_offset_for_leader_epoch_transfer - lineno:274]: Fetched offsets for epoch -1 : {... 30: 0, ...}, expected: {30: 612}

Relevant invocation of offset_for_leader_epoch:
BROKER TOPIC PARTITION LEADER EPOCH END OFFSET ERROR
1 topic-kggcsntgbk 0 -1 0
1 topic-kggcsntgbk 1 -1 0
1 topic-kggcsntgbk 2 -1 0
1 topic-kggcsntgbk 5 -1 0
1 topic-kggcsntgbk 10 -1 0
1 topic-kggcsntgbk 12 -1 0
1 topic-kggcsntgbk 13 -1 0

where did we get -1 for the epoch?

... /redpanda/redpanda/vbuild/redpanda_installs/ci/bin/rpk', 'topic', '-X', 'brokers=docker-rp-7:9092,docker-rp-22:9092,docker-rp-17:9092,docker-rp-23:9092,docker-rp-18:9092', 'describe', 'topic-kggcsntgbk', '-p', '-X', 'globals.request_timeout_overhead=30s', '-v']
[DEBUG - 2025-10-29 16:50:59,790 - rpk - _execute - lineno:1473]:
PARTITION LEADER EPOCH REPLICAS LOG-START-OFFSET HIGH-WATERMARK
0 2 4 [1 2 5] 0 661
1 4 4 [1 4 5] 0 617
2 3 4 [1 3 4] 0 612
...
30 1 -1 [1 2 4] 0 612
...

rpk returned -1 as the epoch for partition 30 in describe topic, and that value was then fed into offset_for_leader_epoch, where it is not a valid epoch to provide. The result is an expectation mismatch: the test expects offset 612, but the request returns the default offset for the invalid input, which is the log start offset (0 here).
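
The problem row can be picked out mechanically from the pasted output. This is a minimal sketch, not code from this PR: `PartitionRow` and `parse_partitions` are hypothetical helpers, and the parsing assumes the whitespace-separated column layout shown in the log excerpt. Rows elided with "..." in the log are left out rather than reconstructed.

```python
from dataclasses import dataclass

# Rows copied from the rpk output above; elided rows are omitted.
RPK_OUTPUT = """\
PARTITION LEADER EPOCH REPLICAS LOG-START-OFFSET HIGH-WATERMARK
0 2 4 [1 2 5] 0 661
1 4 4 [1 4 5] 0 617
2 3 4 [1 3 4] 0 612
30 1 -1 [1 2 4] 0 612
"""


@dataclass
class PartitionRow:
    partition: int
    leader: int
    epoch: int
    log_start_offset: int
    high_watermark: int


def parse_partitions(text: str) -> list[PartitionRow]:
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        # The replica list is bracketed, so split around it.
        head, _, tail = line.partition("[")
        _, _, after = tail.partition("]")
        partition, leader, epoch = (int(x) for x in head.split())
        log_start, hwm = (int(x) for x in after.split())
        rows.append(PartitionRow(partition, leader, epoch, log_start, hwm))
    return rows


rows = parse_partitions(RPK_OUTPUT)
# Partitions whose epoch is unknown (-1) must not be used as request input.
unknown = [r.partition for r in rows if r.epoch < 0]
print(unknown)  # prints [30]
```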

Should we check the provided epoch to guarantee that it is a valid epoch?

mmaslankaprv · 2025-11-21T07:48:10Z

@joe-redpanda can we wait in the test for a valid set of epochs?

joe-redpanda · 2025-11-21T16:30:34Z

@joe-redpanda can we wait in the test for a valid set of epochs?

That's what's happening.

    def _get_offsets_and_epochs(self, rpk: RpkTool, topic_name: str):
        offsets = []

        def refresh():
            result = rpk.describe_topic(topic_name)
            offsets.clear()
            offsets.extend(result)

        def all_offsets_valid():
            refresh()
            # metadata request may return INVALID_EPOCH aka -1
            # this should not be used because INVALID_EPOCH maps to latest available
            # epoch in OffsetForLeaderEpochRequest
            return all([p.high_watermark >= 0 and p.leader_epoch >= 0 for p in offsets])

        wait_until(all_offsets_valid, 30, 1)

        return offsets

We're doing a 30s wait until the condition is met, where the condition is

p.high_watermark >= 0 and p.leader_epoch >= 0
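
To see the polling behaviour in isolation, here is a self-contained sketch under stated assumptions. The `wait_until` below is a simplified local stand-in for the test framework's helper, `fake_describe_topic` replaces `rpk.describe_topic`, and the epoch values it returns are illustrative only. The first poll contains an unknown epoch and is rejected; the second is accepted.

```python
import time
from dataclasses import dataclass


@dataclass
class Partition:
    high_watermark: int
    leader_epoch: int


def wait_until(condition, timeout_sec, backoff_sec):
    """Simplified stand-in: poll condition() until it is true or time runs out."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met in time")
        time.sleep(backoff_sec)


# Illustrative responses: the first contains an unknown epoch (-1), the second is valid.
_responses = iter([
    [Partition(612, 4), Partition(612, -1)],
    [Partition(612, 4), Partition(612, 5)],
])


def fake_describe_topic():
    """Hypothetical replacement for rpk.describe_topic()."""
    return next(_responses)


def get_offsets_and_epochs():
    offsets = []

    def all_offsets_valid():
        offsets[:] = fake_describe_topic()  # refresh on every poll
        return all(p.high_watermark >= 0 and p.leader_epoch >= 0 for p in offsets)

    wait_until(all_offsets_valid, timeout_sec=5, backoff_sec=0.01)
    return offsets


result = get_offsets_and_epochs()
print([p.leader_epoch for p in result])  # prints [4, 5]
```

Because the list is refreshed inside the condition, the caller only ever sees the snapshot that passed the check, never the rejected one.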

joe-redpanda · 2025-11-21T16:31:44Z

A good question, though, is whether we should reject a user request with 'invalid_request' when the epoch is < 0.

bharathv · 2025-11-21T17:31:57Z

I'm curious why epoch is -1 despite so many retries?

joe-redpanda · 2025-11-21T18:16:57Z

Previously the only criterion was p.high_watermark >= 0, with no epoch check.

Here's the old 'all' check:

    def all_offsets_valid():
        refresh()
        return all([p.high_watermark >= 0 for p in offsets])

I bumped this to a local function because both tests are susceptible to this.

vbotbuildovich · 2025-11-21T22:06:11Z

/backport v25.3.x

vbotbuildovich · 2025-11-21T22:06:12Z

/backport v25.2.x

vbotbuildovich · 2025-11-21T22:06:13Z

/backport v25.1.x

vbotbuildovich · 2025-11-21T22:06:13Z

/backport v24.3.x

Copilot AI review requested due to automatic review settings November 5, 2025 22:55

github-actions Bot added the area/redpanda label Nov 5, 2025

joe-redpanda requested review from bashtanov, bharathv and mmaslankaprv November 5, 2025 22:55

Copilot AI reviewed Nov 5, 2025

Comment thread src/v/kafka/server/handlers/offset_for_leader_epoch.cc Outdated

Comment thread src/v/kafka/server/handlers/offset_for_leader_epoch.cc Outdated

joe-redpanda marked this pull request as draft November 6, 2025 05:16

joe-redpanda removed request for bashtanov, bharathv and mmaslankaprv November 6, 2025 05:17

joe-redpanda force-pushed the deflake_offset_for_leader_epoch branch from 45bed9a to de850ef Compare November 7, 2025 15:11

joe-redpanda marked this pull request as ready for review November 7, 2025 15:11

joe-redpanda requested review from bashtanov, bharathv and mmaslankaprv November 10, 2025 14:59

mmaslankaprv reviewed Nov 12, 2025

Comment thread src/v/kafka/server/handlers/offset_for_leader_epoch.cc Outdated

joe-redpanda requested a review from mmaslankaprv November 14, 2025 16:47

joe-redpanda force-pushed the deflake_offset_for_leader_epoch branch from de850ef to 54b62b6 Compare November 21, 2025 00:55

bharathv approved these changes Nov 21, 2025

joe-redpanda merged commit 623951f into redpanda-data:dev Nov 21, 2025
23 checks passed

This was referenced Nov 21, 2025

[v24.3.x] Deflake offset for leader epoch #28702

Merged

[v25.3.x] Deflake offset for leader epoch #28703

Merged

[v25.1.x] Deflake offset for leader epoch #28704

Closed

[v25.2.x] Deflake offset for leader epoch #28705

Merged
