[Feature Request] Implement mechanism to fail stale search replicas #17032

@vinaykpud

Description

Is your feature request related to a problem? Please describe

Currently, there is no mechanism to automatically fail search replicas that have fallen significantly behind the primary shard and become stale. Searches served by such replicas can return stale data, which leads to inconsistent search results.

Describe the solution you'd like

Issue #16801 proposes redefining how lag is computed: lag will be the difference between the current time and the timestamp of the latest received checkpoint. With this change, lag is no longer measured by comparing the replica against the primary shard directly. Instead, we need a mechanism that monitors lag on the search replica shards. If a search replica's lag exceeds a predefined threshold, the replica should be marked as stale and failed automatically.
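To make the proposal concrete, here is a minimal sketch of the time-based staleness check. It is written in Python for brevity and is not OpenSearch code: `SearchReplicaLagTracker`, `stale_threshold_seconds`, `check_and_fail`, and the `fail_shard` callback are all hypothetical names chosen for illustration. It also assumes that a replica which has not yet received any checkpoint is not considered stale, which the proposal does not specify.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SearchReplicaLagTracker:
    """Hypothetical per-replica tracker; not an actual OpenSearch class."""

    # Assumed threshold setting; the real setting name and unit are not defined in this issue.
    stale_threshold_seconds: float
    # Injectable clock so the check can be tested without sleeping.
    clock: Callable[[], float] = time.monotonic
    # Time at which the latest checkpoint was received, or None if none has arrived yet.
    last_checkpoint_received_at: Optional[float] = None

    def on_checkpoint_received(self) -> None:
        """Record the arrival time of the latest checkpoint."""
        self.last_checkpoint_received_at = self.clock()

    def lag_seconds(self) -> float:
        """Lag = current time minus the time of the latest received checkpoint."""
        if self.last_checkpoint_received_at is None:
            # Assumption: no checkpoint yet means no measurable lag.
            return 0.0
        return max(0.0, self.clock() - self.last_checkpoint_received_at)

    def is_stale(self) -> bool:
        """A replica is stale once its lag exceeds the threshold."""
        return self.lag_seconds() > self.stale_threshold_seconds


def check_and_fail(tracker: SearchReplicaLagTracker,
                   fail_shard: Callable[[str], None]) -> bool:
    """Fail the replica via the supplied callback if it is stale.

    Returns True if the replica was failed, False otherwise.
    """
    if tracker.is_stale():
        fail_shard(
            f"search replica stale: lag {tracker.lag_seconds():.1f}s "
            f"exceeds threshold {tracker.stale_threshold_seconds:.1f}s"
        )
        return True
    return False
```

In this sketch a periodic task would call `check_and_fail` for each search replica. The check only reads the replica's own clock and checkpoint arrival time, so it needs no comparison against the primary shard, which matches the redefinition of lag described above.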

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

Indexing, Indexing:Replication, Search:Performance, enhancement
